Understanding Organizing and Presenting Data
Dr. Ramesh Kandela
[email protected]
Understanding Data
Data and data set:
• Data are facts and figures collected, analysed and summarized for presentation and
interpretation.
• Facts are truths, which may be numeric or non-numeric in nature; figures are numeric
information.
• In a more technical sense, data are a set of values of qualitative/categorical or quantitative
nature pertaining to one or more individuals or objects.
• All the data collected for a particular study are referred to as the data set for the study. A
data set is a collection of observations on one or more variables.
[Figure: Types of Data; surviving labels: Interval, Ratio]
Basic Terms in Understanding Data
• Element or Member: An element or member of a sample or population is a specific subject
or object (for example, a person, firm, item, state, or country) about which the information is
collected.
• Variable: A variable is a characteristic of the elements under study that assumes different
values for different elements.
• Observation or Measurement: The value of a variable for an element is called an observation
or measurement.
Sample Data of IFHE Students
• Population
• Sample
Course           No. of Students
BBA              16
BSc              3
BTech            10
BCOM             4
BA(Economics)    2
SUM              35
Here each course is an element or a member, "No. of Students" is the variable, and each count
(for example, 10 for BTech) is an observation or measurement.
Types of Data
Qualitative Data:
• A variable that cannot assume a numerical value but can be classified into two or more
nonnumeric categories is called a qualitative or categorical variable. The data collected on
such a variable are called qualitative data.
• Categorical data remain qualitative even when the categories are identified by a numerical code.
• Examples of qualitative variables are the gender of a person and the brand of a mobile phone.
Quantitative Data
• A variable that can be measured numerically is called a quantitative variable. The data
collected on a quantitative variable are called quantitative data.
• Quantitative variables may be classified as either discrete variables or continuous variables.
• Discrete: Counted
Discrete Data can only take certain values.
Example: the number of students in a class
• Continuous: Measured
Continuous Data can take any value (within a range)
Example: A person's height
Cross-section & Time-series data
Based on the time over which they are collected, data can be classified as either cross-section
or time-series data.
• Cross-section data contain information on different elements of a population or sample for
the same period of time.
IPL 2023 Runs by Players
Player              Runs
Shubman Gill        890
Faf Du Plessis      730
Devon Conway        672
Virat Kohli         639
Yashasvi Jaiswal    625
Suryakumar Yadav    605
Ratio scale
• The scale of measurement for a variable is a ratio scale if the data have all the properties
of interval data and the ratio of two values is meaningful.
• Variables such as distance, height, weight, and time use the ratio scale of measurement.
• It has an absolute zero point (zero has a real meaning).
• It is meaningful to compute ratios of scale values.
Scale of Measurement
[Figure: scales of measurement illustrated with runners in a race; surviving labels: Nominal
(numbers assigned to runners: 7, 8, 3) and finish time, in seconds]
Organizing and Presenting Data to Convey Meaning: Tables and Graphs
Sources of Data
Primary Data
• Primary data are data collected by researchers directly from first-hand sources through
interviews, surveys, experiments, or observations.
• Customer satisfaction surveys, interviews of scientists, health observation of patients, etc. are
some examples of primary data.
Secondary Data
• Secondary data are data that have already been collected and are readily available from
other sources.
• Secondary data were originally collected for another purpose; the researcher reuses them and
applies statistical methods to them. Primary data that have been processed and published
become secondary data for later users.
• Secondary data are second-hand information that already exists in published or unpublished
forms.
• This information can be obtained from journals, magazines, reports, websites, etc.
• Financial reports, the population census, CMIE reports, the ProwessIQ database, IMF reports,
etc. are some examples of secondary data.
Data collected from 35 students: average time spent (no. of hours) on social media per month
70 66 60 55 61 63 72
68 60 60 63 60 75 68
59 71 53 75 64 64 52
64 64 68 64 66 67 63
64 70 69 68 63 59 57
When data are collected in original form, they are called raw data. Raw data is not very
meaningful to an audience.
• An array is the arrangement of the values in ascending or descending order.
52 53 55 57 59 59 60
60 60 60 61 63 63 63
63 64 64 64 64 64 64
66 66 67 68 68 68 68
69 70 70 71 72 75 75
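As a minimal sketch, the array above can be reproduced by sorting the raw data. The variable names (`raw_data`, `ascending`, `descending`) are illustrative choices, not part of the slides.

```python
# Raw data: average hours per month on social media for the 35 students.
raw_data = [
    70, 66, 60, 55, 61, 63, 72,
    68, 60, 60, 63, 60, 75, 68,
    59, 71, 53, 75, 64, 64, 52,
    64, 64, 68, 64, 66, 67, 63,
    64, 70, 69, 68, 63, 59, 57,
]

# An array is the same data arranged in ascending or descending order.
ascending = sorted(raw_data)
descending = sorted(raw_data, reverse=True)

# Print the ascending array seven values per row, like the table above.
for start in range(0, len(ascending), 7):
    print(" ".join(str(value) for value in ascending[start:start + 7]))
```

Sorting does not change the data; it only makes the smallest value, the largest value, and repeated values easy to spot.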
Descriptive Statistics
Descriptive statistics consists of methods for organizing, displaying, and describing data by
using tables, graphs, and summary measures.
Types of descriptive statistics:
• Organize the Data
  • Tables
    • Frequency Distributions
    • Relative Frequency Distributions
• Display the Data
  • Graphs
    • Bar Chart or Histogram
• Summarize the Data
  • Central Tendency
  • Variation
Frequency Distribution
Organize Data:
• Organize data using frequency distributions (tables).
• A frequency distribution organizes the raw data in table form, using classes (groups)
and frequencies.
• A class (group) is a quantitative or qualitative category.
• The frequency, f, of a class is the number of data values contained in that class.
The same 35 students, classified by course and gender:
Course           No. of Female Students   No. of Male Students   Total
BBA              8                        8                      16
BSc              1                        2                      3
BTech            6                        4                      10
BCOM             2                        2                      4
BA(Economics)    1                        1                      2
SUM              18                       17                     35
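The row and column totals in the table above can be checked with a short sketch. The dictionary layout and the names `counts`, `row_totals`, `female_total`, `male_total`, and `grand_total` are assumptions made for illustration.

```python
# Frequencies by course: (female students, male students), copied from the table.
counts = {
    "BBA": (8, 8),
    "BSc": (1, 2),
    "BTech": (6, 4),
    "BCOM": (2, 2),
    "BA(Economics)": (1, 1),
}

# Row totals: students per course.
row_totals = {course: female + male for course, (female, male) in counts.items()}

# Column totals: students per gender, and the grand total.
female_total = sum(female for female, _ in counts.values())
male_total = sum(male for _, male in counts.values())
grand_total = female_total + male_total

for course, total in row_totals.items():
    print(f"{course:<15} {total}")
print(f"{'SUM':<15} {grand_total}")
```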
Frequency Distribution for Quantitative Data
Data collected from 35 students on the average time spent (no. of hours) on social media per
month
Discrete frequency distribution
• When a frequency distribution table lists all of the individual categories (X values), it is
called a discrete frequency distribution.
• It can be used when the range of values in the data set is not large.
Class(X)   Frequency
52         1
53         1
55         1
57         1
59         2
60         4
61         1
63         4
64         6
66         2
67         1
68         4
69         1
70         2
71         1
72         1
75         2
Total      35
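A discrete frequency distribution is a count of how often each distinct value occurs. The sketch below is one possible way to build it, using `collections.Counter` from the Python standard library; the names `data` and `frequency` are illustrative.

```python
from collections import Counter

# Average hours per month on social media for the 35 students.
data = [
    70, 66, 60, 55, 61, 63, 72,
    68, 60, 60, 63, 60, 75, 68,
    59, 71, 53, 75, 64, 64, 52,
    64, 64, 68, 64, 66, 67, 63,
    64, 70, 69, 68, 63, 59, 57,
]

# Count the occurrences of every distinct X value.
frequency = Counter(data)

print("Class(X)  Frequency")
for value in sorted(frequency):
    print(f"{value:<9} {frequency[value]}")
print(f"{'Total':<9} {sum(frequency.values())}")
```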
Continuous Frequency Distribution
• Sometimes, however, a set of observations covers a wide range of values. In these situations, a list
of all the X values would be quite long, too long to be a “simple” presentation of the data. To
remedy this situation, a continuous frequency distribution table is used.
• Continuous frequency distributions: The data must be grouped into classes that are more than one
unit in width. In a grouped table, the X column lists groups of observations, called class intervals,
rather than individual values.
• Constructing a Frequency Distribution
• Step 1: Find the highest and lowest values (75 and 52).
• Step 2: Find the range, the difference between the highest and lowest values (75 - 52 = 23).
• Step 3: Select the number of classes desired (6).
• Step 4: Find the class width by dividing the range by the number of classes
(23/6 = 3.83, rounded up to 4).
X        Frequency
52-55    3
56-59    3
60-63    9
64-67    9
68-71    8
72-75    3
Total    35
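The four steps above can be sketched in code. This assumes whole-number data and inclusive class limits (52-55, 56-59, and so on), which is how the table is laid out; the variable names are illustrative.

```python
import math

data = [
    70, 66, 60, 55, 61, 63, 72,
    68, 60, 60, 63, 60, 75, 68,
    59, 71, 53, 75, 64, 64, 52,
    64, 64, 68, 64, 66, 67, 63,
    64, 70, 69, 68, 63, 59, 57,
]

# Step 1: highest and lowest values.
lowest, highest = min(data), max(data)
# Step 2: range.
data_range = highest - lowest
# Step 3: number of classes desired.
num_classes = 6
# Step 4: class width, rounded up to a whole number.
class_width = math.ceil(data_range / num_classes)

# Build inclusive class limits starting at the lowest value (assumes integer data).
classes = []
lower = lowest
for _ in range(num_classes):
    upper = lower + class_width - 1
    classes.append((lower, upper))
    lower = upper + 1

# Count the values that fall inside each class.
frequencies = [sum(lo <= x <= hi for x in data) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}  {f}")
print(f"Total  {sum(frequencies)}")
```

Rounding the width up rather than down ensures that the six classes cover the largest value, 75.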
Relative frequency distribution
• A relative frequency distribution presents frequencies as proportions or percentages of the
total.
• Relative frequency of a class = (Frequency of that class) / (Sum of all frequencies)
• Percentage = (Relative frequency) × 100
Class    Frequency   Relative Frequency   Percent
52-55    3           0.09                 8.57
56-59    3           0.09                 8.57
60-63    9           0.26                 25.71
64-67    9           0.26                 25.71
68-71    8           0.23                 22.86
72-75    3           0.09                 8.57
Total    35          1                    100
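As a short sketch, the relative frequencies and percentages in the table follow directly from the two formulas. The list names are illustrative, and the values are rounded to two decimals only when printed.

```python
classes = ["52-55", "56-59", "60-63", "64-67", "68-71", "72-75"]
frequencies = [3, 3, 9, 9, 8, 3]

total = sum(frequencies)

# Relative frequency of a class = frequency of that class / sum of all frequencies.
relative = [f / total for f in frequencies]

# Percentage = relative frequency * 100.
percent = [r * 100 for r in relative]

print("Class   Frequency  Relative Frequency  Percent")
for c, f, r, p in zip(classes, frequencies, relative, percent):
    print(f"{c:<7} {f:<10} {r:<19.2f} {p:.2f}")
```

Because each printed value is rounded, the rounded columns may not add up to exactly 1 or 100 even though the unrounded values do.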
Cumulative Frequency Distribution
• The cumulative frequency is the total of frequencies, in which the frequency of the first class
interval is added to the frequency of the second class interval and then the sum is added to
the frequency of the third class interval and so on.
• Generally, the cumulative frequency distribution is used to identify the number of
observations that lie at or below (or above) a particular value in the data set.
Class    Frequency   Cumulative Frequency
52-55    3           3
56-59    3           6
60-63    9           15
64-67    9           24
68-71    8           32
72-75    3           35
• The ogive is a graph of a cumulative frequency distribution.
[Figure: ogive of the cumulative frequencies 3, 6, 15, 24, 32, 35 across the six classes]
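The running total described above can be sketched with `itertools.accumulate` from the Python standard library. The list names are illustrative.

```python
from itertools import accumulate

classes = ["52-55", "56-59", "60-63", "64-67", "68-71", "72-75"]
frequencies = [3, 3, 9, 9, 8, 3]

# Each cumulative frequency is the sum of this class's frequency and all earlier ones.
cumulative = list(accumulate(frequencies))

print("Class   Frequency  Cumulative Frequency")
for c, f, cf in zip(classes, frequencies, cumulative):
    print(f"{c:<7} {f:<10} {cf}")
```

Reading the result: the value 15 against class 60-63 means that 15 of the 35 students spent 63 hours or fewer. The last cumulative frequency always equals the total number of observations.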
Statistical Graphs
• A statistical graph or chart is defined as the pictorial representation of statistical data in
graphical form. Statistical graphs are used to represent a set of data to make it easier to
understand and interpret statistical information.
• Bar Graph / Column Chart
• Pie Chart
• Line Chart
• Histogram
• Scatter Plot
• Box Plot
Two Key Questions
1. What type of data are you working with?
• Qualitative
• Quantitative
2. What are you trying to communicate?
• Relationship
• Comparison
• Distribution
• Trend, etc.
• Bar Graph: A graph made of bars whose heights represent the frequencies of respective
categories is called a bar graph.
• Bar and column charts are commonly used for:
• Comparing numerical data across categories
• Examples:
• Total sales by product type
• Population by country
• Revenue by department, by quarter
Course           No. of Students
BBA              16
BSc              3
BTech            10
BCOM             4
BA(Economics)    2
SUM              35
[Figure: bar chart of No. of Students by course: BBA 16, BSc 3, BTech 10, BCOM 4,
BA(Economics) 2]
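As a rough, plain-text stand-in for a bar graph, the sketch below draws one bar of `#` characters per course, with the bar length equal to the frequency. The helper `text_bar_chart` is a hypothetical function written for this illustration, not a library call.

```python
students_by_course = {
    "BBA": 16,
    "BSc": 3,
    "BTech": 10,
    "BCOM": 4,
    "BA(Economics)": 2,
}


def text_bar_chart(counts):
    """Return one line per category; each bar has as many '#' as the frequency."""
    label_width = max(len(label) for label in counts)
    return [
        f"{label:<{label_width}} | {'#' * n} {n}"
        for label, n in counts.items()
    ]


chart_lines = text_bar_chart(students_by_course)
print("\n".join(chart_lines))
```

The bars are horizontal here only because text prints line by line; the comparison across categories reads the same way as in a column chart.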
Pie chart
• Pie chart: A circle divided into portions (slices) that represent the relative frequencies or
percentages of different categories or classes.
• Use pie charts to show proportions of a whole: each slice shows the share of one part out of
the whole.
[Figure: pie chart; surviving slice labels: 29%, 8%]
Histogram
• Examples:
• Frequency of test scores among students
• Distribution of population by age group
• Distribution of heights or weights
[Figure: histogram of the grouped data, with frequencies 3, 3, 9, 9, 8, 3 for classes 52-55,
56-59, 60-63, 64-67, 68-71, 72-75]
Shapes of Histograms
• A histogram can assume any one of a large number of shapes. The most common of these
shapes are
1. Symmetric
2. Skewed
3. Uniform or rectangular
• A symmetric histogram is identical on both sides of its central point.
• A skewed histogram is nonsymmetric. For a skewed histogram, the tail on one side is longer
than the tail on the other side. A skewed-to-the-right histogram has a longer tail on the right
side (see Figure a). A skewed-to-the-left histogram has a longer tail on the left side (see
Figure b).
• A uniform or rectangular histogram has the same frequency for each class.
Scatter Plot
• A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different
numeric variables. The position of each dot on the horizontal and vertical axis indicates
values for an individual data point.
• Scatter plots are used to observe relationships between variables.
Commonly used for:
• Exploring correlations or relationships between series
Examples:
• Advertisements and sales
• Study time and marks
• Positive correlation depicts a rise, and it is seen on the
diagram as data points slope upwards from the lower-left
corner of the chart towards the upper-right.
• Negative correlation depicts a fall, and this is seen on the
chart as data points slope downwards from the upper-left
corner of the chart towards the lower-right.
• Data that is neither positively nor negatively correlated is
considered uncorrelated (null).
Scatter Plot
Study time   Marks
20           40
24           55
46           69
62           83
22           27
37           44
[Figure: scatter plot of Marks (vertical axis, 0 to 90) against Study time (horizontal axis,
0 to 70)]
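The direction of the relationship in a scatter plot can also be summarized with a number. The sketch below computes the Pearson correlation coefficient for the six study-time and marks pairs; the slides describe positive and negative correlation but do not name a specific coefficient, so this choice is an assumption made for illustration. The function `pearson_r` is a hypothetical helper.

```python
import math

study_time = [20, 24, 46, 62, 22, 37]
marks = [40, 55, 69, 83, 27, 44]


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


r = pearson_r(study_time, marks)

# A value above 0 indicates an upward slope, below 0 a downward slope.
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.2f} ({direction} correlation)")
```

For these six points the coefficient is positive, which agrees with the upward slope from the lower-left to the upper-right described earlier for positive correlation.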
• A side-by-side bar chart is a graphical display for depicting multiple bar charts on the same
display.
[Figure: side-by-side bar chart of No. of Female Students and No. of Male Students by course]
Stacked bar
The following data give the total number of iPods sold by a mail order on each of 30 days.
Construct a frequency distribution table. Draw a histogram.
Box and Whisker plot
• A box-and-whisker plot gives a graphic presentation of data using five measures: the median, the
first quartile, the third quartile, and the smallest and the largest values in the data set between the
lower and the upper inner fences.
• The length of the box is equal to the interquartile range (IQR = Q3 – Q1). The data may contain
values beyond Q1 – 1.5 IQR and Q3 + 1.5 IQR. The whiskers of the box plot extend to Q1 – 1.5 IQR
(or the minimum value) and Q3 + 1.5 IQR (or the maximum value); observations beyond these two
limits are potential outliers.
• Commonly Used For:
• Visualizing statistical characteristics across data series
• EXAMPLES:
• Comparing historical annual rainfall across cities
• Analyzing distributions of values and identifying outliers
• Comparing mean and median height/weight by country
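The measures behind a box-and-whisker plot can be sketched for the social media data used earlier. Quartile conventions differ between textbooks and software; this sketch assumes the default "exclusive" method of `statistics.quantiles` (Python 3.8 or later), so other tools may give slightly different quartiles. The variable names are illustrative.

```python
import statistics

data = [
    70, 66, 60, 55, 61, 63, 72,
    68, 60, 60, 63, 60, 75, 68,
    59, 71, 53, 75, 64, 64, 52,
    64, 64, 68, 64, 66, 67, 63,
    64, 70, 69, 68, 63, 59, 57,
]

# First quartile, median, and third quartile (default "exclusive" method).
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Inner fences: values outside them are potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers end at the smallest and largest values that lie within the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

In this data set every value lies within the fences, so the whiskers end at the minimum and maximum values and no observation is flagged as a potential outlier.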