Understanding, Organizing and Presenting Data
Dr. Ramesh Kandela
[email protected]
Understanding Data
Data and data set:
• Data are facts and figures collected, analysed and summarized for presentation and
interpretation.
• Facts are truths that may be numeric or non-numeric in nature; figures are information that
is numeric.
• In a more technical sense, data are a set of values of a qualitative/categorical or quantitative
nature pertaining to one or more individuals or objects.
• All the data collected for a particular study are referred to as the data set for the study. A data
set is a collection of observations on one or more variables.
Types of Data
[Figure: diagram of the types of data; the surviving labels are Interval and Ratio]
Basic Terms in Understanding Data
• Element or Member: An element or member of a sample or population is a specific subject
or object (for example, a person, firm, item, state, or country) about which the information is
collected.
• Variable: A variable is a characteristic of the elements under study that assumes different
values for different elements.
• Observation or Measurement: The value of a variable for an element is called an observation
or measurement.
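Sample Data of IFHE Students
• Population
• Sample
Course            No. of Students
BBA               16
BSc               3
BTech             10
BCOM              4
BA(Economics)     2
SUM               35
In this table, each course is an element or member, "No. of Students" is the variable, and each
count (for example, 4 for BCOM) is an observation or measurement.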
Types of Data
Qualitative Data :
• A variable that cannot assume a numerical value but can be classified into two or more
nonnumeric categories is called a qualitative or categorical variable. The data collected on
such a variable are called qualitative data.
• Categorical data remain qualitative even when the categories are identified by a numerical
code.
• Examples of qualitative variables are the gender of a person and the brand of a mobile phone.
Quantitative Data
• A variable that can be measured numerically is called a quantitative variable. The data
collected on a quantitative variable are called quantitative data.
• Quantitative variables may be classified as either discrete variables or continuous variables.
• Discrete: Counted
Discrete Data can only take certain values.
Example: the number of students in a class
• Continuous: Measured
Continuous Data can take any value (within a range)
Example: A person's height
Cross-section & Time-series data
Based on the time over which they are collected, data can be classified as either cross-section
or time-series data.
• Cross-section data contain information on different elements of a population or sample for
the same period of time.
IPL 2023 Runs by Players
Player              Runs
Shubman Gill        890
Faf Du Plessis      730
Devon Conway        672
Virat Kohli         639
Yashasvi Jaiswal    625
Suryakumar Yadav    605
• Time-series data contain information on the same element for different periods of time.
Year    Player         Runs
2020    Virat Kohli    466
2021    Virat Kohli    405
2022    Virat Kohli    341
2023    Virat Kohli    639
Scales of Measurement
• The scale of measurement determines the amount of information contained in the data and
indicates the most appropriate data summarization and statistical analyses.
• Qualitative data use either the nominal or ordinal scale of measurement. Quantitative data
are obtained using either the interval or ratio scale of measurement.
Nominal: Categorised Without Order
When the data for a variable consist of labels or names used to identify an attribute of the
element, the scale of measurement is considered a nominal scale.
Example: Gender (male, female, other)

Ordinal: Categorised With Order


The scale of measurement for a variable is considered an ordinal scale if the data exhibit
the properties of nominal data and in addition, the order or rank of the data is meaningful.
Example: Socioeconomic status (low income, middle income, high income)
Interval scale
• The scale of measurement for a variable is an interval scale if the data have all the
properties of ordinal data and the interval between values is expressed in terms of a fixed
unit of measure.
• Example: Temperature (in degrees Celsius or Fahrenheit)
• The location of the zero point is not fixed. Both the zero point and the units of
measurement are arbitrary (zero does not mean that the quantity is absent).
• It is not meaningful to take ratios of scale values.

Ratio scale
• The scale of measurement for a variable is a ratio scale if the data have all the properties
of interval data and the ratio of two values is meaningful.
• Variables such as distance, height, weight, and time use the ratio scale of measurement.
• It has an absolute zero point. (‘zero’ has a meaning )
• It is meaningful to compute ratios of scale values.
Scale of Measurement
Scale       What is recorded                           Values for three runners
Nominal     Numbers assigned to runners                7              8               3
Ordinal     Rank order of winners                      Third place    Second place    First place
Interval    Performance rating on a 1 to 10 scale      8.2            9.1             9.6
Ratio       Time to finish, in seconds                 15.2           14.1            13.4
Organizing and Presenting
Data to Convey Meaning:
Tables and Graphs
Sources of Data
Primary Data
• Primary data are data collected by researchers directly from first-hand sources through
interviews, surveys, experiments, or observations.
• Customer satisfaction surveys, interviews of scientists, health observation of patients, etc. are
some examples of primary data.
Secondary Data
• Secondary data are data that have already been collected and are readily available from
other sources.
• Secondary data were collected for another purpose; the researcher reuses them, often applying
statistical methods to them. Results that others have produced by processing primary data
also count as secondary data.
• Secondary data are second-hand information that already exists in published or unpublished
forms.
• This information can be obtained from journals, magazines, reports, websites, etc.
• Examples: financial reports, population census, CMIE reports, ProwessIQ database, IMF reports, etc.
The data below were collected from 35 students: average time spent (number of hours) on social
media per month.
70 66 60 55 61 63 72
68 60 60 63 60 75 68
59 71 53 75 64 64 52
64 64 68 64 66 67 63
64 70 69 68 63 59 57
When data are collected in original form, they are called raw data. Raw data are not very
meaningful to an audience.
• An array is the arrangement of the values in ascending or descending order.
52 53 55 57 59 59 60
60 60 60 61 63 63 63
63 64 64 64 64 64 64
66 66 67 68 68 68 68
69 70 70 71 72 75 75
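The array above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names (raw_hours, array_ascending, array_descending) are illustrative and not part of the original slides.

```python
# Raw data: average hours per month on social media for 35 students.
raw_hours = [
    70, 66, 60, 55, 61, 63, 72,
    68, 60, 60, 63, 60, 75, 68,
    59, 71, 53, 75, 64, 64, 52,
    64, 64, 68, 64, 66, 67, 63,
    64, 70, 69, 68, 63, 59, 57,
]

# An array is the same data arranged in ascending or descending order.
array_ascending = sorted(raw_hours)
array_descending = sorted(raw_hours, reverse=True)

# Print the ascending array seven values per row, like the slide.
for start in range(0, len(array_ascending), 7):
    print(*array_ascending[start:start + 7])
```

Sorting makes the lowest value, the highest value, and repeated values easy to spot, which is what the later frequency tables build on.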
Descriptive Statistics
Descriptive statistics consists of methods for organizing, displaying, and describing data by
using tables, graphs, and summary measures.
Types of descriptive statistics:
• Organize the Data
  • Tables
  • Frequency Distributions
  • Relative Frequency Distributions
• Displaying the Data
  • Graphs
  • Bar Chart or Histogram
• Summarize the Data
  • Central Tendency
  • Variation
Frequency Distribution
Organize Data:
• Organize data using frequency distributions (tables).
• A frequency distribution organizes the raw data in table form, using classes (groups)
and frequencies.
• A class (group) is a quantitative or qualitative category.
• The frequency, f, of a class is the number of data values contained in that class.

The data below were collected from 35 students from different courses: average time spent
(number of hours) on social media per month, followed by each student's course.

70 66 60 55 61 63 72
68 60 60 63 60 75 68
59 71 53 75 64 64 52
64 64 68 64 66 67 63
64 70 69 68 63 59 57

BBA    BBA   BBA   BBA   BBA   BBA            BBA
BBA    BBA   BBA   BBA   BBA   BBA            BBA
BBA    BBA   BSc   BSc   BSc   BTech          BTech
BTech  BTech BTech BTech BTech BTech          BTech
BTech  BCOM  BCOM  BCOM  BCOM  BA(Economics)  BA(Economics)
Frequency Distribution for Qualitative Data (Organizing Qualitative Data)
• Qualitative / Categorical frequency distributions - can be used for data that can be placed
in specific categories, such as nominal- or ordinal-level data.
• A frequency distribution for qualitative data lists all categories and the number of elements
that belong to each of the categories.
• Examples – gender, political affiliation, religious affiliation, blood type etc.

BBA    BBA   BBA   BBA   BBA   BBA            BBA
BBA    BBA   BBA   BBA   BBA   BBA            BBA
BBA    BBA   BSc   BSc   BSc   BTech          BTech
BTech  BTech BTech BTech BTech BTech          BTech
BTech  BCOM  BCOM  BCOM  BCOM  BA(Economics)  BA(Economics)

Course            No. of Students
BBA               16
BSc               3
BTech             10
BCOM              4
BA(Economics)     2
SUM               35
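The course counts can be tallied programmatically. The sketch below uses Python's collections.Counter; the list courses is rebuilt here from the 35 course labels in the grid, and the variable names are illustrative only.

```python
from collections import Counter

# The 35 course labels, one per student (same labels as the grid).
courses = (
    ["BBA"] * 16
    + ["BSc"] * 3
    + ["BTech"] * 10
    + ["BCOM"] * 4
    + ["BA(Economics)"] * 2
)

# A frequency distribution for qualitative data: category -> count.
frequency = Counter(courses)

for course, count in frequency.items():
    print(f"{course:<15}{count}")
print(f"{'SUM':<15}{sum(frequency.values())}")
```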
Relative Frequency and Percent Frequency Distributions
• The relative frequency of a class equals the fraction or proportion of observations belonging
to that class.
• A relative frequency distribution gives a tabular summary of data showing the relative
frequency for each class.
• Relative frequency of a class = (Frequency of that class) / (Sum of all frequencies)
• The percent frequency of a class is the relative frequency multiplied by 100.
• Percent frequency = (Relative frequency) × 100
• A percent frequency distribution summarizes the percent frequency of the data for each
class.

Course            No. of Students    Relative Frequency    Percent Frequency
BBA               16                 0.457143              45.71429
BSc               3                  0.085714              8.571429
BTech             10                 0.285714              28.57143
BCOM              4                  0.114286              11.42857
BA(Economics)     2                  0.057143              5.714286
SUM               35                 1                     100
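As a rough illustration of the two formulas, the sketch below computes relative and percent frequencies from the course counts. The dictionary names are illustrative, not taken from the slides.

```python
# Frequency of each class (course), as in the frequency table.
frequencies = {
    "BBA": 16,
    "BSc": 3,
    "BTech": 10,
    "BCOM": 4,
    "BA(Economics)": 2,
}

total = sum(frequencies.values())  # sum of all frequencies

# Relative frequency = frequency of the class / sum of all frequencies.
relative = {course: f / total for course, f in frequencies.items()}

# Percent frequency = relative frequency * 100.
percent = {course: r * 100 for course, r in relative.items()}

for course in frequencies:
    print(f"{course:<15}{frequencies[course]:>4}"
          f"{relative[course]:>10.6f}{percent[course]:>10.4f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any such table.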
The response to a question has three alternatives: A, B, and C. A sample of 120 responses
provides 60 A, 24 B, and 36 C. Show the frequency and relative frequency distributions.
Contingency Table
• To describe a single categorical variable, we use frequency tables. To describe the
relationship between two or more categorical variables, we use a special type of table called
a contingency table (a cross-tabulation or "crosstab" for short).
• In a cross-tabulation, the categories of one variable determine the rows of the table, and the
categories of the other variable determine the columns.

                  Gender
Course            No. of Female Students    No. of Male Students    Total
BBA               8                         8                       16
BSc               1                         2                       3
BTech             6                         4                       10
BCOM              2                         2                       4
BA(Economics)     1                         1                       2
SUM               18                        17                      35
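The row and column totals of a contingency table follow directly from its cells. The sketch below stores the cell counts from the table in a nested dictionary (an illustrative layout, not something the slides prescribe) and recomputes the totals.

```python
# Cell counts: rows are courses, columns are gender categories.
table = {
    "BBA": {"Female": 8, "Male": 8},
    "BSc": {"Female": 1, "Male": 2},
    "BTech": {"Female": 6, "Male": 4},
    "BCOM": {"Female": 2, "Male": 2},
    "BA(Economics)": {"Female": 1, "Male": 1},
}

# Row totals: number of students in each course.
row_totals = {course: sum(cells.values()) for course, cells in table.items()}

# Column totals: number of students of each gender.
column_totals = {"Female": 0, "Male": 0}
for cells in table.values():
    for gender, count in cells.items():
        column_totals[gender] += count

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals, grand_total)
```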
Frequency Distribution for Quantitative Data
The data below were collected from 35 students: average time spent (number of hours) on social
media per month.
70 66 60 55 61 63 72
68 60 60 63 60 75 68
59 71 53 75 64 64 52
64 64 68 64 66 67 63
64 70 69 68 63 59 57
Discrete frequency distribution
• When a frequency distribution table lists all of the individual categories (X values), it is
called a discrete frequency distribution.
• It can be used when the range of values in the data set is not large.

Class (X)    Frequency
52           1
53           1
55           1
57           1
59           2
60           4
61           1
63           4
64           6
66           2
67           1
68           4
69           1
70           2
71           1
72           1
75           2
Total        35
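A discrete frequency distribution like this one can be built by counting how often each distinct value occurs. The following is a minimal sketch using collections.Counter; the names hours and discrete_frequency are illustrative.

```python
from collections import Counter

# Average hours per month on social media for the 35 students.
hours = [
    70, 66, 60, 55, 61, 63, 72,
    68, 60, 60, 63, 60, 75, 68,
    59, 71, 53, 75, 64, 64, 52,
    64, 64, 68, 64, 66, 67, 63,
    64, 70, 69, 68, 63, 59, 57,
]

# Count each distinct X value, then list the classes in ascending order.
discrete_frequency = Counter(hours)
for value in sorted(discrete_frequency):
    print(f"{value:<8}{discrete_frequency[value]}")
print(f"{'Total':<8}{sum(discrete_frequency.values())}")
```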
Continuous Frequency Distribution
• Sometimes, however, a set of observations covers a wide range of values. In these situations, a list
of all the X values would be quite long, too long to be a "simple" presentation of the data. To
remedy this situation, a continuous frequency distribution table is used.
• Continuous frequency distributions: The data must be grouped into classes that are more than one
unit in width. In a grouped table, the X column lists groups of observations, called class intervals,
rather than individual values.
• Constructing a Frequency Distribution
• Step 1: Find the highest and lowest value (75 and 52).
• Step 2: Find the range, the difference between the highest and lowest value (75 - 52 = 23).
• Step 3: Select the number of classes desired (6).
• Step 4: Find the class width by dividing the range by the number of classes
(23/6 ≈ 3.83, rounded up to 4).

X         Frequency
52-55     3
56-59     3
60-63     9
64-67     9
68-71     8
72-75     3
Total     35
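The four construction steps can be sketched in Python as follows. This assumes whole-number data, so that a class of width 4 starting at 52 covers 52 to 55 inclusive; the variable names are illustrative only.

```python
import math

hours = [
    70, 66, 60, 55, 61, 63, 72,
    68, 60, 60, 63, 60, 75, 68,
    59, 71, 53, 75, 64, 64, 52,
    64, 64, 68, 64, 66, 67, 63,
    64, 70, 69, 68, 63, 59, 57,
]

# Step 1: highest and lowest value.
lowest, highest = min(hours), max(hours)

# Step 2: range.
data_range = highest - lowest

# Step 3: number of classes desired.
num_classes = 6

# Step 4: class width, rounded up to a whole number.
class_width = math.ceil(data_range / num_classes)

# Build the class intervals and count the observations in each one.
grouped = []
for i in range(num_classes):
    lower = lowest + i * class_width
    upper = lower + class_width - 1  # inclusive upper limit for integer data
    count = sum(lower <= h <= upper for h in hours)
    grouped.append((lower, upper, count))

for lower, upper, count in grouped:
    print(f"{lower}-{upper}  {count}")
```

Rounding the width up rather than down ensures that the six classes together cover the whole range from the lowest to the highest value.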
Relative frequency distribution
• A relative frequency distribution presents frequencies as proportions of the total, which can
also be expressed as percentages.
• Relative frequency of a class = (Frequency of that class) / (Sum of all frequencies)
• Percentage = (Relative frequency) × 100
Class Frequency Relative Frequency Percent
52-55 3 0.09 8.57
56-59 3 0.09 8.57
60-63 9 0.26 25.71
64-67 9 0.26 25.71
68-71 8 0.23 22.86
72-75 3 0.09 8.57
Total 35 1 100
Cumulative Frequency Distribution
• The cumulative frequency is the total of frequencies, in which the frequency of the first class
interval is added to the frequency of the second class interval and then the sum is added to
the frequency of the third class interval and so on.
• Generally, the cumulative frequency distribution is used to identify the number of
observations that lie above or below a particular value in the data set.
Class     Frequency    Cumulative Frequency
52-55     3            3
56-59     3            6
60-63     9            15
64-67     9            24
68-71     8            32
72-75     3            35

The ogive is a graph of a cumulative frequency distribution.
[Figure: ogive of the cumulative frequency, with plotted values 3, 6, 15, 24, 32, 35]
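The running totals in the cumulative frequency column can be sketched with itertools.accumulate from the Python standard library. The list names are illustrative.

```python
from itertools import accumulate

classes = ["52-55", "56-59", "60-63", "64-67", "68-71", "72-75"]
frequencies = [3, 3, 9, 9, 8, 3]

# Each cumulative frequency is the sum of the frequencies of that class
# and of all classes before it.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:<8}{f:>4}{cf:>6}")
```

The last cumulative frequency always equals the total number of observations, and these running totals are the heights plotted in an ogive.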
Statistical Graphs
• A statistical graph or chart is defined as the pictorial representation of statistical data in
graphical form. Statistical graphs are used to represent a set of data to make it easier to
understand and interpret statistical information.
• Bar Graph /Column
• Pie Chart
• Line Chart
• Histogram
• Scatter Plot
• Box Plot
Two Key Questions
1. What type of data are you working with?
• Qualitative
• Quantitative
2. What are you trying to communicate?
• Relationship
• Comparison
• Distribution
• Trend, etc.
• Bar graph: A graph made of bars whose heights represent the frequencies of the respective
categories is called a bar graph.
• Bar and column charts are commonly used for:
• Comparing numerical data across categories
• Examples:
• Total sales by product type
• Population by country
• Revenue by department, by quarter
Course            No. of Students
BBA               16
BSc               3
BTech             10
BCOM              4
BA(Economics)     2
SUM               35
[Figure: bar chart of No. of Students by course: BBA 16, BSc 3, BTech 10, BCOM 4, BA(Economics) 2]
Pie chart
• Pie chart: A circle divided into portions (slices) that represent the relative frequencies or
percentages of different categories or classes.
• Use pie charts to show proportions of a whole; each slice shows the share of one part.

Commonly used for:
• Comparing proportions totalling 100%
Examples:
• Percentage of budget spent by department
• Proportion of internet users by age range
• Breakdown of site traffic by source
[Figure: pie chart of No. of Students by course: BBA 46%, BSc 8%, BTech 29%, BCOM 11%, BA(Economics) 6%]


Histogram
• A histogram is a visual representation of data that can be used to assess the frequency
distribution of the data. It shows a frequency distribution arranged in consecutive,
non-overlapping intervals. Histograms are created from continuous (numerical) data.
• A histogram is a graph in which classes are marked on the horizontal axis and the
frequencies, relative frequencies, or percentages are marked on the vertical axis. The
frequencies, relative frequencies, or percentages are represented by the heights of the
bars. In a histogram, the bars are drawn adjacent to each other.
• A histogram takes a series of data and divides it into a number of bins (intervals). It then
plots the number of data points that fall in each bin. It is useful for understanding how
many values lie in each range.
• A histogram is a chart that displays the shape of a distribution. A histogram looks like a
bar chart but groups values for a continuous measure into ranges, or bins.
Histogram
Commonly used for:
• Showing the distribution of a continuous data set
• Showing how many values of a variable fall in each range
Examples:
• Frequency of test scores among students
• Distribution of population by age group
• Distribution of heights or weights
[Figure: histogram of the class frequencies: 52-55 (3), 56-59 (3), 60-63 (9), 64-67 (9), 68-71 (8), 72-75 (3)]
Shapes of Histograms
• A histogram can assume any one of a large number of shapes. The most common of these
shapes are
1. Symmetric
2. Skewed
3. Uniform or rectangular
• A symmetric histogram is identical on both sides of its central point.

• A skewed histogram is nonsymmetric. For a skewed histogram, the tail on one side is longer
than the tail on the other side. A skewed-to-the-right histogram has a longer tail on the right
side (see Figure a). A skewed-to-the-left histogram has a longer tail on the left side (see
Figure b).

• A uniform or rectangular histogram has the same frequency for each class.
Scatter Plot
• A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different
numeric variables. The position of each dot on the horizontal and vertical axis indicates
values for an individual data point.
• Scatter plots are used to observe relationships between variables.
Commonly used for:
• Exploring correlations or relationships between series
Examples:
• Advertisements and sales
• Study time and marks
• Positive correlation depicts a rise, and it is seen on the
diagram as data points slope upwards from the lower-left
corner of the chart towards the upper-right.
• Negative correlation depicts a fall, and this is seen on the
chart as data points slope downwards from the upper-left
corner of the chart towards the lower-right.
• Data that is neither positively nor negatively correlated is
considered uncorrelated (null).
Scatter Plot
Study time    Marks
20            40
24            55
46            69
62            83
22            27
37            44
[Figure: scatter plot of Marks (vertical axis) against Study time (horizontal axis)]
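The direction of the relationship in the study time and marks data can also be checked numerically. The sketch below computes the Pearson correlation coefficient by hand; the slides do not name a specific measure, so the choice of Pearson's r and the variable names are assumptions made for illustration.

```python
import math

study_time = [20, 24, 46, 62, 22, 37]
marks = [40, 55, 69, 83, 27, 44]

n = len(study_time)
mean_x = sum(study_time) / n
mean_y = sum(marks) / n

# Sums of cross-products and squared deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(study_time, marks))
s_xx = sum((x - mean_x) ** 2 for x in study_time)
s_yy = sum((y - mean_y) ** 2 for y in marks)

# Pearson correlation coefficient (assumed measure): between -1 and +1.
r = s_xy / math.sqrt(s_xx * s_yy)

direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f} ({direction} correlation)")
```

A value of r above zero corresponds to points sloping upwards from the lower-left to the upper-right of the scatter plot.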
• A side-by-side bar chart is a graphical display for depicting multiple bar charts on the same
display.
[Figure: side-by-side bar chart of No. of Female Students and No. of Male Students by course]

Stacked bar
• A stacked bar chart is a bar chart in which each bar is broken into rectangular segments of a
different color showing the relative frequency of each class in a manner similar to a pie chart.
[Figure: stacked bar chart of No. of Female Students and No. of Male Students by course]
Line Chart
• A line chart is a graphical representation of how a value (for example, sales or a historical
price) changes, connecting a series of data points with a continuous line. This is the most basic
type of chart used in finance. Line charts can be used on any timeframe.
• A time-series plot is a line plot that connects successive data points with straight lines. It is
useful for understanding the trend over time, that is, how each point relates to the points before
and after it.
Commonly Used For:
• Visualizing trends over time
• Time Series should be used when single or multiple variables are to be plotted over time.
Examples:
• Stock price by hour
• Average temperature by month
• Profit by quarter

[Figure: line chart of the number of monthly active Facebook users worldwide as of 3rd quarter 2020]


• The following data show the method of payment by 16 customers in a supermarket
checkout line. Here, C refers to cash, CK to check, CC to credit card, and D to debit card, and
O stands for other.
C CK CK C CC D O C
CK CC D CC C CK CK CC

a. Construct a frequency distribution table.


b. Calculate the relative frequencies and percentages for all categories.
c. Draw a bar and pie chart for the percentage distribution.

The following data give the total number of iPods sold by a mail order on each of 30 days.
Construct a frequency distribution table. Draw a histogram.
Box and Whisker plot
• A box-and-whisker plot gives a graphic presentation of data using five measures: the median, the
first quartile, the third quartile, and the smallest and the largest values in the data set between the
lower and the upper inner fences.
• The length of the box is equal to the interquartile range, IQR = Q3 - Q1. The data may contain
values beyond Q1 - 1.5 IQR and Q3 + 1.5 IQR. The lower whisker extends down to Q1 - 1.5 IQR
(or to the minimum value, if that lies within the limit) and the upper whisker extends up to
Q3 + 1.5 IQR (or to the maximum value); observations beyond these two limits are potential
outliers.
• Commonly Used For:
• Visualizing statistical characteristics across data series
• EXAMPLES:
• Comparing historical annual rainfall across cities
• Analyzing distributions of values and identifying outliers
• Comparing mean and median height/weight by country

25, 28, 29, 29, 30, 34, 35, 35, 37, 38
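As a rough illustration, the sketch below computes the box-plot measures for the ten values listed above. Quartile conventions differ between textbooks and software; the use of statistics.quantiles with method="inclusive" is an assumption made here, and other methods can give slightly different Q1 and Q3.

```python
import statistics

data = [25, 28, 29, 29, 30, 34, 35, 35, 37, 38]

# Quartiles under the "inclusive" convention (an assumed choice).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Inner fences: values beyond them are potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the smallest and largest values inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Whiskers:", whisker_low, whisker_high)
print("Potential outliers:", outliers)
```

For this data set every value lies inside the fences, so the whiskers simply run to the smallest and largest observations.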


Thank You
