Summarising and Analysing Data

This document provides an overview of key concepts for summarizing and analyzing data:

• It defines big data and discusses how data has evolved due to technology, social media, and the huge amounts of unstructured data now available.

• It explains the differences between ungrouped and grouped data and describes common measures used to summarize both types, including mean, median, mode, range, and standard deviation.

• It introduces concepts like expected value, coefficient of variation, and the standard normal distribution, which are important for interpreting and making decisions based on analyzed data.

Uploaded by

Romail Qazi

Summarising and Analysing Data
This chapter covers:

• BIG DATA

• UNGROUPED DATA – MEAN, MODE & MEDIAN

• GROUPED DATA – MEAN

• MEASURES OF DISPERSION – VARIATION & DEVIATION

• COEFFICIENT OF VARIATION

• EXPECTED VALUE & DECISION MAKING

• STANDARD NORMAL DISTRIBUTION GRAPH & TABLE

• INTERPRETATION OF STANDARD NORMAL DISTRIBUTION


BIG DATA
A. Evolution Of Data

B. What is Big Data

C. Big Data As An Opportunity


A. Evolution Of Data
1. Technology

2. Social Media

3. Data Evolved to Big Data


1. Technology

Technology has a great impact on big data. In the past,
technology was not much involved in business operations.
For example, the tools in use have moved from:

• Telephone to mobiles (Android/Windows)

• Computers to clouds
2. Social Media
Social media is the most important factor in the evolution of big data.

• Facebook

• Instagram

• Twitter, etc.
3. Data Evolved to Big Data
A huge amount of unstructured data is generated in the form of
comments, images and videos.

Other factors: for example, Amazon, Flipkart, etc.
B. What is big data?
Definition:
“Data sets of such huge volume, high velocity of
updating and great variety that specific technology is
required to derive value from or make use of them.”

From the above we have 4 Vs:

• Volume (Continuous increase in Quantity of data)

• Variety (Data coming from multiple sources)

• Velocity (Speed of accumulation of data)

• Value (Benefit in business growth and development)


C. Big Data As An Opportunity
Big data can benefit a business in many ways, such as:

• Cost reduction (effective storage systems)

• Faster & better decision making (quick analysis, risk
mitigation, more informed & updated decisions)

• Improved services & products (knowledge of customer
preferences and interests for re-engineering and
development of products and services)

• New revenue streams (development of trends and new
revenue channels)
Data, Ungrouped Data & Grouped Data
Data has been, and will remain, an important input to the
decisions organisations make. Since customers are becoming
more aware of their purchase decisions, it is extremely
important for businesses to make use of the data available
and operate accordingly to stay competitive in the current
economic environment.

Statistics is the branch of mathematics which deals with the
calculation, organisation and interpretation of numerical
data.
The data that can be worked upon is essentially of two types:
1. Ungrouped Data
2. Grouped Data

1. Ungrouped Data: Ungrouped data is data which is not
arranged in any particular sequence and has not been grouped
or classified into smaller sub-groups.

2. Grouped Data: Grouped data is data which is no longer
entirely raw, because some sort of classification has already
been applied to it.

However, both grouped and ungrouped data are still raw facts and
figures. Once they are converted into a form that is useful to the
user, they are no longer data but information. Ungrouped data can
be converted into grouped data by classifying it and analysing
the frequencies within it so that it can be rearranged accordingly.
The following methods are applied to ungrouped data:

Mean of Ungrouped Data

The mean is simply the arithmetic average of the whole population
of data, or the average figure of the data set, which is
representative of the entire data.

The formula to calculate the mean = Sum of Observations / Total
Number of Observations

If we take n as the number of observations and
$x_1, x_2, x_3, \ldots, x_n$ as the observations, then the formula
would be:

Mean = $\dfrac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$

Mean ($\bar{x}$) = $\dfrac{\sum x}{n}$
Example 1:
Consider the following data:

22, 34, 56, 12, 18, 42, 50, 31

Solution:
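The worked solution from the slide is not reproduced in this text. As a minimal sketch, assuming the example asks for the mean of the eight observations, the formula above can be applied in Python (the variable names are illustrative only):

```python
# Example 1 data, copied from the text above.
observations = [22, 34, 56, 12, 18, 42, 50, 31]

# Mean = sum of observations / total number of observations.
n = len(observations)
total = sum(observations)
mean = total / n

print(f"Sum = {total}, n = {n}, mean = {mean}")
```

Under that assumption the sum is 265 and n is 8, so the mean is 265 / 8 = 33.125.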
Median of Ungrouped Data
The median of any population of data / data set is the middle
value. Simply, it is the value which comes right in the middle
of the ungrouped data. It is found by first arranging the
whole data set in ascending order and then selecting the value
in the middle position, so that half of the observations lie
on either side of it.

a. When n is odd
Median = the middle value of the list of observations (after
arranging the list in ascending order)

b. When n is even
Median = the average of the two middle values of the list of
observations (after arranging the list in ascending order)
Example 2:
Calculate the median of the following data set.
n = odd
Data: 28, 22, 34, 56, 12, 18, 42, 50, 31

Solution:
Example 3:
Calculate the median of the following data set.
n = even
Data: 22, 34, 56, 12, 18, 42, 50, 31

Solution:
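The worked solutions for Examples 2 and 3 are not reproduced in this text. The odd and even rules described earlier can be sketched in Python as follows; the function name `median` and the variable names are illustrative, not taken from the slides:

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)          # arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n is odd: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n is even: average the middle two

odd_data = [28, 22, 34, 56, 12, 18, 42, 50, 31]    # Example 2, n = 9
even_data = [22, 34, 56, 12, 18, 42, 50, 31]       # Example 3, n = 8

print(sorted(odd_data), "median =", median(odd_data))
print(sorted(even_data), "median =", median(even_data))
```

For Example 2 the sorted list is 12, 18, 22, 28, 31, 34, 42, 50, 56 and the fifth value, 31, is the median. For Example 3 the two middle values are 31 and 34, giving a median of 32.5.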
Mode of Ungrouped Data

The mode is the value which occurs most frequently in the
population of data / data set. It is denoted as Z.

Mode = Value with Highest Frequency

To find the mode of a data set, count the frequency of every
value and choose the one with the highest count.
Example 4:
Consider the following data:
22, 34, 56, 12, 12, 18, 42, 12, 50, 31, 42, 18

Solution:

Observations (x)

Frequency (f)
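The frequency table from the slide is not reproduced in this text. As a rough sketch, the tally of observations (x) against frequencies (f) and the resulting mode can be rebuilt with Python's standard `collections.Counter`; the variable names are illustrative:

```python
from collections import Counter

# Example 4 data, copied from the text above.
data = [22, 34, 56, 12, 12, 18, 42, 12, 50, 31, 42, 18]

# Tally the frequency (f) of every observation (x).
frequency = Counter(data)
for x in sorted(frequency):
    print(f"x = {x:2d}   f = {frequency[x]}")

# The mode is the observation with the highest frequency.
mode, highest_count = frequency.most_common(1)[0]
print("Mode Z =", mode, "with frequency", highest_count)
```

Here 12 occurs three times, 18 and 42 occur twice each, and every other value occurs once, so the mode is 12.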
Mean of Grouped Data

The mean is the average of the entire population / data set, and
in the case of grouped data it can be calculated in the
following way:

1. Direct method
x = observation
n = number of observations
f = frequency of value

Mean = $\dfrac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{f_1 + f_2 + f_3 + \cdots + f_n}$ OR

Mean $\bar{x} = \dfrac{\sum fx}{\sum f}$

Mean = (Sum of the products of observations & frequencies) /
(Sum of the frequencies)
Example 5:
Consider the following grouped data:
Scores (x) Frequency (f) fx
0 20
1 40
2 20
3 0
4 36
5 0
6 4
Total

Solution:
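The completed fx column and totals are not reproduced in this text. A minimal sketch of the direct method applied to the table above could look like this in Python (the variable names are illustrative):

```python
# Example 5 table: scores (x) and their frequencies (f).
scores = [0, 1, 2, 3, 4, 5, 6]
frequencies = [20, 40, 20, 0, 36, 0, 4]

# fx column: product of each observation and its frequency.
fx = [f * x for x, f in zip(scores, frequencies)]

sum_f = sum(frequencies)     # sum of the frequencies
sum_fx = sum(fx)             # sum of the products fx
mean = sum_fx / sum_f        # mean = sum(fx) / sum(f)

print("fx column:", fx)
print(f"Sum f = {sum_f}, Sum fx = {sum_fx}, mean = {mean:.4f}")
```

The fx column works out to 0, 40, 40, 0, 144, 0, 24, so the mean is 248 / 120, or roughly 2.07.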
Measures of Dispersion
In statistics, when analysing data sets it is important to
understand more than just the measures of central tendency.
It is equally important to see how widely the data is spread.
Looking at the mean of a data set alone is not sufficient for
analysis, because two data sets can have equal means but
different spreads.

Spread or dispersion can be measured in a variety of ways,
as follows:
a. Range
b. Standard Deviation
c. Coefficient of variation
a. Range
This is the simplest method of calculating the spread
of any given data. It is the difference between the highest
value and the lowest value in the data set.

Example 7: Consider the data: 22, 45, 44, 25, 65, 80, 46, 59, 35, 29.

Solution:
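The worked solution is not reproduced in this text. As a short sketch, the range of the Example 7 data can be found in Python as the highest value minus the lowest value (the variable names are illustrative):

```python
# Example 7 data, copied from the text above.
data = [22, 45, 44, 25, 65, 80, 46, 59, 35, 29]

highest = max(data)
lowest = min(data)
data_range = highest - lowest   # range = highest value - lowest value

print(f"Range = {highest} - {lowest} = {data_range}")
```

The highest value is 80 and the lowest is 22, so the range is 58.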
b. Standard Deviation
When any data set is given, it is essential to understand the
variability of the values before drawing conclusions from it.
Since the mean is a single figure, it must be understood how
far the other values in the population lie from the mean. The
standard deviation measures this spread; it is the square root
of the variance.

For ungrouped data:

Standard deviation = $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

For grouped data:

Standard deviation = $\sqrt{\dfrac{\sum fx^2}{\sum f} - \left(\dfrac{\sum fx}{\sum f}\right)^2}$


Example 9:
Calculate the variance and the standard deviation of
the following data:
25, 18, 29, 34, 45, 40, 38
Solution:
Observations (x)   Deviation of observations from mean $(x_i - \bar{x})$   Squared deviations $(x_i - \bar{x})^2$
25
18
29
34
45
40
38
Total: 229
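The deviation columns of the table are not filled in within this text. A minimal sketch of the calculation, assuming the population formula for ungrouped data (dividing by n, as in the formula given earlier), might look like this in Python; the variable names are illustrative:

```python
import math

# Example 9 data, copied from the text above.
data = [25, 18, 29, 34, 45, 40, 38]

n = len(data)
mean = sum(data) / n                              # 229 / 7

# Build the two working columns of the table.
deviations = [x - mean for x in data]             # (x_i - mean)
squared_deviations = [d ** 2 for d in deviations] # (x_i - mean)^2

variance = sum(squared_deviations) / n            # population variance (divide by n)
std_dev = math.sqrt(variance)                     # standard deviation

for x, d, sq in zip(data, deviations, squared_deviations):
    print(f"{x:3d}  {d:8.3f}  {sq:9.3f}")
print(f"Mean = {mean:.3f}, variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
```

With a mean of 229 / 7 (about 32.71), the variance comes to roughly 74.78 and the standard deviation to roughly 8.65.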
Example (grouped data): The hours of overtime worked in
a particular quarter by the 60 employees of ABC Co are as
follows:
Hours Frequency
0 - 10 3
10 - 20 6
20 - 30 11
30 - 40 15
40 - 50 12
50 - 60 7
60 - 70 6
Total 60
Required: Calculate the mean and standard deviation.
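No worked solution appears in this text. The sketch below applies the grouped-data formulas given earlier, under the assumption (not stated in the text) that each class is represented by its midpoint, for example 5 for the 0 - 10 class. The variable names are illustrative:

```python
import math

# Class boundaries and frequencies from the overtime table above.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [3, 6, 11, 15, 12, 7, 6]

# Assumption: use the class midpoint as the representative value x.
midpoints = [(low + high) / 2 for low, high in classes]

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints))

mean = sum_fx / sum_f
variance = sum_fx2 / sum_f - mean ** 2   # sum(fx^2)/sum(f) - (sum(fx)/sum(f))^2
std_dev = math.sqrt(variance)

print(f"Sum f = {sum_f}, Sum fx = {sum_fx}, Sum fx^2 = {sum_fx2}")
print(f"Mean = {mean} hours, standard deviation = {std_dev} hours")
```

Under the midpoint assumption, the totals are 60, 2220 and 97500, which gives a mean of 37 hours and a standard deviation of 16 hours.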
c. Coefficient of Variation

The coefficient of variation is the extent of variation relative
to the mean of the population. It is usually written as a
percentage. It is calculated using the following formula:

Coefficient of variation = $\dfrac{\sigma}{\mu} \times 100$ (population) or $\dfrac{s}{\bar{x}} \times 100$ (sample)

That is, the coefficient of variation is calculated by dividing
the standard deviation by the mean of the data set. As it is
usually written as a percentage, the result is multiplied by 100.
Example 11:
A data set has a mean of 52 and the population
standard deviation is calculated to be 8. What is the
coefficient of variation?

Solution:
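The worked solution is not reproduced in this text. As a brief sketch, the definition above (standard deviation divided by the mean, times 100) can be applied to the Example 11 figures in Python; the function name is illustrative:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

# Example 11: mean of 52 and population standard deviation of 8.
cv = coefficient_of_variation(std_dev=8, mean=52)
print(f"Coefficient of variation = {cv:.2f}%")
```

This gives 8 / 52 × 100, or roughly 15.38%.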
EXPECTED VALUE
The expected value can be understood simply as the long-term
average of a random variable. This means that if observations
are made over a long period of time, their mean would approach
the expected value. Decisions are then made on the basis of the
expected value. Even though the expected value might not itself
be an actual outcome, it gives a good idea, using the
probabilities, of the value that can be expected on average.

It is represented by E(X) or $\mu$.

Expected Value = $E(X) = \mu = \sum x \, P(x)$

It is calculated by taking the sum of the products of each value
of the variable (x) and its probability of occurrence (P(x)).
Example 12:
The traffic police department of a city has
determined the following statistics for the number of road
accidents per day on a certain road. More than 3 accidents
per day on this road were not reported in the data from which
the observations were taken:
Number of road accidents (x) Probability (P(x)) (xP(x))
0 70% = 0.7
1 15% = 0.15
2 10% = 0.1
3 5% = 0.05
Total:
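The xP(x) column and the total are left blank in this text. A minimal sketch of the expected value calculation for the table above could be written in Python as follows (the variable names are illustrative):

```python
# Example 12 table: number of accidents per day (x) and probability P(x).
accidents = [0, 1, 2, 3]
probabilities = [0.70, 0.15, 0.10, 0.05]

# Sanity check: the probabilities of all outcomes should add up to 1.
total_probability = sum(probabilities)

# xP(x) column, then E(X) = sum of x * P(x).
x_times_p = [x * p for x, p in zip(accidents, probabilities)]
expected_value = sum(x_times_p)

print("xP(x) column:", [round(v, 2) for v in x_times_p])
print(f"Total probability = {total_probability:.2f}, E(X) = {expected_value:.2f}")
```

The xP(x) column is 0, 0.15, 0.20 and 0.15, so the expected value is 0.5 accidents per day. No single day can have half an accident, which illustrates the earlier point that the expected value need not be an actual outcome.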
Standard Normal Distribution
A standard normal distribution is a continuous probability
distribution for a random variable x, with a mean of 0 and a
standard deviation of 1. The graph of a normal distribution is
called the normal curve, which has all of the following properties:

Properties of a Standard Normal Distribution

• The mean, median, and mode are equal.
• The normal curve is bell-shaped and is symmetric about the
mean.
• The total area under the curve is equal to one.
• The normal curve approaches, but never touches, the x-axis.
• Between µ − σ and µ + σ the graph is concave down and
elsewhere the graph is concave up. The points at which the
graph changes concavity are called inflection points.
Standard Normal Distribution Graph:

For a normal distribution, the standard deviation tells us that approximately:

• 68% of the values will fall within ±1 standard deviation of the
mean.
• 95% of the values will fall within ±2 standard deviations of the
mean.
• 99.7% of the values will fall within ±3 standard deviations of the
mean.
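These three percentages can be checked numerically. The sketch below uses the standard identity that, for a normal distribution, the share of values within k standard deviations of the mean equals erf(k / √2); `math.erf` is part of Python's standard library, while the function name `share_within` is illustrative:

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within ±{k} standard deviation(s): {share_within(k) * 100:.1f}%")
```

The computed shares are about 68.3%, 95.4% and 99.7%, which match the rounded figures quoted above.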
[Figure: Standard Normal Distribution Graph. The horizontal axis is marked from Z = -3 to Z = 3, with Z = 0 at the centre. Bands show 68% of data between Z = -1 and Z = 1, 95% of data between Z = -2 and Z = 2, and 99.7% of data between Z = -3 and Z = 3.]


Interpreting Standard Normal Distribution Graph
The standard normal distribution graph represents the data in the
form of a curve as shown above, and positions along the horizontal
axis are denoted by Z. The value of Z in the middle of the curve,
at the mean, is 0 (Z = 0). The point one standard deviation to the
right of the mean is Z = 1, the point one standard deviation to the
left of the mean is Z = -1, and so on.

These Z-scores are essential for understanding standard
normal distribution graphs. Values of any normal distribution
can be converted into Z-scores in the following way:

1. Deduct the mean from the value

2. Divide the result by the standard deviation to get the z-score

This also means that if the z-score of a certain value is
known, the value itself can be calculated by reversing the steps
above.
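The two steps, and their reverse, can be sketched in Python as follows. The function names are illustrative, and the inputs are hypothetical: the mean of 52 and standard deviation of 8 are borrowed from Example 11, while the value 60 is made up for illustration.

```python
def z_score(value, mean, std_dev):
    """Step 1: deduct the mean from the value. Step 2: divide by the standard deviation."""
    return (value - mean) / std_dev

def value_from_z(z, mean, std_dev):
    """Reverse the steps: multiply by the standard deviation, then add the mean."""
    return z * std_dev + mean

# Hypothetical inputs for illustration only.
mean, std_dev = 52, 8
z = z_score(60, mean, std_dev)
recovered = value_from_z(z, mean, std_dev)

print(f"z-score of 60 = {z}")
print(f"Value with z = {z}: {recovered}")
```

A value of 60 lies one standard deviation above the mean, so its z-score is 1.0, and reversing the steps returns 60.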
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359

0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753

0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141

0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224

0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549

0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852

0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133

0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830

1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545

1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952

2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
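The entries in the table appear consistent with the area under the standard normal curve between Z = 0 and the given z-score (for example, 0.3413 at z = 1.00). Under that assumption, a table lookup can be sketched in Python with the standard library's `math.erf`; the function name `area_from_mean` is illustrative:

```python
import math

def area_from_mean(z):
    """Area under the standard normal curve between Z = 0 and Z = z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

# Row 1.0, column 0.00 of the table corresponds to z = 1.00, and so on.
for z in (0.50, 1.00, 1.96, 3.00):
    print(f"z = {z:.2f}  area = {area_from_mean(z):.4f}")

# Probability that a value falls below z = 1.00: the left half (0.5) plus the table area.
below = 0.5 + area_from_mean(1.00)
print(f"P(Z < 1.00) = {below:.4f}")
```

The computed areas round to the table values 0.1915, 0.3413, 0.4750 and 0.4987. Because the curve is symmetric, the area between Z = 0 and a negative z-score equals the area for the corresponding positive z-score.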
