7 Understanding Data
In this Chapter
»» Introduction to Data
»» Data Collection
»» Data Storage
»» Data Processing
»» Statistical Techniques for Data Processing
7.1 Introduction to Data
Many a time, people make decisions based on certain data or information. For example, while choosing a college for admission, one looks at the placement data of previous years of that college, the educational qualifications and experience of the faculty members, laboratory and hostel facilities, fees, etc. So we can say that the choice of a college is based on various data and their analysis.
Governments systematically collect and record data about the population through a process called census. Census data contains valuable information that is helpful in planning and formulating policies. Likewise, the coaching staff of a sports team analyses the previous performances of opponent teams to make strategies. Banks maintain data about their customers, their account details and transactions. All these examples highlight the need for data in various fields. Data are indeed crucial for decision making.
Rationalised 2023-24
A website handling online filling of student details for a competitive examination and generating admit card
Withdrawing money from an ATM
Input: ATM PIN number, account type, account number, card number, ATM location from where money was withdrawn, date and time, and amount to be withdrawn.
Processing: Checking for a valid PIN number and the existing bank balance; if satisfied, deducting the amount from that account, counting the rupee notes and initiating printing of the receipt.
Output: Currency notes, printed slip with transaction details.
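The processing steps listed for the ATM withdrawal can be sketched in Python. This is a simplified sketch under stated assumptions: the `Account` record, its field names and the `withdraw` function are hypothetical, the account is held in memory, and dispensing the notes is represented simply by returning the amount.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    # Hypothetical in-memory account record (field names are illustrative)
    account_number: str
    pin: str
    balance: int
    transactions: list = field(default_factory=list)


def withdraw(account, entered_pin, amount, location, timestamp):
    """Input -> processing -> output for one ATM withdrawal request."""
    # Processing: check for a valid PIN
    if entered_pin != account.pin:
        return {"status": "rejected", "reason": "invalid PIN"}
    # Processing: check the requested amount against the existing balance
    if amount <= 0:
        return {"status": "rejected", "reason": "invalid amount"}
    if amount > account.balance:
        return {"status": "rejected", "reason": "insufficient balance"}
    # Processing: deduct the amount and record the transaction details
    account.balance -= amount
    slip = {
        "account_number": account.account_number,
        "amount": amount,
        "location": location,
        "date_time": timestamp,
        "balance": account.balance,
    }
    account.transactions.append(slip)
    # Output: cash to dispense and the details to print on the slip
    return {"status": "ok", "cash": amount, "slip": slip}


acc = Account("12345678", "4321", 10000)
print(withdraw(acc, "4321", 2000, "Hypothetical ATM", "2023-06-01 10:00")["status"])  # ok
print(acc.balance)  # 8000
```

The PIN and balance checks come before any deduction, so a rejected request leaves the account unchanged.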
Issue of train ticket
Input: Journey start and end stations, date of journey, number of tickets required, class of travel (Sleeper/AC/other), berth preference (if any), passenger name(s) and age(s), mobile number and email id, payment related details, etc.
Processing: Verify login details and check the availability of berths in that class. If payment is done, issue tickets and deduct that number from the total available tickets on that coach. Allocate a PNR number and berths, or generate a waiting number for that ticket.
Output: Ticket with berth and coach number, or ticket with a waiting list number.
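Similarly, the berth allocation step for a train ticket can be sketched as below. The `Coach` class, the `book_ticket` function and the PNR counter starting at 1001 are hypothetical and for illustration only; login verification and the station and date inputs are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Coach:
    # Hypothetical model of one coach: berth numbers still free, plus a waiting list
    coach_number: str
    free_berths: list
    waiting_list: list = field(default_factory=list)


pnr_numbers = count(1001)  # hypothetical PNR generator for this illustration


def book_ticket(coach, passenger_name, age, payment_done):
    """Issue a confirmed ticket if a berth is free, otherwise a waiting-list ticket."""
    if not payment_done:
        return {"status": "not issued", "reason": "payment pending"}
    pnr = next(pnr_numbers)  # allocate a PNR number
    if coach.free_berths:
        berth = coach.free_berths.pop(0)  # deduct this berth from the available ones
        return {"status": "confirmed", "pnr": pnr, "name": passenger_name,
                "age": age, "coach": coach.coach_number, "berth": berth}
    coach.waiting_list.append(pnr)  # no berth left: put the ticket on the waiting list
    return {"status": "waiting", "pnr": pnr, "name": passenger_name,
            "age": age, "waiting_number": len(coach.waiting_list)}


s1 = Coach("S1", [1, 2])
for name, age in (("Asha", 30), ("Ravi", 45), ("Meena", 28)):
    print(book_ticket(s1, name, age, payment_done=True))
```

With only two free berths, the first two passengers get berths 1 and 2 in coach S1 and the third receives waiting number 1.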
(C) Mode
The value that appears the most number of times in the given
data of an attribute/variable is called the mode. It is
computed on the basis of the frequency of occurrence of
distinct values in the given data. A data set has no mode
if each value occurs only once. There may be multiple
modes in the data if more than one value has the same
highest frequency. Mode can be found for numeric as
well as non-numeric data.
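The rule for finding the mode can be sketched in Python as follows. This is a minimal sketch, assuming the data arrive as a plain list; the helper name `modes` is hypothetical. It returns an empty list when every value occurs only once, returns all tied values when several share the highest frequency, and works on text values as well as numbers.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means the data set has no mode."""
    counts = Counter(values)  # frequency of each distinct value
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs only once, so there is no mode
    # Keep every value that reaches the highest frequency (possibly several)
    return [value for value, freq in counts.items() if freq == highest]


print(modes([110, 102, 110, 90]))              # [110]
print(modes(["red", "blue", "red", "blue"]))   # ['red', 'blue']
print(modes([1, 2, 3]))                        # []
```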
Example 7.3
In the list of heights of students, the mode is 110, as its
frequency of occurrence in the list is 3, which is larger
than the frequency of any other value.
7.5.2 Measures of Variability
The measures of variability refer to the spread or variation
of the values around the mean. They are also called
measures of dispersion that indicate the degree of diversity
in a data set. They also indicate difference within the group.
Two different data sets can have the same mean, median
or mode but completely different levels of dispersion, or
vice versa. Common measures of dispersion or variability
are Range and Standard Deviation.
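To see how two data sets can share a mean yet differ in spread, here is a small Python sketch. The two lists of marks are made-up values chosen only for illustration, and the helper `summary` is hypothetical. It reports the mean, the range (largest value minus smallest value) and the standard deviation computed with `statistics.pstdev`, which divides by the number of values.

```python
from statistics import mean, pstdev


def summary(data):
    """Return (mean, range, standard deviation) for a list of numbers."""
    data_range = max(data) - min(data)  # largest value minus smallest value
    return mean(data), data_range, pstdev(data)


# Hypothetical marks of two small groups, chosen so that both means are 50
group_a = [48, 50, 50, 52]
group_b = [10, 40, 60, 90]

for name, data in (("A", group_a), ("B", group_b)):
    m, r, sd = summary(data)
    print(f"Group {name}: mean={m}, range={r}, sd={sd:.2f}")
```

Both groups have a mean of 50, but the range is 4 for group A and 80 for group B, so the mean alone says nothing about how spread out the values are.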
(A) Range
It is the difference between maximum and minimum
values of the data (the largest value minus the
smallest value). Range can be calculated only for
numerical data. It is a measure of dispersion and
tells about coverage/spread of data values. For
Example 7.5
Let us compute the standard deviation of the heights
of the nine students that we used while calculating
the mean. The mean (x̄) was calculated to be 101.33 cm.
Subtract the mean from each value and square the result.
Dividing the sum of these squared values by the
total number of values and taking its square root
gives the standard deviation of the data. See Table 7.3
for details.
Table 7.3 Standard deviation of height of 9 students
Height (x) in cm    x − x̄      (x − x̄)²
90                  −11.33     128.37
102                 0.67       0.45
110                 8.67       75.17
115                 13.67      186.87
…
110                 8.67       75.17
n = 9               ∑(x − x̄) = 0.03     ∑(x − x̄)² = 938.00
x̄ = 101.33
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}$$
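The computation in Example 7.5 can be sketched in Python. The function `population_sd` is a hypothetical helper written for this illustration; it divides the sum of squared deviations by n, as described in Example 7.5 (the standard library's `statistics.pstdev` follows the same convention, while `statistics.stdev` divides by n − 1). The last lines simply plug in the totals given in Table 7.3 (n = 9 and ∑(x − x̄)² = 938.00).

```python
import math


def population_sd(values):
    """Standard deviation: square root of the mean of squared deviations."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    x_bar = sum(values) / n                              # mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)        # divide by n, then square root


# Using the totals from Table 7.3 directly
sigma_heights = math.sqrt(938.00 / 9)
print(round(sigma_heights, 2))  # 10.21
```

So the heights in Table 7.3 have a standard deviation of about 10.21 cm around the mean of 101.33 cm.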
A teacher wants to know about the average performance of the whole class in
a test.
Compare the heights of residents of two cities.
Find the most popular car color after surveying the car owners of a small city.
Summary
• Data refer to unorganised facts that can be
processed to generate meaningful results or
information.
• Data can be structured or unstructured.
• Hard Disk, SSD, CD/DVD, Pen Drive, Memory
Card, etc. are some of the commonly used storage
devices.
Exercise
1. Identify data required to be maintained to perform the
following services:
a) Declare exam results and print e-certificates
b) Register participants in an exhibition and issue
biometric ID cards
c) Search for an image using a search engine
d) Book an OPD appointment with a hospital in a
specific department
2. A school having 500 students wants to identify
beneficiaries of the merit-cum-means scholarship, that is,
students achieving more than 75% for two consecutive years
and having a family income of less than 5 lakh per annum.
Briefly describe the data processing steps to be taken by the
school to prepare the list of beneficiaries.
3. A bank ‘xyz’ wants to know about its popularity among
the residents of a city ‘ABC’ on the basis of number of
bank accounts each family has and the average monthly
account balance of each person. Briefly describe the
steps to be taken for collecting data and what results
can be checked through processing of the collected data.