Statistical Description of Data
Introduction to Statistics: -
History
Latin word → Status
Italian word → Statista
German word → Statistik
French word → Statistique
Definition of Statistics
Singular sense: Statistics is a scientific method used for collecting and analysing data to draw statistical inferences (conclusions).
Plural sense: Statistics means qualitative and quantitative data, collected with a view to statistical analysis.
Application of Statistics
Note: Statistics is mainly used/helpful in decision making.
Commerce and Industry → Data on previous sales, raw material, wages and salary etc.
Limitation of statistics: -
1) Statistics deals with aggregates, not with individual observations.
2) Statistics is concerned with quantitative data. However, qualitative data also can be
converted to quantitative data by providing numerical description.
3) Future projections of sales, production, price, quantity etc. are possible only under a
specific set of conditions. If any of these conditions is violated, the projections are likely
to be inaccurate.
Variable:
It is a measurable quantity.
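A variable may be either discrete or continuous.
A variable is known to be discrete if it can assume only a finite or countably infinite number of
isolated values. E.g. Number of students in a class.
A variable, on the other hand, is known to be continuous if it can assume any value from
a given interval. E.g. Height, Weight.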
Classification of Data
1) Chronological or Temporal or Time series data: Data arranged on the basis of time.
E.g. the number of students who appeared for CA Final over the last twenty years.
2) Geographical or Spatial series data: Data arranged on the basis of region/area. E.g.
the number of students who appeared for CA Final in 2022, classified by state.
3) Qualitative or Ordinal data: Data arranged/classified on the basis of
attribute/quality. E.g. Nationality, Gender, Smoking habit.
4) Quantitative or Cardinal data: Data are classified in respect of a variable. E.g. Height,
Weight, Profits, Salaries etc.
Collection of Data
Mail Questionnaire Method (Postal / E-mail)
1) A questionnaire covering all the important aspects of the problem under consideration is
prepared and sent to the respondents, according to the data requirement.
2) Wide coverage; quickest method.
3) Non-response is maximum.
Observation Method
1) Data is collected by direct observation or by using instruments.
3) Gives more accurate data.
While scrutinizing the given information, one must apply intelligence, patience and
experience.
For example, if the population, area and density for some places are given, we may verify
whether they are internally consistent by examining whether the relation
\[ \text{Density} = \frac{\text{Population}}{\text{Area}} \]
holds for each place.
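The consistency check just described can be sketched in a few lines. This is an illustrative sketch, not part of the original notes: the function name `is_density_consistent`, the 1% relative tolerance (to allow for rounding in published figures) and the two places with their figures are all hypothetical.

```python
def is_density_consistent(population, area, density, rel_tol=0.01):
    """Check the relation Density = Population / Area within a relative tolerance."""
    expected = population / area
    return abs(expected - density) <= rel_tol * expected


# Hypothetical figures (population, area in sq. km, stated density); not from the text.
places = {
    "Place A": (1_200_000, 400.0, 3000.0),  # 1,200,000 / 400 = 3000, consistent
    "Place B": (500_000, 250.0, 2500.0),    # 500,000 / 250 = 2000, inconsistent
}

for name, (population, area, density) in places.items():
    verdict = "consistent" if is_density_consistent(population, area, density) else "needs scrutiny"
    print(f"{name}: {verdict}")
```

A place flagged as "needs scrutiny" is not necessarily wrong; it only tells us which figures to re-check against the source.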
Sources of Secondary Data
Presentation of Data
1) Textual Presentation:-
Presentation of data in the form of a paragraph or a number of paragraphs.
E.g.
‘In 2009, out of a total of five thousand workers of Roy Enamel Factory, four thousand and two
hundred were members of a Trade Union. The number of female workers was twenty per cent of
the total workers out of which thirty per cent were members of the Trade Union.
In 2010, the number of workers belonging to the trade union was increased by twenty per cent as
compared to 2009 of which four thousand and two hundred were male. The number of workers
not belonging to trade union was nine hundred and fifty of which four hundred and fifty were
females.’
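To see what the paragraph above actually contains, its figures can be worked out into a two-way table (sex × union membership) for each year. The sketch below is illustrative only; the helper `build_table` and the row and column labels are hypothetical, while every number is derived from the paragraph (for 2009: 20% of 5000 workers are female, and 30% of those are members; for 2010: membership rises 20% over 2009).

```python
def build_table(male_member, male_non, female_member, female_non):
    """Two-way table (sex x union membership) with row and column totals."""
    rows = {"Male": (male_member, male_non), "Female": (female_member, female_non)}
    table = {sex: {"Member": m, "Non-member": n, "Total": m + n} for sex, (m, n) in rows.items()}
    table["Total"] = {col: table["Male"][col] + table["Female"][col]
                      for col in ("Member", "Non-member", "Total")}
    return table


# 2009: 5000 workers, 4200 union members; 20% of workers female, 30% of them members.
workers_2009, members_2009 = 5000, 4200
female_2009 = workers_2009 * 20 // 100                     # 1000
female_members_2009 = female_2009 * 30 // 100              # 300
male_members_2009 = members_2009 - female_members_2009     # 3900
t2009 = build_table(
    male_member=male_members_2009,
    male_non=(workers_2009 - female_2009) - male_members_2009,  # 4000 - 3900 = 100
    female_member=female_members_2009,
    female_non=female_2009 - female_members_2009,               # 1000 - 300 = 700
)

# 2010: members up 20% over 2009 (4200 of them male); 950 non-members, 450 of them female.
members_2010 = members_2009 * 120 // 100                   # 5040
t2010 = build_table(
    male_member=4200,
    male_non=950 - 450,                                    # 500
    female_member=members_2010 - 4200,                     # 840
    female_non=450,
)

for year, table in (("2009", t2009), ("2010", t2010)):
    print(f"{year:<8}{'Member':>8}{'Non-member':>12}{'Total':>8}")
    for sex, row in table.items():
        print(f"{sex:<8}{row['Member']:>8}{row['Non-member']:>12}{row['Total']:>8}")
```

Laid out this way, comparisons between the two years (e.g. 5000 workers in 2009 against 5990 in 2010) are immediate, which is hard to see in the paragraph itself.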
Merit
1) Simplicity
2) Layman can present data by this method.
3) Taken as the first step towards the other methods of presentation.
Demerit
1) Dull/Boring
2) Monotonous
3) Comparison between different observations is not possible in this method.
2) Tabular presentation or Tabulation:-
Tabulation may be defined as systematic presentation of data with the help of a
statistical table having a number of rows and columns and complete with reference
number, title, description of rows as well as columns and foot notes, if any.
Parts of a Table
Parts or elements of a table vary from table to table depending upon the nature of data
and purpose of tabulation. Yet some points are common. These are:
1) Table number is required for the identification of a table, particularly when there is
more than one table in a particular analysis.
2) Title of the table gives the indication of the type of information contained in the body
of the table.
3) Head note, also called prefatory note, is written just below the title. It shows
contents and unit of measurement like (rupees crore) or (lakh tonnes) or (thousand
bales).
4) Stubs are used to designate rows. They appear on the left hand column of the table.
5) Caption is the upper part of the table, describing the columns and sub-columns, if any.
6) Box-head is the entire upper part of the table which includes columns and sub-
column numbers, unit(s) of measurement along with caption.
7) Main body of the table, also called field of the table, is its most important and bulky
part. It contains the relevant numerical information about which a hint is already
contained in the title of the table.
8) Foot Note is a qualifying statement put just below the table (at the bottom). Its
purpose is to caution about the limitations of the data or certain omissions.
9) Source indicates the sources from where data (Primary Data or Secondary Data)
has been collected.
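As an illustration of how these parts fit together, the sketch below lays out a plain-text table with a table number, title, head note, box-head (stub heading plus column captions), stubs, body, foot note and source. The function `render_table` is hypothetical; the body reuses the subject-wise student counts that appear in a question elsewhere in these notes, while the title wording, foot note and source are invented placeholders.

```python
def render_table(number, title, head_note, stub_heading, caption, stubs, body,
                 foot_note, source):
    """Lay out the common parts of a statistical table as plain text."""
    lines = [f"Table {number}: {title}", head_note]      # table number, title, head note
    # Box-head: stub heading followed by the column captions.
    lines.append(f"{stub_heading:<12}" + "".join(f"{c:>10}" for c in caption))
    for stub, row in zip(stubs, body):                   # stubs label the rows of the body
        lines.append(f"{stub:<12}" + "".join(f"{v:>10}" for v in row))
    lines.append(f"Foot note: {foot_note}")
    lines.append(f"Source: {source}")
    return "\n".join(lines)


table_text = render_table(
    number=1,
    title="Number of students by subject",
    head_note="(Number of students)",
    stub_heading="Subject",
    caption=["2011-12", "2012-13"],
    stubs=["Statistics", "Economics", "History"],
    body=[[25, 30], [40, 42], [35, 28]],
    foot_note="Hypothetical foot note for illustration.",
    source="Hypothetical source for illustration.",
)
print(table_text)
```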
3) Diagrammatic representation of data:-
ii) Diagrammatic representation can be used for both the educated section and
uneducated section of the society.
iii) Any hidden trend present in the given data can be noticed only in this mode of
representation.
[Figure: diagram with two vertical axes, No. of Employees and Income, each scaled from 10 to 40]
The profits in lakhs of Rupees of an industrial house for 2009, 2010, 2011, 2012, 2013, 2014,
and 2015 are 5, 8, 9, 6, 12, 15 and 24 respectively. Represent these data using a suitable
diagram.
Solution
[Figure: Bar diagram of the profits for 2009 to 2015]
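A simple bar diagram draws one bar per year, with the height of each bar proportional to the profit. The plain-text sketch below mimics that idea with a hypothetical helper `text_bar_diagram` (not from the source), drawing one `#` per lakh of Rupees; for a real diagram one would normally use a plotting library instead.

```python
def text_bar_diagram(labels, values, unit=1.0, symbol="#"):
    """Hypothetical helper: one line per bar, bar length = value / unit symbols (rounded)."""
    width = max(len(str(label)) for label in labels)
    return [f"{str(label):>{width}} | {symbol * round(value / unit)} {value}"
            for label, value in zip(labels, values)]


years = [2009, 2010, 2011, 2012, 2013, 2014, 2015]
profits = [5, 8, 9, 6, 12, 15, 24]   # Rs. lakh, from the question above

for line in text_bar_diagram(years, profits):
    print(line)
```

Because all bars share the same scale, the jump to 24 lakh in 2015 stands out at once.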
Question
Subject        Number of Students
               2011-12    2012-13
Statistics          25         30
Economics           40         42
History             35         28
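3) Pie Diagrams:-
a) It is also known as an angular diagram.
b) It is used to represent percentage break-downs of the given data.
c) The percentages can then be converted into angles by the formula
\[ \text{Angle} = \frac{\text{Percentage} \times 360^\circ}{100} = \text{Percentage} \times 3.6^\circ \]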
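The percentage-to-angle conversion can be sketched as follows, applied to the subject-wise student counts given in the question above. The function name `pie_angles` is hypothetical; angles are rounded to two decimals.

```python
def pie_angles(values):
    """Convert component values to percentages, then to angles in degrees."""
    total = sum(values)
    percentages = [100 * v / total for v in values]
    return [round(p * 3.6, 2) for p in percentages]  # angle = percentage x 360 / 100


subjects = ["Statistics", "Economics", "History"]
students_2011_12 = [25, 40, 35]
students_2012_13 = [30, 42, 28]
print(dict(zip(subjects, pie_angles(students_2011_12))))
print(dict(zip(subjects, pie_angles(students_2012_13))))
```

For 2011-12 the totals happen to be 100, so the angles are simply 90°, 144° and 126°; in every case the angles add up to 360°.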
ii) When tabulation is done in respect of a discrete random variable, it is known as Discrete or
Ungrouped or simple Frequency Distribution.
iii) In case the characteristic under consideration is a continuous variable, such a classification is
termed as Grouped Frequency Distribution.
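The two kinds of frequency distribution can be sketched as below. This is an illustration under stated assumptions: the function names and both data sets are hypothetical (they are not the data of the questions that follow), and in the grouped version each class is taken as [LCB, UCB), so a value falling exactly on a boundary is counted in the upper class.

```python
from collections import Counter


def discrete_frequency_distribution(observations):
    """Ungrouped distribution: each distinct value with its frequency."""
    return sorted(Counter(observations).items())


def grouped_frequency_distribution(observations, lower_boundary, width, n_classes):
    """Grouped distribution over equal-width classes [LCB, UCB)."""
    freqs = [0] * n_classes
    for x in observations:
        k = int((x - lower_boundary) // width)
        if 0 <= k < n_classes:   # values outside the classes are ignored
            freqs[k] += 1
    return freqs


# Hypothetical data, not the data of the questions below.
misprints_per_page = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2]
weights_kg = [44.2, 47.0, 45.5, 50.1, 52.3, 49.0, 51.7]

print(discrete_frequency_distribution(misprints_per_page))
print(grouped_frequency_distribution(weights_kg, lower_boundary=43.5, width=5, n_classes=2))
```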
Question
A review of the first 30 pages of a statistics book reveals the following printing
mistakes:
Following are the weights in kgs. of 36 BBA students of St. Xavier’s College.
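2) Class boundaries
\[ \text{LCB} = \text{LCL} - \frac{D}{2}, \qquad \text{UCB} = \text{UCL} + \frac{D}{2} \]
where D is the difference between the LCL of the next class interval and the UCL of the given
class interval. For the data presented in table 10.5, the LCB of the first class interval is
43.50 kgs. and its UCB is 48.50 kgs.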
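The boundary formulas LCB = LCL − D/2 and UCB = UCL + D/2 can be sketched as below. The function name is hypothetical, as are the class limits (44 to 48, 49 to 53, 54 to 58 kgs.), which were chosen only so that the first class matches the 43.50 to 48.50 kgs. boundaries quoted in the text. D is computed from the first two classes and assumed to be the same for every class, since the last class has no "next" class.

```python
def class_boundaries(limits):
    """Convert (LCL, UCL) class limits into (LCB, UCB) class boundaries.

    LCB = LCL - D/2 and UCB = UCL + D/2, where D is the gap between the LCL of
    the next class and the UCL of the given class. D is taken from the first two
    classes and assumed equal for all classes.
    """
    if len(limits) < 2:
        raise ValueError("need at least two classes to determine D")
    d = limits[1][0] - limits[0][1]
    return [(lcl - d / 2, ucl + d / 2) for lcl, ucl in limits]


# Hypothetical class limits in kgs.
print(class_boundaries([(44, 48), (49, 53), (54, 58)]))
# [(43.5, 48.5), (48.5, 53.5), (53.5, 58.5)]
```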
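3) Mid-point or Mid-value or Class mark
Corresponding to a class interval, this may be defined as the sum of the two class limits (or
the two class boundaries) divided by 2. Thus, we have
\[ \text{Mid-point} = \frac{\text{LCL} + \text{UCL}}{2} = \frac{\text{LCB} + \text{UCB}}{2} \]
Referring to the distribution of weights of BBA students, the mid-points for the first two class
intervals are (43.50 + 48.50)/2 = 46 kgs. and (48.50 + 53.50)/2 = 51 kgs.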
4) Width or size of a class interval
The width of a class interval may be defined as the difference between the UCB and
the LCB of that class interval. For the distribution of weights of BBA students, C, the
class length or width is 48.50 kgs. – 43.50 kgs. = 5 kgs. for the first class interval. For
the other class intervals also, C remains same.
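Mid-points and class widths follow directly from the class boundaries. In the sketch below (function names hypothetical) the second weight class is taken as 48.50 to 53.50 kgs., which follows from the first UCB of 48.50 kgs. and the common width C = 5 kgs.

```python
def mid_point(lower, upper):
    """Class mark: (LCL + UCL) / 2, or equivalently (LCB + UCB) / 2."""
    return (lower + upper) / 2


def class_width(lcb, ucb):
    """Width C of a class interval: UCB - LCB."""
    return ucb - lcb


boundaries = [(43.5, 48.5), (48.5, 53.5)]   # first two weight classes, in kgs.
print([mid_point(lcb, ucb) for lcb, ucb in boundaries])    # [46.0, 51.0]
print([class_width(lcb, ucb) for lcb, ucb in boundaries])  # [5.0, 5.0]
```

Using the class limits instead of the boundaries gives the same mid-point, e.g. (44 + 48)/2 = 46 if the first class limits are 44 and 48.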
5) Cumulative Frequency
The cumulative frequency corresponding to a value for a discrete variable and
corresponding to a class boundary for a continuous variable may be defined as the number
of observations less than the value or less than or equal to the class boundary. This
definition refers to the less than cumulative frequency. We can define more than cumulative
frequency in a similar manner. Both types of cumulative frequencies are shown in the
following table.
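As a sketch of how both types of cumulative frequency are computed (function names hypothetical): the less than cumulative frequency is a running total from the first class, and the more than cumulative frequency is a running total from the last class. Only the first two frequencies (3 and 4) come from the weights example; the remaining three are hypothetical, chosen so that the total is 36.

```python
from itertools import accumulate


def less_than_cumulative(freqs):
    """Observations up to and including each class (read against the UCB)."""
    return list(accumulate(freqs))


def more_than_cumulative(freqs):
    """Observations in each class and above (read against the LCB)."""
    return list(accumulate(reversed(freqs)))[::-1]


freqs = [3, 4, 10, 12, 7]   # last three hypothetical; total 36
print(less_than_cumulative(freqs))  # [3, 7, 17, 29, 36]
print(more_than_cumulative(freqs))  # [36, 33, 29, 19, 7]
```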
6) Frequency density of a class interval
It may be defined as the ratio of the frequency of that class interval to the corresponding
class length. The frequency densities for the first two class intervals of the frequency
distribution of weights of BBA students are 3/5 and 4/5 i.e. 0.60 and 0.80 respectively.
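A minimal sketch of frequency density (function name hypothetical), reproducing the 0.60 and 0.80 above. The second pair of classes, with unequal widths, is hypothetical and shows why density rather than raw frequency is the fair basis of comparison when widths differ.

```python
def frequency_density(freq, width):
    """Frequency per unit of class length."""
    return freq / width


# First two weight classes (width 5 kgs.): frequencies 3 and 4.
print(frequency_density(3, 5), frequency_density(4, 5))    # 0.6 0.8

# Hypothetical unequal classes: 4 observations over 5 units and 8 over 10 units
# have the same density, even though the frequencies differ.
print(frequency_density(4, 5) == frequency_density(8, 10))  # True
```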
7) Relative frequency and percentage frequency of a class interval
Relative frequency of a class interval may be defined as the ratio of the class frequency to
the total frequency. Percentage frequency of a class interval may be defined as the ratio of
class frequency to the total frequency, expressed as a percentage. For the last example, the
relative frequencies for the first two class intervals are 3/36 and 4/36 respectively and the
percentage frequencies are 300/36 and 400/36 respectively. It is quite obvious that
whereas the relative frequencies add up to unity, the percentage frequencies add up to one
hundred.
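Relative and percentage frequencies can be sketched as below (function names hypothetical). As before, the first two frequencies come from the weights example and the rest are hypothetical, with a total of 36; the output confirms that relative frequencies add up to 1 and percentage frequencies to 100.

```python
def relative_frequencies(freqs):
    """Class frequency / total frequency; these add up to 1."""
    total = sum(freqs)
    return [f / total for f in freqs]


def percentage_frequencies(freqs):
    """Relative frequency expressed as a percentage; these add up to 100."""
    total = sum(freqs)
    return [100 * f / total for f in freqs]


freqs = [3, 4, 10, 12, 7]   # last three hypothetical; total 36
rel = relative_frequencies(freqs)
pct = percentage_frequencies(freqs)
print([round(r, 4) for r in rel[:2]])             # [0.0833, 0.1111]
print([round(p, 2) for p in pct[:2]])             # [8.33, 11.11]
print(round(sum(rel), 10), round(sum(pct), 10))   # 1.0 100.0
```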
Graphical Representation of Frequency Distribution
1) Histogram or Area diagram;
This is a very convenient way to represent a frequency distribution.
Each class interval is represented by the width of a rectangle and the corresponding class
frequency by its height.
It is used exclusively for showing frequency distributions of quantitative data that are
continuous in nature.
The area of each rectangle, as a share of the total area, shows the proportion of the class
frequency in the total frequency.
Note:
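In case of an inclusive series (where the UCL of a class does not coincide with the LCL of the
next class), first convert it into an exclusive series and then draw the histogram. If the class
widths are unequal, the heights of the rectangles are taken proportional to the frequency
densities, so that the areas remain proportional to the class frequencies.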
If mid-points are given, take the difference between the mid-points of two consecutive classes
and divide it by two to get h (half the class width). To make the series continuous, subtract h
from each mid-point (to get the lower class boundary) and add h to it (to get the upper class
boundary), and then construct the histogram.
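The mid-point rule above can be sketched as follows. The function name and the mid-points 5, 15, 25, 35 are hypothetical, and the mid-points are assumed to be equally spaced, so a single h serves every class.

```python
def boundaries_from_mid_points(mid_points):
    """Rebuild continuous classes from equally spaced mid-points.

    h is half the difference between two consecutive mid-points (half the class
    width); LCB = mid-point - h and UCB = mid-point + h.
    """
    if len(mid_points) < 2:
        raise ValueError("need at least two mid-points")
    h = (mid_points[1] - mid_points[0]) / 2
    return [(m - h, m + h) for m in mid_points]


mid_points = [5, 15, 25, 35]   # hypothetical mid-points
print(boundaries_from_mid_points(mid_points))
# [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0), (30.0, 40.0)]
```

The resulting classes are contiguous (each UCB equals the next LCB), which is what the histogram needs.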
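2) Frequency Polygon and Frequency Curve
A frequency polygon is obtained by plotting the class frequencies against the corresponding
mid-points and joining the points with straight lines.
A frequency curve can be regarded as a limiting form of the frequency polygon.
If the points obtained for the frequency polygon are joined by a smooth curve, we get a
frequency curve.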
3) Ogives or Cumulative Frequency Graphs
By plotting cumulative frequency against the respective class boundary, we get ogives.
Less than ogive: less than cumulative frequencies are plotted against the upper class boundaries
of the class intervals.
More than ogive: more than cumulative frequencies are plotted against the lower class boundaries
of the class intervals.
Note: It is generally assumed that the class preceding the first class has zero frequency.
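The points of both ogives can be generated as below. This is a sketch: the function name is hypothetical, and so is the third class frequency (5); the first two frequencies (3 and 4) and the boundaries follow the weights example. Following the note, the less than ogive starts at (first LCB, 0); the more than ogive ends at (last UCB, 0), since no observation exceeds the last upper boundary.

```python
def ogive_points(boundaries, freqs):
    """Return (less-than ogive points, more-than ogive points).

    boundaries: (LCB, UCB) pairs in ascending order; freqs: class frequencies.
    """
    less_than = [(boundaries[0][0], 0)]   # preceding class has zero frequency
    running = 0
    for (_, ucb), f in zip(boundaries, freqs):
        running += f
        less_than.append((ucb, running))  # less than cf against UCB

    more_than = []
    remaining = running                   # total frequency
    for (lcb, _), f in zip(boundaries, freqs):
        more_than.append((lcb, remaining))  # more than cf against LCB
        remaining -= f
    more_than.append((boundaries[-1][1], 0))
    return less_than, more_than


boundaries = [(43.5, 48.5), (48.5, 53.5), (53.5, 58.5)]
freqs = [3, 4, 5]   # third frequency hypothetical
less, more = ogive_points(boundaries, freqs)
print(less)   # [(43.5, 0), (48.5, 3), (53.5, 7), (58.5, 12)]
print(more)   # [(43.5, 12), (48.5, 9), (53.5, 5), (58.5, 0)]
```

The two ogives intersect at the median, which is one reason both are often drawn on the same axes.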
Types of Frequency Curve
1) Bell-shaped curve;
Most of the commonly used distributions provide bell-shaped curve, which, as suggested by
the name, looks almost like a bell.
On a bell-shaped curve, the frequency, starting from a rather low value, gradually reaches
the maximum value, somewhere near the central part and then gradually decreases to
reach its lowest value at the other extremity.
Example: Profit, Height, Weight.
[Figure: Bell-shaped curve]
2) U-shaped curve;
In a U-shaped curve, the frequency is minimum near the central part and slowly but steadily
reaches its maximum at the two extremities.
Example: The distribution of Kolkata-bound commuters belongs to this type of curve, as the
number of commuters is maximum during the peak hours in the morning and in the
evening.
[Figure: U-shaped curve]
3) J-shaped curve;
J-shaped curve starts with a minimum frequency and then gradually reaches its maximum
frequency at the other extremity.
Example: The distribution of commuters coming to Kolkata from the early morning hour to
peak morning hour follows such a distribution.
[Figure: J-shaped curve]
4) Mixed curve.
A curve formed by a combination of the above types of frequency curves is known as a mixed curve.
[Figure: Mixed curve]