
Statistical Description of Data

Introduction to Statistics: -
History
Latin word → Status
Italian word → Statista
German word → Statistik
French word → Statistique
Definition of Statistics

Singular sense: The scientific method used for collecting and analysing data to
draw statistical inferences (conclusions).

Plural sense: Qualitative and quantitative data, collected with a view to
having statistical analysis.
Application of Statistics
Note: Statistics is mainly used/helpful in decision making.

Economics → Index number, Demand Analysis, Time Series etc.

Business Management → Sampling

Commerce and Industry → Data on previous sales, raw material, wages and salary etc.
Limitations of Statistics:-
1) Statistics deals with aggregates, not with individual observations.

2) Statistics is concerned with quantitative data. However, qualitative data can also be
converted to quantitative data by providing a numerical description.

3) Future projections of sales, production, price, quantity etc. are possible only under a
specific set of conditions. If any of these conditions is violated, the projections are likely
to be inaccurate.

4) The theory of statistical inference is built upon random sampling.


Collection of Data
We may define ‘DATA’ as quantitative information about some particular characteristic(s)
under consideration.

Variable:
It is a measurable quantity.
A variable may be either discrete or continuous.

When a variable assumes a finite or a countably infinite number of isolated values, it is
known as a discrete variable. Examples of discrete variables are the number of petals in a
flower, the number of misprints a book contains, the number of road accidents in a
particular locality and so on.

A variable, on the other hand, is known to be continuous if it can assume any value from
a given interval. E.g. height, weight.
Classification of Data
1) Chronological or Temporal or Time series data: Data arranged on the basis of time.
E.g. the number of students who appeared for CA Final over the last twenty years.
2) Geographical or Spatial series data: Data arranged on the basis of region/area. E.g.
students who appeared for CA Final in the year 2022, classified by state.
3) Qualitative or Ordinal data: Data arranged/classified on the basis of
attribute/quality. E.g. nationality, gender, smoking habit.
4) Quantitative or Cardinal data: Data classified in respect of a variable. E.g. height,
weight, profits, salaries etc.
Collection of Data

Data may be either primary or secondary.

Primary Data: Data collected for the first time by an investigator or agency.
E.g. surveys, interviews etc.

Secondary Data: Data already collected, and used by a different person or agency.
E.g. census data, health records etc.
Collection of Primary Data
1) Interview Method

2) Mailed Questionnaire Method

3) Observation method

4) Questionnaires filled and sent by enumerators


Interview Method

Personal Interview
1) Data is collected directly from the respondents.
2) E.g. door-to-door survey after a natural calamity.
3) Slow and expensive method.
4) Low coverage method.
5) High accuracy.
6) High response.

Indirect Interview
1) Data is collected by contacting persons associated with the problem.
2) E.g. rail accident.
3) Slow and expensive method.
4) Low coverage.
5) Medium accuracy.
6) Medium response.

Telephone Interview
1) Data is collected over the phone.
2) E.g. telephone interview questions.
3) Quick and non-expensive.
4) High coverage.
5) Low accuracy.
6) Low response.
Mailed Questionnaire Method
1) A well-drafted and soundly sequenced questionnaire is framed.

2) It covers all the important aspects of the problem under consideration (the data
requirement) and is sent to the respondents.

3) Necessary guidelines for filling up the questionnaire are provided.

4) The questionnaire may be sent by post (wide coverage) or by e-mail (quickest method).

5) Non-response is maximum.
Observation Method
1) Data is collected by direct observation or by using an instrument.

2) E.g. Data on height of a group of students.

3) More accuracy.

4) Time consuming and laborious.

5) Best method of data collection.


Questionnaires filled and sent by enumerators
1) Enumerator – A person who directly interacts with the respondent and fills in the
questionnaire.

2) It is generally used in surveys or census.


Scrutiny of Data
Since statistical analyses are made only on the basis of data, it is necessary to check
whether the data under consideration are accurate as well as consistent.

One must apply intelligence, patience and experience while scrutinizing the given
information.

For example, if the data for population, area and density for some places are given, then we
may verify whether they are internally consistent by examining whether the following
relation holds:

Density = Population / Area
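The consistency check described above can be sketched in a few lines of Python. This is only an illustration: the place names, the figures and the 1% tolerance are hypothetical values chosen for the example, not data from these notes.

```python
import math


def is_consistent(population, area, density, rel_tol=0.01):
    """Return True if the reported density agrees with population / area."""
    if area <= 0:
        return False
    return math.isclose(population / area, density, rel_tol=rel_tol)


# Hypothetical records: (place, population, area in sq. km, reported density)
records = [
    ("Place A", 50000, 25.0, 2000.0),
    ("Place B", 120000, 40.0, 2500.0),  # 120000 / 40 = 3000, so this row fails
]

# Collect the places whose three figures do not agree with each other
inconsistent = [name for name, pop, area, dens in records
                if not is_consistent(pop, area, dens)]
print(inconsistent)
```

A row flagged in this way only tells us that at least one of the three figures is wrong; deciding which one still needs the judgement mentioned above.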
Sources of Secondary Data

International Sources: WHO, IMF, ILO, World Bank.

Government Sources (reliable sources): In India → CSO, NSSO; Regulators – RBI, SEBI,
RERA, IRDA.

Private and Others: Quasi-government bodies such as ISI and NCERT; research papers or
unpublished sources.
Data Organisation
Presentation of Data
Present the data in a neat and condensed form highlighting the essential features of the
data.

Classification or Organisation of Data

It may be defined as the process of arranging data on the basis of the characteristics
under consideration into a number of groups or classes according to the similarities of the
observations.
Objectives of Classification of Data
1) Puts data in a neat, precise (only important data) and condensed form.
2) Helps in comparison
3) Statistical Analysis
Classification of Data
1) Chronological or Temporal or Time series data: Data arranged on the basis of time.
E.g. the number of students who appeared for CA Final over the last twenty years.
2) Geographical or Spatial series data: Data arranged on the basis of region/area. E.g.
students who appeared for CA Final in the year 2022, classified by state.
3) Qualitative or Ordinal data: Data arranged/classified on the basis of attribute/quality.
E.g. nationality, gender, smoking habit.
4) Quantitative or Cardinal data: Data classified in respect of a variable. E.g. height,
weight, profits, salaries etc.

[Note: Data are further classified as frequency data and non-frequency data.
Qualitative and quantitative data belong to the frequency group.
Time series data and geographical data belong to the non-frequency group.]
Mode of Presentation of Data

1) Textual Presentation:-
Presentation of data with the help of paragraph or no. of paragraphs.

E.g.
‘In 2009, out of a total of five thousand workers of Roy Enamel Factory, four thousand and two
hundred were members of a Trade Union. The number of female workers was twenty per cent of
the total workers out of which thirty per cent were members of the Trade Union.

In 2010, the number of workers belonging to the trade union was increased by twenty per cent as
compared to 2009 of which four thousand and two hundred were male. The number of workers
not belonging to trade union was nine hundred and fifty of which four hundred and fifty were
females.’
Merits
1) Simplicity.
2) Even a layman can present data by this method.
3) It is taken as the first step towards the other methods of presentation.

Demerits
1) Dull/boring.
2) Monotonous.
3) Comparison between different observations is not possible in this method.
2) Tabular presentation or Tabulation:-
Tabulation may be defined as systematic presentation of data with the help of a
statistical table having a number of rows and columns and complete with reference
number, title, description of rows as well as columns and foot notes, if any.
Parts of a Table
Parts or elements of a table vary from table to table depending upon the nature of data
and purpose of tabulation. Yet some points are common. These are:

1) Table number is required for the identification of a table, particularly when there is
more than one table in a particular analysis.

2) Title of the table gives an indication of the type of information contained in the body
of the table.

3) Head note, also called prefatory note, is written just below the title. It shows the
contents and the unit of measurement, like (rupees crore) or (lakh tonnes) or (thousand
bales).

4) Stubs are used to designate rows. They appear in the left-hand column of the table.

5) Caption is the upper part of the table, describing the columns and sub-columns, if any.

6) Box-head is the entire upper part of the table, which includes column and sub-column
numbers and unit(s) of measurement along with the caption.

7) Main body of the table, also called field of the table, is its most important and bulky
part. It contains the relevant numerical information about which a hint is already
contained in the title of the table.

8) Foot Note is a qualifying statement put just below the table (at the bottom). Its
purpose is to caution about the limitations of the data or certain omissions.

9) Source indicates the sources from where data (Primary Data or Secondary Data)
has been collected.
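To see what tabulation buys over textual presentation, the Roy Enamel Factory paragraph quoted earlier can be turned into a two-way table. The sketch below is one possible reading of that text, assuming that the 2010 union membership is 20% above the 2009 figure and that 4,200 of those 2010 members are male; the function name `two_way_table` is made up for this example.

```python
def two_way_table(total, members, females, female_members):
    """Cross-classify workers by sex and trade-union membership."""
    males = total - females
    male_members = members - female_members
    return {
        "Members": {"Male": male_members, "Female": female_members, "Total": members},
        "Non-members": {
            "Male": males - male_members,
            "Female": females - female_members,
            "Total": total - members,
        },
        "Total": {"Male": males, "Female": females, "Total": total},
    }


# 2009: 5000 workers, 4200 members, females are 20% of all workers,
# and 30% of the females are members.
females_2009 = 5000 * 20 // 100
table_2009 = two_way_table(5000, 4200, females_2009, females_2009 * 30 // 100)

# 2010 (assumed reading): members up 20% on 2009, 4200 of them male;
# 950 non-members, 450 of them female.
members_2010 = 4200 * 120 // 100
female_members_2010 = members_2010 - 4200
total_2010 = members_2010 + 950
females_2010 = female_members_2010 + 450
table_2010 = two_way_table(total_2010, members_2010, females_2010, female_members_2010)

for year, table in (("2009", table_2009), ("2010", table_2010)):
    print(year)
    print(f"{'Status':<12}{'Male':>8}{'Female':>8}{'Total':>8}")
    for status, row in table.items():
        print(f"{status:<12}{row['Male']:>8}{row['Female']:>8}{row['Total']:>8}")
```

In the printed table the row labels play the role of stubs and the column labels the role of captions, and comparisons between the two years can be read off directly.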
3) Diagrammatic representation of data:-

i) An attractive representation of statistical data is provided by charts, diagrams and
pictures.

ii) Diagrammatic representation can be used for both the educated section and the
uneducated section of the society.

iii) Any hidden trend present in the given data can be noticed only in this mode of
representation.

iv) Compared to tabulation, this is less accurate. So if there is a priority for
accuracy, we have to recommend tabulation.
Types of Diagrams
1) Line Diagrams :-
i) Along the abscissa, we take the independent variable (x or time) and along the
ordinate the dependent variable (y or production related to time).
ii) After plotting the points, they are joined with the help of a scale (ruler), which gives
the line diagram.
iii) When the time series exhibits a wide range of fluctuations, we may think of a
logarithmic or ratio chart where log y_t, and not y_t, is plotted against t.
iv) We use a multiple line chart for representing two or more related time series
expressed in the same unit.
v) We use a multiple-axis chart for representing two or more series expressed in
different units.
Question

The profits in lakhs of Rupees of an industrial house for 2009, 2010, 2011, 2012,
2013, 2014, and 2015 are 5, 8, 9, 6, 12, 15 and 24 respectively. Represent these data
using a suitable diagram.

Solution
Question

The production of wheat and rice of a region are given below :

Represent this information using a suitable diagram.


Solution
Multiple Line Chart
Logarithmic Scale:-
X (Time) Y (Production)
1999 1
2002 10
2006 100
2008 1000
2010 10000
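The effect of a logarithmic (ratio) scale on the table above can be checked with a short calculation. The sketch below assumes base-10 logarithms, which the notes do not specify; it only computes the values that would be plotted, not the chart itself.

```python
import math

years = [1999, 2002, 2006, 2008, 2010]
production = [1, 10, 100, 1000, 10000]

# On a ratio chart we plot log y_t instead of y_t (base 10 assumed here).
log_production = [round(math.log10(y), 6) for y in production]

for year, y, log_y in zip(years, production, log_production):
    print(f"{year}  y = {y:>6}  log10(y) = {log_y}")
```

The raw values span 1 to 10000, while their logarithms run evenly from 0 to 4, which is why a series with wide fluctuations becomes readable on this scale.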
Multiple Line Chart
X (Time) Y1 (Income) Y2 (No. of Emp.)
2019 10 5
2020 20 10
2021 30 15
2022 40 20

[Figure: Income and No. of Employees plotted against year (2019 to 2022), with a separate
vertical axis for each series, both scaled from 0 to 40.]


2) Bar Diagrams :-
a) Bars i.e. rectangles of equal width and usually of varying lengths are
drawn either horizontally or vertically.
b) We consider Multiple or Grouped Bar diagrams to compare related
series.
c) Component or sub-divided Bar diagrams are applied for representing
data divided into a number of components.
d) We use Divided Bar charts or Percentage Bar diagrams for comparing
different components of a variable and also for relating the
components to the whole.
Question

The profits in lakhs of Rupees of an industrial house for 2009, 2010, 2011, 2012, 2013, 2014,
and 2015 are 5, 8, 9, 6, 12, 15 and 24 respectively. Represent these data using a suitable
diagram.
Solution
Bar Diagram
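As the drawn diagram is not reproduced here, a rough text-only stand-in can be sketched in Python using the profit figures from the question. The helper name `bar_rows` and the choice of one `#` per lakh of rupees are assumptions made purely for illustration.

```python
years = [2009, 2010, 2011, 2012, 2013, 2014, 2015]
profits = [5, 8, 9, 6, 12, 15, 24]  # lakhs of rupees, taken from the question


def bar_rows(labels, values, symbol="#"):
    """Build one horizontal bar per label; bar length equals the value."""
    return [f"{label} | {symbol * value} {value}" for label, value in zip(labels, values)]


rows = bar_rows(years, profits)
print("\n".join(rows))
```

All bars share the same thickness (one text line) and differ only in length, which is the defining property of a bar diagram stated in point a).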
Question

The production of wheat and rice of a region are given below :

Represent this information using a suitable diagram.


Solution
Multiple or grouped Bar Diagram
Component or sub-divided Bar diagrams :-
Question
Present the following data by sub-divided bar diagram.
Subject      Number of Students
             2011-12   2012-13
Statistics   25        30
Economics    40        42
History      35        28

Solution
[Figure: sub-divided bar diagram showing the number of students in Statistics, Economics
and History for 2011-12 and 2012-13.]
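To draw a sub-divided bar, each component is stacked on top of the previous one, so we need the running totals within each year. A minimal sketch of that computation, using the figures from the question, is given below; the stacking order (Statistics, then Economics, then History) and the helper name `segments` are assumptions.

```python
from itertools import accumulate

students = {
    "2011-12": {"Statistics": 25, "Economics": 40, "History": 35},
    "2012-13": {"Statistics": 30, "Economics": 42, "History": 28},
}


def segments(components):
    """Return (name, bottom, top) for each component stacked in the given order."""
    tops = list(accumulate(components.values()))
    bottoms = [0] + tops[:-1]
    return [(name, b, t) for name, b, t in zip(components, bottoms, tops)]


stacked = {year: segments(parts) for year, parts in students.items()}
for year, segs in stacked.items():
    print(year, segs)
```

Both bars reach a total height of 100 students, so in this particular example the sub-divided bar diagram and the percentage bar diagram coincide.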
3) Pie Diagrams :-
a) It is also known as an angular diagram.
b) It is used to represent percentage breakdowns of the given data.
c) The values (or percentages) can be converted into angles by the formula:

Segment Angle = (Segment Value × 360°) / Total Value
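The angle formula can be tried out on any breakdown. The sketch below borrows the 2011-12 student numbers from the sub-divided bar question purely as sample input; the function name `segment_angles` is made up for the example.

```python
from fractions import Fraction


def segment_angles(values):
    """Angle of each pie segment in degrees: value * 360 / total."""
    total = sum(values.values())
    return {name: float(Fraction(v * 360, total)) for name, v in values.items()}


students_2011_12 = {"Statistics": 25, "Economics": 40, "History": 35}
angles = segment_angles(students_2011_12)
print(angles)
print(sum(angles.values()))  # the segment angles together make the full circle
```

Exact fractions are used before converting to floating point so that the angles add up to 360 without rounding drift.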
Question

Draw an appropriate diagram with a view to represent the following data :


Solution
Pie chart or divided bar chart would be the ideal diagram to represent this data.
We consider Pie chart.
Computation for drawing Pie chart
Pie chart showing the distribution of Revenue
Frequency Distribution
i) A frequency distribution may be defined as a tabular representation of statistical data, usually
in an ascending order, relating to a measurable characteristic according to individual value or
a group of values of the characteristic under study.

ii) When tabulation is done in respect of a discrete random variable, it is known as Discrete or
Ungrouped or simple Frequency Distribution.

iii) In case the characteristic under consideration is a continuous variable, such a classification is
termed as Grouped Frequency Distribution.
Question

A review of the first 30 pages of a statistics book reveals the following printing
mistakes:

Make a frequency distribution of printing mistakes.


Solution
Since x, the number of printing mistakes, is a discrete variable, x can assume seven values:
0, 1, 2, 3, 4, 5 and 6. Thus we have 7 classes, each class comprising a single value.

Frequency distribution of the number of printing mistakes on the first 30 pages of
a book.
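Tallying a discrete variable is a counting exercise, sketched below with `collections.Counter`. The page-wise mistake counts are illustrative values supplied for this example, because the data table of the question is not reproduced in these notes; they were chosen only to respect the stated facts (30 pages, values from 0 to 6).

```python
from collections import Counter

# Illustrative data (assumed): printing mistakes on each of 30 pages
mistakes = [
    0, 1, 3, 3, 2, 5, 6, 0, 1, 0,
    4, 1, 1, 0, 2, 3, 2, 5, 0, 4,
    2, 3, 2, 2, 3, 3, 4, 6, 1, 4,
]

counts = Counter(mistakes)

# One class per value 0..6, listed in ascending order
distribution = [(x, counts.get(x, 0)) for x in range(0, 7)]

print("Mistakes  Pages")
for x, f in distribution:
    print(f"{x:>8}  {f:>5}")
print(f"{'Total':>8}  {sum(f for _, f in distribution):>5}")
```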
Question

Following are the weights in kgs. of 36 BBA students of St. Xavier’s College.

Construct a frequency distribution of weights, taking class length as 5.


Solution
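A grouped frequency distribution with class length 5 can be built by tallying each weight into an inclusive class. In the sketch below the 36 weights are illustrative values supplied for the example, since the data table is not reproduced here; they are chosen to agree with the first class (44 to 48 kgs.) and the first two frequencies (3 and 4) quoted later in these notes. The function name `grouped_frequency` is hypothetical.

```python
# Illustrative data (assumed): weights in kgs. of 36 students
weights = [
    70, 73, 49, 61, 61, 47, 57, 50, 59,
    59, 68, 45, 55, 65, 68, 56, 68, 55,
    70, 70, 57, 44, 69, 73, 64, 49, 63,
    65, 70, 65, 62, 64, 73, 67, 60, 50,
]


def grouped_frequency(values, start, length):
    """Tally values into inclusive classes start..start+length-1, and so on."""
    classes = []
    lower = start
    while lower <= max(values):
        upper = lower + length - 1
        freq = sum(1 for v in values if lower <= v <= upper)
        classes.append(((lower, upper), freq))
        lower += length
    return classes


table = grouped_frequency(weights, start=min(weights), length=5)
for (lcl, ucl), freq in table:
    print(f"{lcl}-{ucl}: {freq}")
```

Starting the first class at the smallest observation is one common choice; any starting point at or below the minimum works as long as every observation falls in exactly one class.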
Some important terms associated with a frequency distribution
1) Class Limit (CL)
Corresponding to a class interval, the class limits may be defined as the minimum
value and the maximum value the class interval may contain. The minimum value is
known as the lower class limit (LCL) and the maximum value is known as the upper
class limit (UCL). For the frequency distribution of weights of BBA Students, the LCL
and UCL of the first class interval are 44 kgs. and 48 kgs. respectively.
2) Class Boundary (CB)
Class boundaries may be defined as the actual class limits of a class interval. For overlapping
classification or mutually exclusive classification that excludes the upper class limits, like
10–20, 20–30, 30–40, ... etc., the class boundaries coincide with the class limits. This is usually
done for a continuous variable. However, for non-overlapping or mutually inclusive
classification that includes both the class limits, like 0–9, 10–19, 20–29, ..., which is usually
applicable for a discrete variable, we have

LCB = LCL − D/2 and UCB = UCL + D/2

where D is the difference between the LCL of the next class interval and the UCL of the given
class interval. For the data presented in table 10.5 (the weights of BBA students), D = 1 kg., so
the LCB of the first class interval = 44 kgs. − 0.50 kgs. = 43.50 kgs. and the UCB of the first class
interval = 48 kgs. + 0.50 kgs. = 48.50 kgs.
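For inclusive classes, the usual convention is to move each limit outward by half the gap D between consecutive classes, i.e. LCB = LCL − D/2 and UCB = UCL + D/2. The sketch below applies this to the first two weight classes; the second class (49 to 53 kgs.) is inferred from the stated class length of 5, and the function name is made up for the example.

```python
def class_boundaries(limits):
    """limits: ascending list of (LCL, UCL) pairs with a constant gap D."""
    d = limits[1][0] - limits[0][1]  # D = LCL of next class - UCL of this class
    return [(lcl - d / 2, ucl + d / 2) for lcl, ucl in limits]


limits = [(44, 48), (49, 53)]  # first two weight classes, in kgs.

boundaries = class_boundaries(limits)
mid_points = [(lcl + ucl) / 2 for lcl, ucl in limits]
widths = [ucb - lcb for lcb, ucb in boundaries]

print(boundaries)
print(mid_points)
print(widths)
```

The same mid-points are obtained whether the class limits or the class boundaries are averaged, because the two adjustments of D/2 cancel out.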
3) Mid-point or Mid-value or class mark
Corresponding to a class interval, this may be defined as the total of the two class limits or class
boundaries divided by 2. Thus, we have

Mid-point = (LCL + UCL) / 2 = (LCB + UCB) / 2

Referring to the distribution of weights of BBA students, the mid-points for the first two class
intervals are (44 kgs. + 48 kgs.) / 2 = 46 kgs. and (49 kgs. + 53 kgs.) / 2 = 51 kgs.
4) Width or size of a class interval
The width of a class interval may be defined as the difference between the UCB and
the LCB of that class interval. For the distribution of weights of BBA students, C, the
class length or width is 48.50 kgs. – 43.50 kgs. = 5 kgs. for the first class interval. For
the other class intervals also, C remains same.
5) Cumulative Frequency
The cumulative frequency corresponding to a value for a discrete variable and
corresponding to a class boundary for a continuous variable may be defined as the number
of observations less than the value or less than or equal to the class boundary. This
definition refers to the less than cumulative frequency. We can define more than cumulative
frequency in a similar manner. Both types of cumulative frequencies are shown in the
following table.
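Since the table itself is not reproduced here, the two kinds of cumulative frequency can be sketched as follows. The class boundaries and frequencies are illustrative: only the first two frequencies (3 and 4) and the total (36) are quoted in these notes, and the remaining frequencies are assumed for the example.

```python
from itertools import accumulate

# Class boundaries in kgs. and class frequencies (partly assumed, see above)
boundaries = [43.5, 48.5, 53.5, 58.5, 63.5, 68.5, 73.5]
frequencies = [3, 4, 5, 7, 9, 8]

total = sum(frequencies)

# Less-than type: number of observations below each boundary
less_than = [0] + list(accumulate(frequencies))

# More-than type: number of observations above each boundary
more_than = [total - c for c in less_than]

print("Boundary  Less than  More than")
for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:>8}  {lt:>9}  {mt:>9}")
```

At every boundary the two cumulative frequencies add up to the total frequency, which is a quick check on a hand-built table.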
6) Frequency density of a class interval
It may be defined as the ratio of the frequency of that class interval to the corresponding
class length. The frequency densities for the first two class intervals of the frequency
distribution of weights of BBA students are 3/5 and 4/5, i.e. 0.60 and 0.80 respectively.
7) Relative frequency and percentage frequency of a class interval
Relative frequency of a class interval may be defined as the ratio of the class frequency to
the total frequency. Percentage frequency of a class interval may be defined as the ratio of
class frequency to the total frequency, expressed as a percentage. For the last example, the
relative frequencies for the first two class intervals are 3/36 and 4/36 respectively and the
percentage frequencies are 300/36 and 400/36 respectively. It is quite obvious that
whereas the relative frequencies add up to unity, the percentage frequencies add up to one
hundred.
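The three ratios defined above can be computed side by side. The sketch below uses only figures quoted in these notes (frequencies 3 and 4, class length 5, total frequency 36); the class labels in the dictionary are written out for readability, with the second class inferred from the class length.

```python
from fractions import Fraction

total_frequency = 36
class_length = 5
first_two = {"44-48": 3, "49-53": 4}  # class frequencies from the text

# Frequency density = class frequency / class length
density = {c: f / class_length for c, f in first_two.items()}

# Relative frequency = class frequency / total frequency (kept as exact fractions)
relative = {c: Fraction(f, total_frequency) for c, f in first_two.items()}

# Percentage frequency = relative frequency expressed as a percentage
percentage = {c: round(100 * f / total_frequency, 2) for c, f in first_two.items()}

print(density)
print(relative)
print(percentage)
```

Note that `Fraction` reduces 3/36 and 4/36 to their lowest terms, 1/12 and 1/9.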
Graphical Representation of Frequency Distribution
1) Histogram or Area diagram;
• This is a very convenient way to represent a frequency distribution.
• It shows each class interval by the width of a rectangle and the frequency by its height.
• This is exclusively used for showing the frequency distribution of quantitative data that are
continuous in nature.
• The area of a rectangle shows the proportion of the class frequency in the total.
Note:
• In case of an inclusive series, first convert it into an exclusive series (using class
boundaries) and then draw the histogram.

• If only mid-points are given, take the difference between two consecutive mid-points
and divide it by two to get h (half the class width). To make the series continuous,
subtract h from each mid-point (to get the lower limit) and add h to it (to get the upper
limit), and then construct the histogram.

• Draw a kink on the horizontal axis if the histogram does not start from zero.
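The mid-point rule in the note can be sketched as a small helper. It assumes equally spaced mid-points (equal class widths); the mid-points 46, 51 and 56 follow from the weight classes with class length 5 and are used only as sample input, and the function name is hypothetical.

```python
def limits_from_mid_points(mid_points):
    """Rebuild continuous class limits, assuming equally spaced mid-points."""
    h = (mid_points[1] - mid_points[0]) / 2  # half the class width
    return [(m - h, m + h) for m in mid_points]


mid_points = [46, 51, 56]
limits = limits_from_mid_points(mid_points)
print(limits)
```

The upper limit of each class equals the lower limit of the next, so the rectangles of the histogram touch each other as required for a continuous series.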


2) Frequency Polygon;
• A frequency polygon is obtained from a histogram by joining the mid-points of the tops of
the various rectangles with straight lines.

Frequency Polygon
A frequency curve can be regarded as a limiting form of the frequency polygon.
If the points obtained in the case of a frequency polygon are joined with a smooth curve,
we get a frequency curve.
3) Ogives or cumulative Frequency graphs.
By plotting cumulative frequency against the respective class boundary, we get ogives.

Less than type ogive: cumulative frequencies are plotted against the upper limits of the
class intervals.

More than type ogive: cumulative frequencies are plotted against the lower limits of the
class intervals.

Note: It is generally assumed that the class preceding the first class has a frequency of zero.
Types of Frequency Curve
1) Bell-shaped curve;
• Most of the commonly used distributions give a bell-shaped curve, which, as suggested by
the name, looks almost like a bell.
• On a bell-shaped curve, the frequency, starting from a rather low value, gradually reaches
the maximum value somewhere near the central part and then gradually decreases to
reach its lowest value at the other extremity.
• Example: profit, height, weight.

Bell-shaped Curve
2) U-shaped curve;
• For a U-shaped curve, the frequency is minimum near the central part and
slowly but steadily reaches its maximum at the two extremities.

• Example: The distribution of Kolkata-bound commuters belongs to this type of curve, as
the number of commuters is maximum during the peak hours in the morning and in the
evening.

U-shaped Curve
3) J-shaped curve;
• A J-shaped curve starts with a minimum frequency and then gradually reaches its maximum
frequency at the other extremity.

• Example: The distribution of commuters coming to Kolkata from the early morning hour to
the peak morning hour follows such a distribution.

J-shaped Curve
4) Mixed curve.
• A combination of the above frequency curves is known as a mixed curve.

Mixed Curve
