Business Statistics BBA 201-18 Notes Unit 1

This document provides information about a business statistics course offered at I.K.G. Punjab Technical University. The objectives of the course are to familiarize students with basic statistical tools for quantitative analysis and decision making. The course aims to teach concepts like measures of central tendency, measures of variation, correlation analysis, regression analysis, probability, and probability distributions. The course is divided into 4 units that will cover topics such as data collection and presentation, measures of central tendency and variation, correlation and regression analysis, and probability theory and distributions.

Uploaded by

Akshay

Business Statistics

Prepared by Vidhi khanna


I.K.G. Punjab Technical University
BBA Batch 2018

BBA 201-18 Business Statistics Course


Course Objective: The objective of the course on Business Statistics is to familiarize students with the basic
statistical tools used to summarize and analyse quantitative information for decision making. Analysis
of numbers is required for taking decisions related to every aspect of business.

Course Outcomes (COs): After completion of the course, the students shall be able to:
CO1: Learn the basic concepts of statistics and calculate the arithmetic mean, median, mode
and partition values.
CO2: Understand the calculation of moments, skewness and kurtosis, and determine whether a
given distribution is normal or not.
CO3: Acquire the prerequisite knowledge required to understand probability and the
applications of probability theory.
CO4: Understand the concepts of correlation and regression analysis and their applications.
CO5: Apply the learnt techniques in statistical testing and their applications.

Unit I
Introduction to Statistics: Meaning, Definitions, Features of Statistics, Importance, Functions, Scope and
Limitations of Statistics. Data Collection: Sources of Primary and Secondary Data. Presentation of Data.
Frequency Distribution.

Sampling Concepts: Meaning of Population and Sample, Parameters and Statistics, Descriptive and
Inferential Statistics, Probability and Non-Probability Sampling Methods including Simple Random
Sample, Stratified Sampling, Systematic Sampling, Judgement Sampling and Convenience Sampling.

Unit II
Measures of Central Tendency: Mathematical averages including arithmetic mean, geometric mean
and harmonic mean, properties and applications. Positional Averages: Mode and median (and other
partition values including quartiles, deciles and percentiles). Graphic presentation of measures of central
tendency.

Measures of Variation: Absolute and relative measures. Range, quartile deviation, mean deviation,
standard deviation and their coefficients. Properties of Standard Deviation and Variance.

Unit III
Simple Correlation Analysis: Meaning of Correlation, Simple, multiple and partial, linear and nonlinear
correlation, correlation and causation, scatter diagram, Pearson’s correlation coefficient and Rank
Correlation.
Simple Regression Analysis: Meaning of Regression, Principle of least squares and regression analysis,
Calculation of regression coefficient, properties of regression coefficient, Relationship between
correlation and regression coefficient.

Unit IV
Theory of Probability: Meaning of Probability, Approaches to the calculation of probability, calculation of event probabilities,
Addition and Multiplication Laws of Probability (Proof not required), Conditional Probability and Bayes’
Theorem (Proof not required).
Probability Distribution: Binomial Distribution: Probability Distribution Function, Constants, Shape,
Fitting of Binomial Distribution. Poisson Distribution: Probability Function (including Poisson
approximation to binomial distribution), Constants, Fitting of Poisson Distribution. Normal Distribution:
Probability Distribution Function, Properties of Normal Curve, Calculation of Probabilities.

Unit – 1

Chapter – 1
Nature and Scope of statistics

Introduction

The term statistics is ultimately derived from the Latin word Status, the Italian word Statista and the
German word Statistik. All these words literally mean ‘political state’. For decades,
the word statistics was associated with the display of facts and figures pertaining to the economic,
demographic and political situation prevailing in a country. With the passage of time this old concept
of the subject has changed, and statistics is now treated as a far more important tool for resolving
issues and solving problems.

Definition
A.L. Bowley defines, “Statistics may be called the science of counting”. At another place he
defines, “Statistics may be called the science of averages”. Both these definitions are narrow
and throw light only on one aspect of Statistics.
Many a time counting is not possible and estimates are required to be made. Therefore, Boddington
defines it as “the science of estimates and probabilities”.

In the words of Croxton & Cowden, “Statistics may be defined as the collection, presentation, analysis
and interpretation of numerical data”.

Meanings of Statistics
The word statistics has three different meanings, which are discussed below:
(1) Plural Sense (2) Singular Sense (3) Plural of the Word “Statistic”

(1) Plural Sense


In the plural sense, the word statistics refers to numerical facts and figures collected in a
systematic manner with a definite purpose in any field of study. In this sense, statistics are also
aggregates of facts which are expressed in numerical form. For example, statistics on industrial
production, statistics on population growth of a country in different years, etc.

(2) Singular Sense


In the singular sense, it refers to the science comprising the methods which are used in the collection,
analysis, interpretation and presentation of numerical data. These methods are used to draw
conclusions about population parameters.

It means the science of statistics or the subject itself.


(3) Plural of the Word “Statistic”:
The word statistics is used as the plural of the word “statistic,” which refers to a numerical quantity
like the mean, median, variance, etc., calculated from sample values.

For example: If we select 15 students from a class of 80 students, measure their heights and find
the average height, this average would be a statistic.
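
The example can be sketched in a few lines of Python. The heights are simulated, hypothetical values (not data from these notes), and the names `class_heights` and `sample_heights` are illustrative assumptions. The sketch only shows that the average of the 15 sampled students is a statistic, while the average over all 80 students is the corresponding population parameter.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical heights (in cm) for a class of 80 students; simulated, not real data.
class_heights = [round(random.gauss(165, 8), 1) for _ in range(80)]

# Select 15 students at random, without replacement.
sample_heights = random.sample(class_heights, 15)

# The sample average is a "statistic"; the class-wide average is the "parameter".
sample_mean = statistics.mean(sample_heights)
population_mean = statistics.mean(class_heights)

print(f"Statistic (mean of 15 sampled students): {sample_mean:.1f} cm")
print(f"Parameter (mean of all 80 students):     {population_mean:.1f} cm")
```

A different random sample would generally give a slightly different value of the statistic, while the parameter stays fixed for the class.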
Characteristics/Features of Statistics in the Plural Sense

1. Statistics are aggregate of facts


By aggregate we mean a set of figures. Single, isolated and unconnected figures, even if they are
expressed in numerical terms, cannot be termed statistics. For example, the age of one student (20 years) or
the production of a firm in one year is not statistics. Similarly, a single figure relating to production, sales,
births, deaths etc. would not be statistics, although aggregates of such figures would be statistics
because of their comparability and relationship.

2. Statistics are affected to a marked extent by a multiplicity of causes


It means statistics are aggregates of facts which are affected not by one factor only but by a variety
of factors or circumstances. For instance, wheat production is affected by climate, soil fertility,
rainfall, quality of seeds, methods of cultivation etc. All these factors acting jointly determine the
amount of wheat produced, and it is not easy to assess the individual contribution of any one of
these factors.

3. Statistics are numerically expressed


Facts will be called statistics only if they are expressed in quantitative, i.e., numerical, form.
Qualitative descriptions such as poor, rich, intelligent, beautiful etc. do not constitute statistical data.
Likewise, saying that I am tall and my friend is short does not make any statistical sense. However, if my height
is 5’-6’’ and that of my friend is 5’-1’’, it would be taken as statistical information.
4. Statistics are enumerated or estimated according to reasonable standard of accuracy
There are two methods of collection of data: (a) the census method or enumeration method and (b) the
sampling or estimation method. The census method is to be followed when the field of enquiry is not
too vast. In this method all the units in the population are studied. The data collected by this method
will be reliable and accurate. On the contrary, if the field of enquiry is very vast, then the only alternative left
for the collection of data is the sampling or estimation technique. Obviously, the estimated figures are not
perfectly accurate, reliable and error free. Our efforts should be to minimize these errors to the greatest possible
extent and bring them to a reasonable standard of accuracy.
5. Statistics should be collected in a systematic manner for a predetermined purpose
For systematic collection of data, a suitable plan of data collection should be prepared and work
should be done accordingly. Data collected in a haphazard manner might lead to wrong conclusions.
6. Statistics should be capable of being placed in relation to each other
The collected figures should be comparable and well-connected in the same department of inquiry. Only
homogeneous data are fit for the purpose of comparison. For example, it would be
meaningless to compare the height of an animal with that of a tree.

Characteristics and Features of Statistics in the Singular Sense


1. Collection of Data
It is the first step in any statistical investigation. Collection of numerical data should be carried out
with utmost care because they form the foundation of statistical analysis. If the data are faulty,
the conclusions drawn can never be reliable. Data collected for the first time and used specially
for the particular problem or the issue under study are called Primary Data. On the other hand, data
that have already been collected for some purpose other than the current study are called
Secondary data.
2. Organisation of Data
Data collected from published sources are generally in organised form, but primary data need
organisation. The first stage in organisation is editing of data. Editing is done to remove errors,
irregularities, inconsistencies and inaccuracies present in the data. The next stage is the classification of data.
The edited data are arranged into certain classes or groups according to similar characteristics.
The final stage is tabulation of data, in which data are arranged in rows and columns for
clarity.
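
The classification and tabulation stages can be illustrated with a small sketch. The marks below are hypothetical values and the class width of 10 is an assumed choice; neither comes from these notes.

```python
from collections import Counter

# Hypothetical raw marks of 12 students (illustrative values only).
raw_marks = [12, 47, 33, 25, 8, 41, 29, 36, 18, 44, 31, 22]

CLASS_WIDTH = 10  # assumed class interval width: 0-9, 10-19, ...

# Classification: assign each mark to the lower limit of its class.
class_counts = Counter((mark // CLASS_WIDTH) * CLASS_WIDTH for mark in raw_marks)

# Tabulation: arrange classes and frequencies in rows and columns.
print(f"{'Class':<10}{'Frequency':>10}")
for lower in sorted(class_counts):
    label = f"{lower}-{lower + CLASS_WIDTH - 1}"
    print(f"{label:<10}{class_counts[lower]:>10}")
```

The printed table is a simple frequency distribution: each row is a class, and the frequencies add up to the number of observations.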
3. Presentation of Data
After collection and organisation, the data are presented through graphs and diagrams, depending upon
the purpose of enquiry. Orderly presentation facilitates the analysis of data.
4. Analysis of Data
Analysis of data forms the major part of the subject of statistics. In this stage the presented data are
analysed with the help of various statistical methods. The following are important statistical
methods:
a. Measures of Central Tendency
b. Measures of Dispersion
c. Measures of Skewness
d. Measures of Kurtosis
e. Correlation Analysis
f. Regression Analysis
g. Time Series analysis and Forecasting
h. Index Number
i. Interpolation and Extrapolation
j. Theory of Probability

5. Interpretation of Data
The last stage in statistical enquiry is the interpretation of data. It means drawing conclusions from
the data collected and analysed. Interpretation of data is a difficult task and requires a high degree of
skill, experience and judgement. If the results of the analysis are not properly interpreted, then
wrong conclusions may be drawn and the entire statistical investigation will be wasted.

Functions of Statistics
The functions of statistics may be enumerated as follows:
To present Complex data in simplified and proper form
One of the most important functions of statistics is to remove unwanted complexities in the data
and express them in a simple and understandable form with the help of statistical techniques.
Raw figures are often boring and confusing. Presentation of data using graphs, diagrams etc. reduces the
complex nature of data and makes it more understandable and comparable.
To present facts in a definite form
Another important objective of statistics is to present the data in a definite and unambiguous
form. Without a statistical study our ideas are likely to be vague, indefinite and hazy, but figures
help us to represent things in their true perspective. For example, the statement that the pass
percentage of a college is 93% is more definite than simply saying that the pass percentage
is very good or excellent.
Helpful in comparison
The significance of certain figures can be better appreciated when they are compared with others of
the same type. The comparison between two different groups is best represented by certain
statistical methods, such as averages, coefficients, rates, ratios, etc. For example, the per capita income of
Punjab will be considered significant only when it is compared with the per capita income of other
states.

Statistics enlarge Individual knowledge and experience


Statistics has indeed enlarged human knowledge by analysing and explaining complex facts.
The use of statistical methods improves the power of reasoning in human beings and turns the
mind towards rational thinking.
To establish relationship between different facts
In addition to comparison, statistics establishes relationships between various quantitative and
qualitative aspects. It can measure the degree as well as the direction of the relationship between
two or more variables such as price and demand, investment and output, agricultural production and
use of chemical fertilizers etc.

To help classification of facts


Classification is a technique to divide the large volume of data into subgroups on homogeneous basis.
This division helps in comparing, presenting and analysing the heterogeneous facts in a proper manner.

Formulation of policy through planning


At all levels, whether in the public sector, a government department or an individual establishment, policy
making is an integral activity. After comparison and establishing relationships between various variables,
statistics paves the way to frame suitable policies in different fields, particularly in economics,
commerce and management, industry, agriculture and business.

Testing of hypothesis
Statistical techniques are extremely helpful in formulating and testing hypotheses in different fields
of knowledge. E.g., the hypothesis that a new drug is effective in controlling blood pressure will
require the help of statistical hypothesis testing to reach a conclusion. Moreover, the testing of
hypotheses also leads to the development of new theories.

Helpful in predictions
Forecasting and prediction are an integral part of the functions of statistics. Many statistical techniques
lead towards forecasting of future trends. The use of extrapolation, time series analysis, regression
analysis etc. provides a sound basis for predicting future events.

Helpful in realising the magnitude of problem


Statistics help in realising the magnitude and seriousness of a problem or situation. For example, if
we say inflation is increasing in India, it does not show the seriousness of the problem. But if we say that
inflation is increasing at the rate of 7%, it reveals the true nature of the problem.
Importance of Statistics
These days statistical methods are applicable everywhere. There is hardly any field of work in which
statistical methods are not applied. According to A.L. Bowley, “A knowledge of statistics is like a
knowledge of foreign languages or of Algebra, it may prove of use at any time under any
circumstances”. The importance of statistical science is increasing in almost all spheres of
knowledge, e.g., astronomy, biology, meteorology, demography, economics and mathematics.
Economic planning without statistics is bound to be baseless. Statistics serve in administration and
facilitate the formulation of new policies. Financial institutions and investors utilise
statistical data to summarise past experience. Statistics are also helpful to an auditor, when he
uses sampling techniques or test checking to audit the accounts of his client.
1. Quantitative expression of economic problems: Statistics is an essential tool for an economist to
understand the problems of an economy through quantitative data. Example: The problem of poverty
in India can be quantitatively expressed as a substantial decline in the poverty ratio from
55% in 1973 to 36% in 1993.
2. Inter-sectoral and inter-temporal comparisons: Quantitative data are further used to make
inter-sectoral comparisons, i.e., across different sectors of the economy, and inter-temporal
comparisons, i.e., over different plan periods, for example of rural and urban unemployment.
3. Cause and effect relationship: Different sets of data are used to find the cause-and-effect
relationship. This enables policy makers to formulate policy to solve the problems of an economy.
4. Economic equilibrium: Statistical data help economists to understand the behaviour of the
producer and consumer in the market. Example: How the producer chooses the combination of
inputs to produce goods so as to maximise profit.
5. Developing economic theories: Statistical data also help economists to develop theories, such as how the
prices of goods vary in relation to the demand for the product.
6. Forecasting: Statistical data are useful to forecast the changes in the factors which influence
other factors. This information enables economists to formulate policies and suggestions to
overcome the problem.
7. Formulation of policies: Statistical data are essential for formulating policies of economic
development. Example: If the government wants to formulate or modify labour laws, then it will
require statistical data on working conditions, number of working hours and minimum wages
received by workers.

Scope of Statistics
It is often said that “Statistics is what statisticians do”. This explains why the scope of statistics is
very wide and extends to the fields of Economics, Commerce, Trade, Industry,
Agriculture, Physical sciences, bio-sciences, astronomy, psychology and many more.

The word Statistics has been derived from the following words:
Latin word ‘Status’
Italian word ‘Statista’
German word ‘Statistik’
All these words mean ‘political state’ or ‘statesman’s art’. This means statistics has a deep relation
with the state. In ancient times statistics was considered the science of kings or the science of
statecraft and was associated with the collection of data concerning:
Population
Number of deaths and births
Area under land cultivation
Number of crimes
Military strength and allied tasks.
But at present, statistics is used by almost all ministries or government departments like the Finance
Ministry, Agriculture Ministry, Defence Ministry, Industrial Ministry and many more.
EXAMPLE: Use of statistics for framing the policy of poverty alleviation, population control,
inflation control, money supply etc.
STATISTICS AND ECONOMICS
The scope of statistics includes the area of economics also. Statistical data and methods of statistics
are used for understanding:
Economic theories
Economic problems
Economic policy formulation
Economic planning
Budgeting
National income accounting
Foreign trade, etc.
Statistics is helpful in understanding the intensity of an economic problem and deriving solutions
for it on the basis of the data available.
EXAMPLE
Index numbers are used to study prices of commodities, volume of production, imports and exports etc.
Time series analysis is used to study the behaviour of prices, production and consumption of
commodities, money in circulation and sales of firms.
STATISTICS AND ECONOMETRICS
Econometrics is one of the more recent fields of study concerning economics. It combines the
methods and techniques of Statistics and Mathematics to build models for the analysis of economic
problems and to suggest solutions to those problems.
EXAMPLE: In Econometrics, linear regression models are formulated to analyse the effect of various
determinants of demand on the demand for a product.
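
A minimal sketch of such a model is given below. It assumes a single determinant (price) and uses hypothetical observations; both the data and the variable names are illustrative and do not come from these notes. The coefficients are computed with the ordinary least-squares formulas.

```python
# Hypothetical price (x) and quantity demanded (y) observations; illustrative only.
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
demand = [100.0, 92.0, 85.0, 77.0, 71.0]

n = len(prices)
mean_x = sum(prices) / n
mean_y = sum(demand) / n

# Least-squares estimates for the model: demand = intercept + slope * price
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, demand))
s_xx = sum((x - mean_x) ** 2 for x in prices)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"Estimated demand = {intercept:.2f} + ({slope:.2f}) * price")
```

With these made-up figures the slope is negative, which matches the usual expectation that demand falls as price rises. A model with several determinants would need multiple regression rather than this one-variable version.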
STATISTICS AND NATURAL SCIENCES
Natural sciences include biology, zoology, medicine, meteorology etc. Statistics is helpful in these
sciences because the experiments conducted in them are based upon data collected and summarised
with the help of descriptive statistics. Statistics is used both for analysing data and for drawing
conclusions.
STATISTICS AND SOCIAL SCIENCES
Statistical methods are also useful in the fields of History, Sociology, Education, Psychology etc.
Much of the research in these fields is done with the help of statistics.
EXAMPLE: In the field of politics, statistics is used to evaluate the effects of the policies of the
government.
In History, the record of past events is maintained with the help of descriptive statistics.
STATISTICS AND RESEARCH
The scope of statistics also includes the area of research. Any kind of research work is incomplete
without the use of statistics. It plays an important role in the empirical investigation of the laws and
theories of various branches of study, and it helps in verifying the practical validity of those laws and
theories.
STATISTICS AND PLANNING
The scope of statistics also extends to the area of planning. Planning is always done on the basis of past
records or data. In this way, statistics becomes a base for planning and forecasting about the future.
In the words of Tippett, “Planning is the order of the day and without statistics, planning is inconceivable”.
This saying rightly shows that statistics is indispensable to planning.
STATISTICS AND INDUSTRIAL MANAGEMENT
The scope of statistics also extends to the field of industry. Statistical Quality Control is a branch of
Statistics that deals with ‘Quality Control’ of manufactured goods. With the help of probability theory
and sampling techniques, control charts and inspection plans are formulated.
EXAMPLE: ISO (International Organization for Standardization) has stipulated certain statistical procedures to be
followed to obtain ISO series certificates, which has increased the scope of statistics.
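
The control-chart idea mentioned above can be sketched as follows. This is a simplified illustration, assuming hypothetical measurements and the conventional 3-sigma limits computed directly from the sample standard deviation; practical control charts usually estimate the variation from subgroups or moving ranges. The function name `out_of_control` is an illustrative choice.

```python
import statistics

# Hypothetical measurements (e.g. weight in grams) taken while the process is assumed stable.
baseline = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 49.7, 50.0]

center_line = statistics.mean(baseline)
sigma = statistics.stdev(baseline)  # sample standard deviation

# Conventional 3-sigma control limits around the centre line.
upper_limit = center_line + 3 * sigma
lower_limit = center_line - 3 * sigma

def out_of_control(value):
    """Return True when a new measurement falls outside the control limits."""
    return value > upper_limit or value < lower_limit

for new_value in [50.1, 51.5]:
    status = "out of control" if out_of_control(new_value) else "in control"
    print(f"{new_value}: {status}")
```

A measurement outside the limits is treated as a signal that the process should be inspected, not as proof that a particular item is defective.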
STATISTICS AND BANKING
Statistics is used in the field of banking for various activities like
Fixing rates of interest
Advancing of loans
Making recoveries
Establishment of new branches etc.

STATISTICS AND INSURANCE


Theory of probability is the tool of statistics that is used in the insurance sector for the calculation of
premium rates. Also, life expectancy tables are prepared with the help of statistics.
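
As a rough illustration of how probability enters a premium calculation, the sketch below uses the expected claim cost per policy. All figures are hypothetical, and the simple loading factor is an assumed simplification; actual premium calculations are considerably more detailed.

```python
# All figures below are hypothetical and for illustration only.
death_probability = 0.002   # assumed one-year probability of a claim, e.g. read from a life table
sum_assured = 500_000       # assumed amount paid if a claim occurs
loading = 0.25              # assumed margin for expenses and profit

# Expected claim cost per policy (often called the pure premium).
pure_premium = death_probability * sum_assured

# Premium charged after adding the loading.
gross_premium = pure_premium * (1 + loading)

print(f"Pure premium:  {pure_premium:.2f}")
print(f"Gross premium: {gross_premium:.2f}")
```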
STATISTICS AND COMMERCE
Statistics is also helpful in each and every activity that is covered under the framework of
commerce. It is used in business activities such as:
Production Management
Demand forecasting
Financial Analysis
Costing
Marketing
Manpower Planning
Market Research etc.
CONCLUSION
Apart from the above-mentioned areas, the scope of statistics also extends to the fields of
Chemistry, Sociology, engineering, accounting, auditing etc. It is instrumental in enhancing human
welfare and as such is a master key that enables us to solve the problems of mankind almost in
every field.

LIMITATIONS OF STATISTICS
Statistics totally ignores qualitative aspects of facts
Statistical methods do not study phenomena which cannot be expressed
in quantitative terms. Such phenomena, for example health, riches and intelligence,
cannot be a part of the study of statistics unless the qualitative data are first converted into
quantitative data.
Statistics deals with aggregates only
It is clear from the definition given by Prof. Horace Secrist, “By statistics we mean aggregates
of facts …. and placed in relation to each other”, that statistics deals only with aggregates of
facts or items and does not recognize any individual item. Thus, individual items such as the
death of 6 persons in an accident, or the 85% result of a class of a school in a particular year,
will not amount to statistics as they are not placed in a group of similar items. Statistics does not
deal with individual items, however important they may be.
Statistical results are not always dependable
Too much dependence on statistical findings is unwise, because
statistics is the science of estimates and probabilities.
Statistics do not establish cause and effect relationship
Statistics merely determines the correlation between variables. Correlation by itself provides no means to
distinguish between cause and effect.
Statistics is only one of the methods of studying a given problem
The statistical method cannot provide the best solution in all circumstances for a given
problem. Unless it is supplemented by other evidence, statistics cannot be very useful
for examining problems relating to, for example, culture, religion or philosophy.
Chapter – 2
Collection of Data
To a layman, data means information. The word data is the plural of the Latin ‘datum’, which means
an individual fact or a piece of information. In statistics, data means a mass of information collected
from different sources. The collection of data is an important task in statistical investigation or
enquiry. Collection of data is the process of collecting information keeping in view the purpose of the
investigation. The data form the basis for statistical enquiry; therefore, utmost care should be taken
while collecting the data, otherwise it leads to wrong conclusions and faulty decisions.
Census and Sampling Techniques of Collection of Data
There are two important techniques of data collection: (i) a census enquiry, which implies
complete enumeration of each unit of the universe, and (ii) a sample survey, in which only a small part of
the group, taken as representative, is considered. For example, the population
census in India implies the counting of each and every human being within the country. In
practice it is sometimes not possible to examine every item in the population. Also, many a
time it is possible to obtain sufficiently accurate results by studying only a part of the
“population”. For example, if the marks obtained in statistics by 10 students in an examination
are selected at random, say out of 100, then the average marks obtained by the 10 students will be
reasonably representative of the average marks obtained by all the 100 students. In such a
case, the population will be the marks of the entire group of 100 students and the marks of the 10
students will be a sample.
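
The claim that a random sample of 10 gives a reasonably representative average can be checked with a small simulation. The marks are hypothetical, generated only for illustration, and the choice of 1000 repetitions is an assumption made to show how sample means vary around the census mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical marks of all 100 students: the "population".
population_marks = [random.randint(30, 95) for _ in range(100)]
census_mean = statistics.mean(population_marks)  # census: every unit is counted

# Sample survey: average the marks of 10 randomly chosen students, repeated many times.
sample_means = [
    statistics.mean(random.sample(population_marks, 10))
    for _ in range(1000)
]

print(f"Census mean of all 100 students:    {census_mean:.2f}")
print(f"One sample mean (10 students):      {sample_means[0]:.2f}")
print(f"Average of 1000 sample means:       {statistics.mean(sample_means):.2f}")
print(f"Spread (std. dev.) of sample means: {statistics.stdev(sample_means):.2f}")
```

Any single sample mean differs somewhat from the census mean; the spread figure indicates how large that sampling error typically is for samples of this size.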

Nature of data
It may be noted that different types of data can be collected for different purposes. The data can
be collected in connection with time, or geographical location, or both time and location.
The following are the three types of data:
1. Time series data
2. Spatial data
3. Spatio-temporal data
Time series data
It is a collection of a set of numerical values, collected over a period of time. The data might have been
collected either at regular intervals of time or at irregular intervals of time.
Spatial data
If the data collected are connected with a place, they are termed spatial data.

Spatio-temporal data
If the data collected are connected with time as well as place, they are known as spatio-temporal data.
Types of data

1. Primary data
2. Secondary data
Primary data
By primary data we mean the data which is collected for the first time mainly for the
problem under study. Such data are first-hand information and original in character. It is
called primary because it is collected from the original source by the investigator himself for
his own purpose. The original compiler of the data is the primary source. For example, the
office of the Registrar General will be the primary source of the decennial population
census figures.
Choice between Primary and Secondary Data
An investigator has to decide whether he will collect fresh (primary) data or he will compile
data from the published sources. The former is reliable, but the latter can be relied upon
only after examining the following factors:
(i) source from which they have been obtained;
(ii) their true significance;
(iii) completeness and
(iv) method of collection.
In addition to the above factors, there are other factors to be considered while making a
choice between primary and secondary data:
(i) Nature and scope of enquiry.
(ii) Availability of time and money.
(iii) Degree of accuracy required and
Generally, for conducting statistical studies primary data are preferred by government agencies and
institutions because they are original and more accurate, but individual researchers prefer to use secondary
data as they are convenient and readily available.
Difference between primary and secondary data

Methods of Collection of Primary Data


The primary methods of collection of statistical information are the following:
1. Direct personal interview,
2. Indirect oral interview,
3. Information from correspondents,
4. Mailed questionnaires to be filled in by informants,
5. Schedules in charge of enumerators,
6. Collection of data using electronic media, and
7. Observation method.
The particular method adopted would depend upon the nature of the enquiry and the availability of
time, money and other facilities available to the investigation.
1. Direct personal interviews:
The persons from whom information is collected are known as informants. The investigator
personally meets them and asks questions to gather the necessary information. This method is
suitable for intensive rather than extensive field surveys, i.e., for intensive study of a
limited field.
2. Indirect oral interview
Under this method the investigator contacts witnesses, neighbours, friends or some other
third parties who are capable of supplying the necessary information. This method is preferred if the
required information is on addiction or the cause of a fire, theft or murder etc. If a fire has broken out at a
certain place, the persons living in the neighbourhood and witnesses are likely to give information on the
cause of the fire. In some cases, the police interrogate third parties who are supposed to have knowledge of a
theft or a murder and get some clues. Enquiry committees appointed by governments generally adopt
this method and get people’s views and all possible details of facts relating to the enquiry. This method
is suitable whenever direct sources do not exist, cannot be relied upon, or would be unwilling to part
with the information. The validity of the results depends upon a few factors, such as the nature of the
person whose evidence is being recorded, the ability of the interviewer to draw out information
from the third parties by means of appropriate questions and cross examinations, and the number
of persons interviewed. For the success of this method, one person or one group alone should not be
relied upon.
3. Information from correspondents:
The investigator appoints local agents or correspondents in different places and compiles the
information sent by them. Newspapers and some government departments receive information by
this method. The advantage of this method is that it is cheap and appropriate for extensive
investigations. But it may not ensure accurate results because the correspondents are likely to be
negligent, prejudiced and biased. This method is adopted in those cases where information is to be
collected periodically from a wide area for a long time.
4. Mailed questionnaire method:
Under this method a list of questions is prepared and is sent to all the informants by post. The list
of questions is technically called a questionnaire. A covering letter accompanying the questionnaire
explains the purpose of the investigation and the importance of correct information, and requests the
informants to fill in the blank spaces provided and to return the form within a specified time.
This method is appropriate in those cases where the informants are literate and are spread over a
wide area.
5. Schedules sent through Enumerators:
Under this method enumerators or interviewers take the schedules, meet the informants and fill in
their replies. A distinction is often made between a schedule and a questionnaire. A schedule is filled
in by the interviewer in a face-to-face situation with the informant. A questionnaire is filled in by the
informant, who receives and returns it by post. This method is suitable for extensive surveys.

6. Collection of data using electronic media


With the advent of a scientific approach in all walks of life, the old techniques of data collection have
become largely outdated. The extensive use of television, telephone, mobile phone, fax, etc. has
given new dimensions to the modes of collecting statistical information. The use of computer
technology has greatly reduced, though not eliminated, the chance of error, bias or prejudice on the
part of any individual or agency.

7. Observation method
In this method information is obtained by the investigator's own observation without asking the
respondents. He records the behaviour, as it occurs, of an event in which he is interested. Sometimes
electronic and mechanical devices are used to record the desired data. The observation method is used when
the study relates to behavioural science. This method is planned systematically. It is subject to
many controls and checks.
Questionnaire

A questionnaire is a list of questions directly or indirectly connected with the work of the enquiry.
The answers to these questions would provide all the information sought. The questionnaire is
put in the charge of trained investigators whose duty is to go to all persons or selected persons
connected with the enquiry. This method is usually adopted in case of large inquiries. The method
of collecting data is relatively cheap and also the information obtained is that of good quality.
The main drawback of this method is that the enumerator (i.e., the investigator in charge of the
questionnaire) may be biased and may not enter the answer given by the informant. Where
there are many enumerators, they may interpret various terms in the questionnaire according to
their whims. To that extent the information supplied may be either inaccurate or inadequate or
not comparable. This drawback can be removed to a great extent by training the investigators
before the enquiry begins. The meaning of different questions may be explained to them so that
they do not interpret them according to their whims.
Drafting the Questionnaire
The success of questionnaire method of collecting information depends on the proper drafting of
the questionnaire. It is a highly specialized job and requires great deal of skill and experience.
However, the following general principles may be helpful in framing a questionnaire:
1. The questions must be arranged in a logical order so that a natural and spontaneous
reply to each is induced.
2. The questions should be short, simple and easy to understand, and each should convey
only one meaning.
3. As far as possible, questions of a personal and pecuniary nature should not be asked.
4. As far as possible, the questions should be such that they can be answered briefly in ‘Yes’
or ‘No’, or in terms of numbers, place, date, etc.
5. The questionnaire should provide necessary instructions to the informants. For instance,
if there is a question on weight, it should be specified whether weight is to be
indicated in lbs. or kilograms.
6. Questions should be objective type and capable of tabulation.

Sources of Secondary Data


Data which have already been collected by some agency, individual or department for some purpose and
organised in statistical form are known as secondary data. Secondary data are second-hand information in
finished form, ready to analyse and interpret.

There are a number of sources from which secondary data may be obtained. They may be
classified as follows:
1. Published sources, and
2. Unpublished sources.
1. Published Sources
The various sources of published data are:
1. Reports and official publications of-
(a) International bodies such as the International Monetary Fund, International
Finance Corporation, and United Nations Organisation.
(b) Central and State Governments- such as the Report of the Patel Committee,etc.
2. Semi-official publications of various local bodies such as Municipal Corporations and
District Boards.
3. Private Publication of—
(a) Trade and professional bodies such as the Federation of Indian Chambers of Commerce
and the Institute of Chartered Accountants of India.
(b) Financial and economic journals such as ‘Commerce’, ‘Capital’, etc.
(c) Annual Reports of Joint Stock Companies.
(d) Publications brought out by research agencies, research scholars, etc.
4. International Bodies:
(a) United Nations Organization (UNO)
(b) World Health Organization (WHO)
(c) International Labour Organization (ILO)
(d) International Bank for Reconstruction and Development (IBRD)
(e) World Meteorological Organization (WMO)
(f) International Monetary Fund (IMF)
Clinical and other personal records, death certificates, published mortality statistics, census publications, etc.
Examples include:

1. Official publications of Central Statistical Authority


2. Publication of Ministry of Health and Other Ministries
3. Newspapers and Journals.
4. International Publications like Publications by WHO, World Bank, UNICEF
5. Records of hospitals or any Health Institutions

2. Unpublished Sources
There are various sources of unpublished data, such as records maintained by various government and private
offices and studies made by research institutions, scholars and trade associations. Such sources can also
be used where necessary. Biographies, autobiographies, diaries, etc. are also examples of unpublished data.
Chapter – 3
Construction of frequency distribution and Presentation of data

What is frequency distribution


Collected and classified data are presented in a form of frequency distribution. Frequency
distribution is simply a table in which the data are grouped into classes on the basis of
common characteristics and the number of cases which fall in each class are recorded. It
shows the frequency of occurrence of different values of a single variable. A frequency
distribution is constructed to satisfy three objectives:
(i) to facilitate the analysis of data,
(ii) to estimate frequencies of the unknown population distribution from the distribution
of sample data, and
(iii) to facilitate the computation of various statistical measures.
Frequency distribution can be of two types:
Univariate Frequency Distribution.
Bivariate Frequency Distribution.
In this lesson, we shall understand the Univariate frequency distribution. Univariate distribution
incorporates different values of one variable only whereas the Bivariate frequency distribution
incorporates the values of two variables. The Univariate frequency distribution is further
classified into three categories:
Series of individual observations,
Discrete frequency distribution, and
Continuous frequency distribution.
Presentation of data
Chapter – 4
Sampling Concepts

Sampling Concepts: Meaning of Population and Sample, Parameters and Statistics,
Descriptive and Inferential Statistics, Probability and Non-Probability Sampling Methods
including Simple Random Sample, Stratified Sampling, Systematic Sampling, Judgement
Sampling and Convenience Sampling.

Population or Universe
Population or Universe is the aggregate of all the items to be studied for statistical enquiry.
The population or universe represents the entire group of units which is the focus of
the study. Thus, the population could consist of all the persons in the country, or those
in a particular geographical location, or a special ethnic or economic group, depending
on the purpose and coverage of the study. A population could also consist of non-human
units such as farms, houses or business establishments.
For example, suppose there are 250 workers in a factory and a researcher collects
information about income and expenditure from all the workers; then the 250 workers would
be taken as the population or universe. The individual units of the population are called ‘members’
or ‘elements’ or ‘items’.
A Universe or population is of following two types:
Finite Population
A population is called finite if it is possible to count its individuals. It may also be called
a countable population. The number of vehicles crossing a bridge every day, the number
of births per year and the number of words in a book are finite populations. The
number of units in a finite population is denoted by N. Thus N is the size of the
population.
Infinite Population
Sometimes it is not possible to count the units contained in the population. Such a
population is called infinite or uncountable. Let us suppose that we want to examine
whether a coin is fair or not. We shall toss it a very large number of times to observe the
number of heads. All the possible tosses make up an infinite or uncountable population.
The number of germs in the body of a sick patient is perhaps something which is
uncountable.

Real Universe

By real universe or population, we mean a universe in which items or elements actually exist,
whether in physical, numerical or logical form e.g., the number of scholarship holders, the
number of baskets containing oranges etc. Real universe can be finite or infinite.

Hypothetical universe
Hypothetical universe contains items which do not exist. It is also known as
‘Theoretical Universe’. In statistical investigation hypothetical universe has no role to
play.

Census or Complete enumeration method


Census or enumeration refers to the study of all the items or observations in the
population or universe. Here entire population is investigated or studied. This method is
also known as complete count.
Under the census or complete enumeration method, the statistician collects the data for
each and every unit of the population or universe. This universe is the complete set of items
which are of interest in a given situation. To give you an example, if you record the marks of all
B.Com students of Mumbai University for analysis, it is a census investigation. The
population census is another example of a census investigation. Usually, this method is
recommended in cases where the area of investigation is limited and requires intensive
examination of the population.

Merits of a Census Investigation


Now that the census definition is clear, let’s look at the merits of a census investigation.

Intensive Study – Under census investigation, you must obtain data from each
and every unit of the population. Further, it enables the statistician to study more
than one aspect of all items of the population. To give you an example, the
Indian Government conducts a census investigation once every 10 years. The
authorities collect the data regarding the population size, males, and
females, education levels, sources of income, religion, etc.

Reliable Data – The data that a statistician collects through a census
investigation is more reliable, representative, and accurate. This is because, in
a census, the statistician observes every item personally.

Suitable Choice – It is a great choice in situations where the different items of the
population are not homogeneous.

The basis of various surveys – Data from a census investigation is used as a
basis in various surveys.
Free from Bias- Census investigation is free from any form of bias and
prejudice. This is so because investigator has to study each and every unit of
population.
Heterogeneous population- When the units of population are not same i.e.,
they are heterogeneous in nature then census method is most suitable
approach for collection of data.
Demerits of a Census Investigation
A census investigation also has certain demerits. Some of these demerits are:

Costs – Since the statistician closely observes each and every item of the
population before collecting the data, a census investigation is a very costly
method of investigation. Usually, government organizations adopt this method
to collect detailed data like the population census or agricultural census or the
census of industrial production, etc.

Time-consuming – A census investigation is time-consuming and also
requires manpower to collect original data.

Possibilities of Errors – There are many possibilities of errors in the
census investigation method due to non-response, measurement, lack of
preciseness in the definition of statistical units or even the personal bias of the
investigators.

Sample Survey
By ‘sample’ we mean only a part of the population or universe. Instead of investigating all the
items in a population, some specific items are selected from the population for study. The
set of these items is called a sample. On the basis of the sample study, conclusions are drawn
about the characteristics of the population under study.
Sample survey is the technique of studying the universe on the basis of a sample. A sample is a
finite subset of the population. It is meant to represent a particular characteristic of the entire
population. For example, a doctor examines a few drops of blood to test the blood of
a patient; we test the quality of a bag of rice by testing a few grains out of it. Here the few drops
of blood or few grains of rice constitute the samples. The number of units in a sample is
called “the sample size”. Thus a sample survey is a method of analysing the
characteristics of the entire population by studying a representative part of it.

Merits:
Very often sampling method is preferred to census method of collecting data,
because of the following reasons.
i) The sample method involves less cost than the census method, because here only a
part of the population is examined. So, it is economical.
ii) Sample study saves time and provides quick result.
iii) Sampling method often provides more accurate information than the census
method, because here we survey only a few items of the population. Sampling is
generally done by trained and experienced persons. It facilitates intensive study and
getting detailed information about the population.
iv) To get approximate or aggregate results sampling is generally preferred to census
method.
v) In case of large population, sample method is more suitable than census method
for collecting information.
Demerits:
Sample method also has a number of drawbacks. Some of the important drawbacks of
this method are given below.
i) If it is a question of deliberate selection, the result may be very much biased. This shall
mislead the enquiry.
ii) All characteristics of the population may not be found in the samples drawn from
the population.
iii) Information from sampling method is relatively less accurate than that from
census method.
iv) Sample survey needs proper planning and execution by trained personnel.
Otherwise, it may give wrong results.

Parameter vs Statistic
A parameter is a number describing a whole population (e.g., population mean),
while a statistic is a number describing a sample (e.g., sample mean).
The goal of quantitative research is to understand characteristics of populations by
finding parameters. In practice, it’s often too difficult, time-consuming or unfeasible to
collect data from every member of a population. Instead, data is collected from
samples.
Examples of Parameters
20% of U.S. senators voted for a specific measure. Since there are only 100 senators, you
can count what each of them voted.
Examples of Statistic
50% of people living in the U.S. agree with the latest health care proposal. Researchers can’t
ask hundreds of millions of people if they agree, so they survey a sample, or part of the
population, and estimate the figure for the whole population from it.
Parameter vs. Statistic: The Differences
The difference between a parameter and a statistic is that a parameter is a fixed
measure describing the whole population, while a statistic is a characteristic of a sample,
a portion of the target population.
A parameter is a fixed, usually unknown numerical value, while a statistic is a known number and
a variable which depends on the portion of the population sampled.
Sample statistics and population parameters have different statistical notations:
In population parameters, the population proportion is represented by P, the mean by
µ (Greek letter mu), the variance by σ², the population size by N, and the standard deviation
by σ (Greek letter sigma).
In sample statistics, the mean is represented by x̄ (x-bar), the sample proportion
by p̂ (p-hat), the standard deviation by s, the sample size by n, and the standard error
of the mean by sx̄.
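The notation above can be illustrated with a short Python sketch. The income figures are made-up values generated only for illustration (loosely following the 250-worker factory example), and the variable names are assumptions, not part of the syllabus.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly incomes of N = 250 workers (made-up values).
population = [random.randint(15000, 40000) for _ in range(250)]
N = len(population)

# Parameters describe the whole population and are fixed.
mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics describe one sample and change from sample to sample.
n = 30
sample = random.sample(population, n)
x_bar = statistics.mean(sample)         # sample mean
s = statistics.stdev(sample)            # sample standard deviation (n - 1 divisor)
se_mean = s / n ** 0.5                  # estimated standard error of the mean

print(f"N={N}, mu={mu:.1f}, sigma={sigma:.1f}")
print(f"n={n}, x_bar={x_bar:.1f}, s={s:.1f}, se={se_mean:.1f}")
```

Drawing a different sample changes x̄ and s, while µ and σ stay the same.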
Descriptive Statistics

Descriptive statistics summarizes or describes the characteristics of a data set.


Descriptive statistics consists of two basic categories of measures: measures of central
tendency and measures of variability (or spread). Measures of central tendency describe
the center of a data set. Measures of variability or spread describe the dispersion of data
within the set.

Descriptive statistics are brief descriptive coefficients that summarize a given data set,
which can be either a representation of the entire population or a sample of a
population. Descriptive statistics are broken down into measures of central tendency and
measures of variability (spread). Measures of central tendency include the mean,
median, and mode, while measures of variability include standard deviation, variance,
minimum and maximum values, kurtosis, and skewness.

Descriptive statistics, in short, help describe and understand the features of a specific
data set by giving short summaries about the sample and measures of the data. The
most recognized types of descriptive statistics are measures of center: the mean,
median, and mode, which are used at almost all levels of math and statistics. The
mean, or the average, is calculated by adding all the figures within the data set and then
dividing by the number of figures within the set.

For example, the sum of the following data set is 20: (2, 3, 4, 5, 6). The mean is 4 (20/5). The
mode of a data set is the value appearing most often, and the median is the figure situated
in the middle of the data set. It is the figure separating the higher figures from the
lower figures within a data set. However, there are less common types of descriptive
statistics that are still very important.
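The calculation above can be checked with Python's standard statistics module. This is only a small sketch; the second data set with a repeated value is a made-up illustration.

```python
import statistics

data = [2, 3, 4, 5, 6]              # the data set from the example above

total = sum(data)                   # 2 + 3 + 4 + 5 + 6 = 20
mean = statistics.mean(data)        # 20 / 5 = 4
median = statistics.median(data)    # middle value of the sorted data = 4

# Every value in this data set occurs exactly once, so there is no single
# mode; multimode returns all values tied for most frequent.
modes = statistics.multimode(data)

# A made-up data set with a repeated value has a clear mode.
mode_example = statistics.mode([2, 3, 3, 5, 6])   # 3 occurs twice

print(total, mean, median, modes, mode_example)
```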
People use descriptive statistics to repurpose hard-to-understand quantitative
insights across a large data set into bite-sized descriptions. A student's grade point
average (GPA), for example, provides a good understanding of descriptive statistics. The
idea of a GPA is that it takes data points from a wide range of exams, classes, and grades,
and averages them together to provide a general understanding of a student's overall
academic performance. A student's personal GPA reflects their mean academic
performance.

Inferential Statistics

In inferential statistics, predictions about a group in which you are interested are
made from data on part of that group. It uses a random sample of data taken from a
population to describe and make inferences about the population.
With inferential statistics, you are trying to reach conclusions that extend beyond
the immediate data alone. For instance, we use inferential statistics to try to infer
from the sample data what the population might think. Or, we use inferential
statistics to make judgments of the probability that an observed difference between
groups is a dependable one or one that might have happened by chance in this study.
Thus, we use inferential statistics to make inferences from our data to more general
conditions; we use descriptive statistics simply to describe what’s going on in our
data.
Techniques or methods of sampling

Simple Random Sampling

It is also known as chance sampling and is a form of probability sampling. A random sample is
one in which all units of the population have an equal chance of being
included. Random sampling is one of the simplest forms of collecting data from the
total population. Under random sampling, each member of the population carries an
equal opportunity of being chosen as a part of the sampling process.
For example, the total workforce in an organization is 300 and, to conduct a survey, a sample
group of 30 employees is selected. In this case, the population is the total
number of employees in the company and the group of 30 employees is the
sample. Each member of the workforce has an equal opportunity of being chosen because
all the employees chosen to be part of the survey were selected randomly.
But there is always a possibility that the group or the sample does not represent the
population as a whole; in that case, the random variation is termed sampling error.
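The workforce example can be sketched in Python as follows. The employee IDs are hypothetical labels invented for the illustration; random.sample draws without replacement, so every employee has the same chance of being picked.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 300 employee IDs, E001 to E300.
employees = [f"E{i:03d}" for i in range(1, 301)]

# Simple random sample of 30 employees, drawn without replacement.
sample = random.sample(employees, k=30)

print(len(employees), len(sample))
print(sample[:5])
```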

Applications
Lottery methods
Advantages
Minimum sampling bias as the samples are collected randomly
Selection of samples is simple as random generators are used
The results can be generalized due to representativeness
Disadvantages
The potential availability of all respondents can be costly and time consuming
Larger sample sizes

Systematic sampling
In systematic random sampling, the researcher first randomly picks the first item from
the population. Then, the researcher selects every kth item from the list, where k is
the sampling interval. The procedure involved in systematic random sampling is very easy
and can be done manually. The results are representative of the population unless certain
characteristics of the population are repeated for every kth individual.
Systematic sampling involves selection of sample units at equal intervals, after all
the units in the population are arranged in some systematic order such as
alphabetical, chronological or geographical order. Systematic sampling is also
known as ‘quasi-random sampling’.
k = N / n

N is the population size
n is the sample size
k is called the sampling interval

Steps in selecting a systematic random sample:


Calculate the sampling interval (the number of observations in the population divided by the
number of observations needed for the sample)
Select a random start between 1 and the sampling interval
Repeatedly add the sampling interval to select subsequent units

Ex: If a sample of 20 needs to be collected from a population of 100, divide the population
into 20 groups of (100/20) = 5 members each. Select a random number from the first
group and take every 5th member starting from that random number.
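The steps above can be sketched in Python. The function name systematic_sample is an assumption made for this illustration, and the sketch assumes the population size is an exact multiple of the sample size, as in the example (100 and 20).

```python
import random

def systematic_sample(units, n, rng=random):
    """Pick n units at equal intervals after a random start (illustrative helper)."""
    N = len(units)
    k = N // n                    # sampling interval: 100 // 20 = 5
    start = rng.randint(1, k)     # random start between 1 and k, inclusive
    # 1-based positions start, start + k, start + 2k, ...
    return [units[i - 1] for i in range(start, N + 1, k)][:n]

population = list(range(1, 101))  # units numbered 1 to 100
sample = systematic_sample(population, 20, random.Random(1))

print(sample)
```

Because the units are numbered 1 to 100, the printed sample shows the chosen positions directly: consecutive values always differ by the interval 5.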
Applications
Quality Control: The systematic sampling is extensively used in manufacturing industries for
statistical quality control of their products. Here a sample is obtained by taking an item
from the current production stream at regular intervals.
In Auditing: In auditing savings accounts, systematic sampling is the most natural way to
sample a list of accounts to check compliance with accounting procedures.
Advantages
Cost and time efficient

Spreads the sample more evenly over the population


Disadvantages
A complete list of the population should be available
Sample bias if there are periodic patterns within the data set

Stratified Random sampling


Stratified sampling is used when the population is heterogeneous, i.e. it has different sectors with
different characteristics. The population is divided into ‘strata’ or groups, each of which is
homogeneous in some characteristic. Then the selection of an appropriate number of items is made from
each sub-group on a random basis. The items taken separately from each sub-group or
stratum together form the stratified sample.

For example, one might divide a population of adults into subgroups by age, like 18–29,
30–39, 40–49, 50–59, and 60 and above.
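A minimal Python sketch of stratified sampling with the age bands above is given below. The stratum sizes are made-up numbers, and taking the same fraction (10%) from every stratum, known as proportional allocation, is an assumption of this sketch; the notes only say that an appropriate number of items is drawn from each stratum.

```python
import random

def stratified_sample(strata, fraction, rng=random):
    """Draw a simple random sample of the given fraction from every stratum."""
    sample = {}
    for name, units in strata.items():
        size = max(1, round(len(units) * fraction))   # at least one unit per stratum
        sample[name] = rng.sample(units, size)
    return sample

# Hypothetical population of adults, already grouped into age strata.
strata = {
    "18-29": [f"a{i}" for i in range(200)],
    "30-39": [f"b{i}" for i in range(150)],
    "40-49": [f"c{i}" for i in range(100)],
    "50-59": [f"d{i}" for i in range(100)],
    "60+":   [f"e{i}" for i in range(50)],
}

picked = stratified_sample(strata, 0.10, random.Random(7))
sizes = {name: len(units) for name, units in picked.items()}

print(sizes)
```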
Advantages
Greater level of representation from all the groups
If there is homogeneity within strata and heterogeneity between strata, the estimates can
be as accurate
Disadvantages
Requires the knowledge of strata membership
Might take longer and be more expensive
Complex methodology
Multistage sampling
By this method the population is first divided into large sampling units, each of which is
further sub-divided into smaller units.
Let’s consider the sample location as the USA. The research goal is to assess the online
spending trends of people in the US through an online questionnaire. Researchers can form
their sample group comprising 200 households in the following manner:

1. Firstly, choose the number of states using simple random sampling (or any
other probability sampling). For example, select ten states.
2. Secondly, choose five districts within each state using the systematic
sampling method (or any other probability sampling).
3. Thirdly, choose four households from each district using the systematic sampling
or simple random sampling method. You will end up with 200 houses that you can
include in the sample group for research.
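The three stages above (10 states, 5 districts per state, 4 households per district, giving 10 × 5 × 4 = 200 households) can be sketched in Python. The sampling frame below, with 50 states of 20 districts and 100 households each, is entirely made up for illustration, and simple random sampling is used at every stage for brevity.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical frame: 50 states x 20 districts x 100 households.
frame = {
    f"state{s}": {
        f"s{s}-district{d}": [f"s{s}-d{d}-house{h}" for h in range(100)]
        for d in range(20)
    }
    for s in range(50)
}

households = []
for state in rng.sample(sorted(frame), 10):                    # stage 1: 10 states
    districts = frame[state]
    for district in rng.sample(sorted(districts), 5):          # stage 2: 5 districts
        households.extend(rng.sample(districts[district], 4))  # stage 3: 4 households

print(len(households))
```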

Cluster Sampling
Cluster sampling is similar to stratified sampling. In cluster sampling the universe
is divided into a number of relatively small subdivisions or clusters and then some of
these clusters are randomly selected for inclusion in the overall sample.
Ex: A researcher wants to study the academic performance of engineering students
under a particular university. He can divide the entire population into multiple
engineering colleges (which are the clusters) and randomly pick some clusters for the
study.

Types of cluster sampling:


One-stage cluster: In the above example, selecting all the students from the randomly chosen
engineering colleges is one-stage cluster sampling.
Two-stage cluster: In the same example, picking students from each chosen
cluster by random or systematic sampling is two-stage cluster sampling.
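The difference between the two types can be sketched in Python. The 12 colleges with 40 students each are made-up figures, and the choice of 3 colleges and 10 students per college is an assumption made only for the illustration.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical clusters: 12 engineering colleges with 40 students each.
colleges = {
    f"college{c}": [f"c{c}-student{i}" for i in range(40)]
    for c in range(12)
}

# Both types start by randomly selecting some clusters (colleges).
chosen = rng.sample(sorted(colleges), 3)

# One-stage: every student of each chosen college enters the sample.
one_stage = [s for c in chosen for s in colleges[c]]

# Two-stage: a random sub-sample of students within each chosen college.
two_stage = [s for c in chosen for s in rng.sample(colleges[c], 10)]

print(len(one_stage), len(two_stage))
```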
Advantages
Saves time and money
It is very easy to use from the practical standpoint
Larger sample sizes can be used
Disadvantages
High sampling error
Difference between Stratified random sampling and Cluster sampling

Non-Probability Sampling
Non-Probability samples are preferred when accuracy in the results is not
important. These are inexpensive, easy to run and no frame is required. If a non-
probability sample is carried out carefully, then the bias in the results can be
reduced.
The main disadvantage of non-probability sampling is that it is dangerous to make inferences
about the whole population from such a sample.

Judgement Sampling

In judgement sampling, selection of sample units depends on the discretion or
judgement of the investigator. The investigator chooses the units from the
universe according to his own judgement.
In Judgement (or Purposive) Sampling, a researcher relies on his or her judgment when choosing
members of the population to participate in the study. Researchers often believe that they can obtain
a representative sample by using sound judgment, which will result in saving time and money.

As the researcher’s knowledge is instrumental in creating a sample in this sampling technique, there
are chances that the results obtained will be highly accurate with a minimum margin of error.

Ex: A broadcasting company wants to research one of the TV shows. The researcher has an idea of the
target audience and he can choose the members of the population to participate in the study.

Advantages
Cost and time effective sampling method

Allows researchers to approach their target market directly

Almost real-time results

Disadvantages
Vulnerability to errors in judgment by researcher

Low level of reliability and high levels of bias

Inability to generalize research findings

Snowball sampling
This method is commonly used in social sciences when investigating hard-to-reach groups.
Existing subjects are asked to nominate further subjects known to them, so the
sample increases in size like a rolling snowball. For example, when surveying risk behaviors
amongst intravenous drug users, participants may be asked to nominate other users to be
interviewed.
This sampling method involves primary data sources nominating other potential primary
data sources to be used in the research. So the snowball sampling method is based on
referrals from initial subjects to generate additional subjects. Therefore, when applying
this sampling method members of the sample group are recruited via chain referral.
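Chain referral can be sketched in Python as a walk over a referral network. The network below, the participant labels and the function name snowball_sample are all hypothetical, and processing nominees in the order they were named is just one possible design choice.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Recruit participants by following nominations from the initial subjects."""
    sample = []
    seen = set(seeds)          # everyone already recruited or waiting to be
    queue = deque(seeds)       # people still to be interviewed
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        # Each interviewed participant nominates further subjects known to them.
        for nominee in referrals.get(person, []):
            if nominee not in seen:
                seen.add(nominee)
                queue.append(nominee)
    return sample

# Hypothetical referral network: who nominates whom.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}

sample = snowball_sample(referrals, ["A"], max_size=5)
print(sample)
```

Starting from the single initial subject A, the sample grows wave by wave until the required size is reached.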
Extensive Sampling
Extensive sampling may refer either to a case where a wide variety of topics are covered
superficially, rather than a few topics in detail or a large area is surveyed broadly, rather than
a small area studied in detail.
By this method a very large sample is selected and items from which it is difficult to
collect any information are dropped out.
Quota sampling
Investigators set definite quotas according to some specific features of the population like
social classes, age groups, religion etc. The quotas together fix the total number of items in the
sample taken as a whole.
This method is mainly used by market researchers. The researchers divide the survey
population into mutually exclusive subgroups. These subgroups are selected with respect to
certain known features, traits, or interests. Samples from each subgroup are then selected by the
researcher until the quota for that subgroup is filled.
Quota sampling can be divided into two groups-
Controlled quota sampling involves introduction of certain restrictions in order to
limit researcher’s choice of samples.
Uncontrolled quota sampling resembles convenience sampling method in a way that
researcher is free to choose sample group members
Convenience sampling
Under convenience sampling, the researcher includes only those individuals who are most
accessible and available to participate in the study.
In this method, the selection of the sample is based on the convenience of the investigator rather
than on judgement or probability.
For example, standing at a mall or a grocery store and asking people to answer
questions would yield a convenience sample.
