
Notes of Business Statistics Unit-1 To 4 (QUESTION ANSWERS Type)

Uploaded by

nehakalra2455
Copyright
© © All Rights Reserved

UNIT: 1

Question 1
Define statistics.
Answer:
Statistics can be defined as the collection, presentation, classification, analysis, and
interpretation of quantitative data.
Question 2
What are the stages of statistical study?
Answer:
The stages of a statistical study are:

● Collection of data
● Organisation of data
● Presentation of data
● Analysis of data
● Interpretation of data

Question 3
What are the tools used, related to statistical study?
Answer:
The tools used, related to statistical study are:

● Census or sample technique
● Tally bars and assembling of data
● Graphs, tables, and diagrams
● Averages, percentages, regression coefficients, and correlation
● Averages, percentages, and the degree of relationship between variables

Question 4
What are the scopes of statistics?
Answer:
The scopes of statistics include:

● Nature of statistics
● Subject matter of statistics
● Limitations of statistics

Question 5
Define statistics as a singular noun.
Answer:
In the singular sense, statistics means the science of statistics or statistical methods. It
refers to the techniques or methods relating to the analysis, collection, presentation,
classification, and interpretation of quantitative data.
Question 6
Define statistics as a plural noun.
Answer:
In the plural sense, statistics is defined as the information in terms of numerical data or
numbers such as employment statistics, statistics concerning public expenditure,
population statistics, etc.
Question 7(V.IMP)
What is inferential statistics? How does it differ from descriptive statistics?
Answer:
Inferential statistics refers to the methods by which conclusions are drawn relating to the
universe based on a given sample.
Question 8
What are the two components of the subject matter in statistics?
Answer:
The two components of the subject matter in statistics are:

● Descriptive statistics
● Inferential statistics

Question 9
What is descriptive statistics?
Answer:
Descriptive statistics refers to those methods which are used for the collection,
presentation and analysis of data. These methods relate to such measurements as
measures of central tendency, measures of dispersion, measures of
correlation, etc.
Question 10 (V.IMP)
Define ‘Statistics’ and give characteristics of ‘Statistics’.
Ans.:

‘Statistics’ means numerical presentation of facts. Its meaning is divided into two forms:
the plural form and the singular form. In the plural form, ‘Statistics’ means a collection of
numerical facts or data, for example price statistics, agricultural statistics, production
statistics, etc. In the singular form, the word means the statistical methods with the help of
which collection, analysis and interpretation of data are accomplished.
Characteristics of Statistics -
a) Aggregate of facts/data
b) Numerically expressed
c) Affected by different factors
d) Collected or estimated
e) Reasonable standard of accuracy
f) Predetermined purpose
g) Comparable
h) Systematic collection.
Therefore, the process of collecting, classifying, presenting,
analyzing and interpreting the numerical facts, comparable
for some predetermined purpose are collectively known
as “Statistics”.

Question- 11 (V.IMP)
Discuss the functions and importance/utility of Statistics.
Ans.
Statistical methods are used not only in the social, economic and political
fields but in every field of science and knowledge. Statistical analysis has
become more significant in global relations and in the age of fast developing
information technology.
According to Prof. Bowley, “The proper function of statistics is to enlarge
individual experiences”.
Following are some of the important functions of Statistics :
a) To provide numerical facts.
b) To simplify complex facts.
c) To enlarge human knowledge and experience.
d) Helps in formulation of policies.
e) To provide comparison.
f) To establish mutual relations.
g) Helps in forecasting.
h) Test the accuracy of scientific theories.
i) To study extensively and intensively.
The use of statistics has become almost essential in order to clearly understand
and solve a problem. Statistics proves to be very useful in unfamiliar fields of
application and complex situations such as:
a) Planning
b) Administration
c) Economics
d) Trade & Commerce
e) Production management
f) Quality control
g) Helpful in inspection
h) Insurance business
i) Railways & transport Co.
j) Banking institutions
k) Speculation and gambling
l) Underwriters and investors
m) Politicians & social workers.

Question: 12(V.IMP)
Discuss the Scope of Statistics.

Ans:

In the old days the use of statistics was restricted to deal with the affairs of the state. But
now-a-days the scope of statistics has spread to all those areas where numerical facts are
used such as economics, business industry, medicine, physics, chemistry and numerous
other fields of knowledge.

The scope of statistics is much extensive. It can be divided into two parts –
(i) Statistical Methods such as Collection, Classification, Tabulation,
Presentation, Analysis, Interpretation and Forecasting.
(ii) Applied Statistics – It is further divided into three parts:
a) Descriptive Applied Statistics : Purpose of this analysis is
to provide descriptive information.
b) Scientific Applied Statistics : Data are collected with the
purpose of some scientific research and with the help of
these data some particular theory or principle is propounded.
c) Business Applied Statistics : Under this branch statistical
methods are used for the study, analysis and solution of
various problems in the field of business.

Question: 13(V.IMP)
State the limitation of statistics?
Ans.
The scope of statistics is very wide. In any area where problems can be
expressed in quantitative form, statistical methods can be used. But
statistics has some limitations:
1. Statistics can study only numerical or quantitative aspects of a
problem.
2. Statistics deals with aggregates not with individuals.
3. Statistical results are true only on an average.
4. Statistical laws are not exact.
5. Statistics does not reveal the entire story.
6. Statistical relations do not necessarily bring out the cause and
effect relationship between phenomena.
7. Statistics is collected with a given purpose.
8. Statistics can be used only by experts.
Question: 14

What do you mean by Collection of Data? Differentiate between Primary and Secondary Data.
Ans.:
Collection of data is the basic activity of statistical science. It means the
collection of facts and figures relating to a particular phenomenon under
study, whether the problem is in business, economics, social or
natural sciences.
Such material can be obtained directly from the individual units, called
primary sources, or from material published earlier elsewhere, known as
secondary sources.

Difference between Primary & Secondary Data

Basis | Primary Data | Secondary Data
Nature | Primary data are original and are collected for the first time. | Data which were collected earlier by someone else, and which are now in published or unpublished state.
Collecting agency | These data are collected by the investigator himself. | Secondary data were collected earlier by some other person.
Post-collection alterations | These data do not need alteration as they are according to the requirement of the investigation. | These have to be analyzed and necessary changes have to be made to make them useful as per the requirements of the investigation.
Time & money | More time, energy and money have to be spent in collection of these data. | Comparatively less time and money have to be spent.

Question: 15 (V.IMP)
Define different sources of data collection.
Ans:

Types of Data
A) Primary data

● Primary data means first-hand information collected by an investigator.
● It is collected for the first time.
● It is original and more reliable.
● For example, the population census conducted by the Government of India
every ten years is primary data.

B) Secondary data

● Secondary data refers to second-hand information.
● It is not originally collected but rather obtained from already published or
unpublished sources.
● For example, the address of a person taken from the telephone directory or the
phone number of a company taken from Just Dial is secondary data.


Methods of Collecting Primary Data


1. Direct personal investigation
2. Indirect oral investigation

3. Information through correspondents


4. Telephonic interview

5. Mailed questionnaire

6. The questionnaire filled by enumerator

(A) Meaning of secondary data
● Secondary data refers to data that has already been collected by
some other person or agency and is used by us.

(B) Sources of secondary data
Sources of secondary data can broadly be classified under two categories:
1. Published sources
2. Unpublished sources

(1) Published sources
Published sources mean data available in printed form. They include the following:
1. Magazines, journals, and periodicals published by various
government, semi-government, and private organisations; data
related to birth, death, education, etc., published by the government at various
levels; data regarding prices, production, etc., published by
Economic Times, Financial Express, etc.
2. Reports of various committees or commissions, such as the pay
commission report, finance commission report, etc.
3. Reports of international agencies that are regularly published by
agencies like UNO, WHO, IMF, etc.

(2) Unpublished sources
● Not all statistical material is published.
● This category includes the records maintained by various government
and private offices.
● It includes research done by scholars, students or institutions.
● Sources like reports prepared by private investigation companies can
also be used depending upon the need.

Question: 16

What do you mean by Questionnaire? Give merits of a good Questionnaire.

Ans.:

A questionnaire is a document containing questions related to the specific
requirement of a statistical investigation for collection of information,
which is filled in by the informants personally.
Requirements of a good questionnaire:
● Questions should be simple, clear and short.
● Simple alternative or multiple choice questions.
● Unambiguous and precise.
● Questions should be in sequence.
● Directly related questions.
● Test of accuracy.
● No restricted questions affecting personal whims.
● Assurance of secrecy to the informants.
● Probability of a perfect answer.

Question: 17(a). Difference between population and sample. (V.IMP)

Question: 17(b). Difference between parameter and statistic. (V.IMP)

Ans.

Question: 17(c). Difference between linear and non-linear correlation. (V.IMP)

Ans.

Q18. What is a Frequency Distribution Table in Statistics?

Ans. In statistics, a frequency distribution table is a comprehensive way of
representing the organisation of raw data of a quantitative variable. The table shows
how the various values of a variable are distributed, together with their corresponding frequencies.
Two kinds of frequency distribution table can be made:

(i) Discrete frequency distribution

(ii) Continuous frequency distribution (Grouped frequency distribution)
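As a rough illustration of the two kinds of table, the sketch below builds a discrete and a grouped frequency distribution in Python. The marks data, the function names and the class width of 5 are assumptions made for this example only; they are not taken from the notes.

```python
from collections import Counter

def discrete_frequency(values):
    """Discrete distribution: count how often each distinct value occurs."""
    return dict(sorted(Counter(values).items()))

def grouped_frequency(values, lower, width, n_classes):
    """Continuous (grouped) distribution: count values falling in each
    class [lo, hi), where the upper limit is excluded from the class."""
    table = {}
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        table[f"{lo}-{hi}"] = sum(1 for v in values if lo <= v < hi)
    return table

# Hypothetical raw data: marks of 10 students
marks = [12, 7, 18, 12, 3, 7, 12, 15, 9, 18]

discrete_table = discrete_frequency(marks)
grouped_table = grouped_frequency(marks, lower=0, width=5, n_classes=4)

for value, freq in discrete_table.items():
    print(value, freq)
for label, freq in grouped_table.items():
    print(label, freq)
```

In both tables the frequencies add up to the number of observations, which is a quick check that no value was missed or counted twice.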

Frequency Distribution Table: Data Collection

In our day-to-day life, recording information is very crucial. A piece of information
or representation of facts or ideas which can be further processed is known as data.
The weather forecast, maintenance of records, dates, time, and so on are all related
to data collection.

The collection, organization, presentation, analysis and interpretation of observations
or data is known as statistics. Using statistics, we can make predictions about the nature of data based
on previous data. Statistics is helpful when a large amount of
data is to be studied and observed.

The collected statistical data can be represented by various methods such as tables,
bar graphs, pie charts, histograms, frequency polygons, etc.
Q 19. What is the meaning of Classification? Give objectives of
Classification and essentials of an ideal classification.
Ans.: Classification is the process of arranging data into various groups, classes
and sub-classes according to some common characteristics, or separating
them into different but related parts.
Main objectives of Classification :-
(i) To make the data easy and precise
(ii) To facilitate comparison
(iii) Classified facts expose the cause-effect relationship.
(iv) To arrange the data in proper and systematic way
(v) The data can be presented in a proper tabular form only.
Essentials of an Ideal Classification :-
(i) Classification should be so exhaustive and complete that every
individual unit is included in one or the other class.
(ii) Classification should be suitable according to the objectives of
investigation.
(iii) There should be stability in the basis of classification so that
comparison can be made.
(iv) The facts should be arranged in proper and systematic way.
(v) Data should be classified according to homogeneity.
(vi) It should be arithmetically accurate.
Qns 20
Give the formula for determining the magnitude of classes.
Ans.
According to Prof. H. A. Sturges, the class interval can be found using the
following formula.

$$I = \frac{L - S}{1 + 3.322 \log N}$$

where
I = class interval
N = number of observations
L = largest value
S = smallest value
and log is the common (base-10) logarithm.
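The formula can be checked numerically with a short Python sketch. The function name and the sample figures (values from 0 to 100, 100 observations) are assumptions for illustration; the logarithm is taken to base 10, which is what the constant 3.322 presupposes.

```python
import math

def sturges_class_interval(largest, smallest, n):
    """Class interval I = (L - S) / (1 + 3.322 * log10(N))."""
    return (largest - smallest) / (1 + 3.322 * math.log10(n))

# Hypothetical data set: 100 observations ranging from 0 to 100
interval = sturges_class_interval(largest=100, smallest=0, n=100)
print(round(interval, 2))
```

In practice the result is usually rounded to a convenient width (here, for instance, 10 or 15) before the classes are drawn up.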

Qns 21

Define Tabulation. State the objectives of Tabulation and kinds of Tables.

Ans.:
According to Blair, “Tabulation in its broad sense is an orderly arrangement
of data in columns and rows.”
Tabulation is a process of presenting the collected and classified data in
proper order and systematic way in columns and rows so that it can be
easily compared and its characteristics can be elucidated.
Objects of Tabulation:
● Orderly and systematic presentation of data.
● Making data precise and stable.
● To facilitate comparison.
● To make the problem clear and self-evident.

Qns 22
What is Diagrammatic Representation? State the importance of Diagrams.
Ans.:

Depicting statistical data in the form of attractive shapes such as bars,
circles, and rectangles is called diagrammatic presentation.
A diagram is a visual form of presentation of statistical data,
highlighting their basic facts and relationships. Diagrams are geometrical
figures like lines, bars, squares, rectangles, circles, curves, etc.
Diagrams are used with great effectiveness in the presentation of all
types of data.
When properly constructed, they readily show information that might
otherwise be lost amid the details of numerical tabulation.
Importance of Diagrams :
A properly constructed diagram appeals to the eye as well as the mind
since it is practical, clear and easily understandable even by those who
are unacquainted with the methods of presentation. The utility or
importance of diagrams will become clearer from the following points:
(i) Attractive and Effective Means of Presentation: Beautiful lines, full of
various colours and signs, attract human sight and do not strain the
mind of the observer. A common man who does not wish to indulge in
figures gets the message from a well-prepared diagram.
(ii) Make Data Simple and Understandable : The mass of complex data, when
prepared through diagram, can be understood easily. According to Shri
Morane, “Diagrams help us to understand the complete meaning of a
complex numerical situation at one sight only”.

Qns:23

Explain in brief the various types of Diagrams.

Ans.:
The different types of diagrams can be divided under the following heads:
(1) One dimensional diagrams
(2) Two dimensional diagrams
(3) Three dimensional diagrams
(4) Pictograms
(5) Cartograms

Qns: 24
Define sampling. Give the different methods of sampling. (V.IMP)

Ans:
Population is the entire group that you want to draw conclusions about.

A sample is the specific group that you will collect data from. The size of the sample
is always less than the total size of the population.

In research, a population doesn’t always refer to people. It can mean a group
containing elements of anything you want to study, such as objects, events,
organizations, countries, species, organisms, etc.
1. Introduction

Most of us carry out sampling spontaneously. If some members of a group try
new clothes in the market that are trendy and stylish, the others in the
group assume that this might be the newest trend or fashion. The basic idea of
sampling is to draw inferences about the population by selecting some elements
of the population. Some of the terminology of sampling is given below.

2.1 Sampling

Sampling is a statistical procedure concerned with the selection of certain
individual observations from the target population. It helps to make statistical
inferences about the population. Some of the basic terms are as follows.

2.2 Population

A population is any complete group (i.e., people, sales territories, stores, etc.)
sharing some common set of characteristics. It can be defined as including all
people or items with the characteristic one wishes to understand and draw
inferences about.

2.3 Population frame

A list, map, directory, or other source used to represent the population

2.4 Census
A census is an investigation of all the individual elements making up the
population—a total listing rather than a sample.

2.5 Sample

A sample is a subset or some part of a larger population. It is “a smaller (but
hopefully representative) collection of units from a population used to
determine truths about that population” (Field, 2005).
The sample has many advantages over a census or complete enumeration. When
designed carefully, the sample may give results which are just as accurate as, and
sometimes more accurate than, those of a census, and it is also considerably
cheaper than the census. Hence a carefully designed sample may actually be
better than a poorly planned and executed census (Rosander).

2.6 Sample design

A sample design is a definite plan for obtaining a sample from a given population
(Kothari, 1998). It helps to decide the number of items to be selected in the
sample, i.e. the size of the sample. The purpose of sampling is to estimate an
unknown characteristic of a population. It is all about selecting a random sample
which is a true representative of the population under study. The idea is to
compute a suitable value of a test statistic from the sample data by using
the appropriate distribution. The sample constitutes a certain portion of the population or
universe.

2.7 Sampling design

Sampling design refers to the technique or procedure the researcher follows
for selecting items as samples from the population or universe.

[Figure: Diagrammatic representation of the sampling process]
2. Basic principles of sampling

Theory of sampling is based on the following laws

a. Law of Statistical Regularity: This law comes from the mathematical
theory of probability. According to King, “Law of Statistical Regularity
says that a moderately large number of the items chosen at random from
the large group are almost sure on the average to possess the features of
the large group.” According to this law, the units of the sample must be
selected at random.

b. Law of Inertia of Large Numbers: This law states that, other things
being equal, the larger the size of the sample, the more accurate the results are likely to
be.

3. Characteristics of Sampling

There are several reasons to go for sampling. These include (1)
lower cost (2) saving of time (3) better accuracy (4) greater reliability (5) greater speed
of data collection (6) precision. The reasons why one might avoid sampling are
(1) lack of representative samples (2) chances of bias (3) problems of accuracy (4) sampling errors.

4. Types of sampling (V.IMP)


A) Probability sampling

B) Non probability sampling

A) Types of Probability Sampling

1. Simple Random Sampling

2. Systematic Sampling
3. Stratified Random Sampling

a. Proportionate

b. Disproportionate

4. Cluster (or Area) Sampling

5. Multistage sampling

B. Types of non probability sampling

1. Convenience sampling

2. Judgment sampling
3. Snowball sampling

4. Quota sampling

Difference between probability sampling (random) and non-probability
sampling (non-random).
6. Important Fact about the Term Random

The term which differentiates probability from non-probability sampling is
‘random.’ In sampling, the term random has an entirely different meaning from its
dictionary meaning. In the dictionary, random stands for ‘without pattern’
or ‘haphazard’, while in sampling, random selection implies a
controlled procedure in which each element of the population has an equal chance
of being selected. Here the procedure is never haphazard. In fact, it is
probability samples which give a precise estimate of the population under
study.

7. Types of Random Sampling(V.IMP)

1. Simple Random Sampling

It is a sampling procedure where each element in the population has
an equal chance of being selected in the sample. The process is
simple because it requires only one stage of sample selection. Here
we number each frame unit from 1 to N, then use a random number table or
a random number generator to select n distinct numbers between 1 and N,
inclusive. It is easy to perform for small populations but cumbersome for
large populations.
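The selection step can be sketched with Python's standard `random` module, which here plays the role of the random number generator. The function name, the population of 500 frame units, the sample of 20 and the seed are assumptions for illustration.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Select sample_size distinct unit numbers between 1 and
    population_size (inclusive), each unit being equally likely."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical frame of N = 500 units, sample of n = 20
chosen = simple_random_sample(population_size=500, sample_size=20, seed=1)
print(sorted(chosen))
```

`random.sample` draws without replacement, so no frame unit can appear twice in the sample.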

2. Systematic Random Sampling

It is convenient and relatively easy to administer. Here an initial starting
point is selected by a random process; then every kth item on the list is
selected.
The first sample element is selected randomly from the first k population
elements. Thereafter, sample elements are selected at a constant interval, k,
from the ordered sequence frame.

$$k = \frac{N}{n}$$

where:
n = sample size
N = population size
k = size of selection interval

For example, suppose one wishes to take a sample of 50 from a list of
10,000 purchase orders. Purchase orders for the previous fiscal year are
serialized 1 to 10,000 (N = 10,000). A sample of fifty (n = 50) purchase
orders is needed for an audit, so k = 10,000/50 = 200. The first sample element
is randomly selected from the first 200 purchase
orders. Assume the 45th purchase order was selected. Subsequent sample
elements: 245, 445, 645, ...
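The purchase-order example can be reproduced with a small Python sketch. The function name and its `start` and `seed` parameters are assumptions for illustration; the figures (N = 10,000, n = 50, starting point 45) come from the example, and the sketch assumes N is a multiple of n.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Pick every k-th unit, k = N // n, beginning at a random start
    chosen from the first k units (or at a given start)."""
    k = population_size // sample_size
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

# Start fixed at 45 to match the worked example
orders = systematic_sample(population_size=10_000, sample_size=50, start=45)
print(orders[:4], "...", orders[-1])
```

Only the starting point is random; once it is fixed, every other element of the sample is determined by the interval k.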

3. Stratified Random Sampling

Here the population is divided into non-overlapping subpopulations called
strata. A random sample is selected from each stratum. Each stratum is
sampled as an independent sub-population, out of which individual elements
can be randomly selected. Every unit in a stratum has the same chance of being
selected.

a) Proportionate: the percentage of the sample taken from each stratum is
proportionate to the percentage that each stratum is within the population.

b) Disproportionate: the proportions of the strata within the sample are
different from the proportions of the strata within the population.
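A minimal Python sketch of proportionate stratified sampling follows. The three strata (urban, semi-urban, rural), their sizes and the function names are hypothetical; the allocation rule is simply each stratum's share of the population applied to the sample size.

```python
import random

def proportionate_allocation(stratum_sizes, sample_size):
    """Sample size per stratum = sample_size * (stratum size / population size).
    Rounding may make the parts differ slightly from sample_size in general."""
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Draw an independent simple random sample from every stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportionate_allocation(sizes, sample_size)
    return {name: rng.sample(units, allocation[name])
            for name, units in strata.items()}

# Hypothetical population of 1,000 units split into three strata
strata = {
    "urban": list(range(0, 600)),
    "semi-urban": list(range(600, 900)),
    "rural": list(range(900, 1000)),
}
picked = stratified_sample(strata, sample_size=50, seed=7)
print({name: len(units) for name, units in picked.items()})
```

A disproportionate design would replace `proportionate_allocation` with sample sizes fixed by the researcher, for example to over-represent a small stratum.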


4. Cluster Sampling

It is also called ‘two-stage sampling’. In the first stage a sample of areas is
chosen. In the second stage a sample of respondents within those areas is
selected. Here the population is divided into non-overlapping clusters or areas of
homogeneous units, usually based on a geographically dispersed population.
Each cluster is a miniature, or microcosm, of the population. A subset of the
clusters is selected randomly for the sample. If the number of elements in the
subset of clusters is larger than the desired value of n, these clusters may be
subdivided to form a new set of clusters and subjected to a random selection
process.

5. Multistage Sampling

Multistage sampling (also known as multi-stage cluster sampling) is a
more complex form of cluster sampling which contains two or
more stages in sample selection. In multi-stage sampling, large
clusters of the population are divided into smaller clusters in several
stages in order to make primary data collection more manageable
in terms of cost and time. It is quite
effective for primary data collection from a geographically dispersed
population where face-to-face contact is required (e.g. semi-structured
in-depth interviews).

8. Types of Non Probability Sampling(V.IMP)


1) Convenience Sampling: A type of nonprobability sampling which involves
the sample being drawn from that part of the population which is close to hand.
That is, readily available and convenient. It is also termed as grab or opportunity
sampling or accidental or haphazard sampling. Sample elements are selected for
the convenience of the researcher. The researcher using such a sample cannot
scientifically make generalizations about the total population from this
sample because it would not be representative enough. This type of sampling is
most useful for pilot testing.

2) Judgment Sampling: Here the sample elements are selected by the
judgment of the researcher. The researcher chooses the sample based on who
they think would be appropriate for the study. This is used primarily when there
are a limited number of people that have expertise in the area being researched.

3) Quota Sampling: Here the population is first segmented into mutually
exclusive sub-groups, just as in stratified sampling. Then judgment is used to
select subjects or units from each segment based on a specified proportion. In
quota sampling the selection of the sample is non-random. For example, an
interviewer may be told to sample 200 females and 300 males between the ages
of 45 and 60. The interviewer might be tempted to interview those who look most helpful.

4) Snowball Sampling: Survey subjects are selected based on referral from
other survey respondents. In social science research, existing study subjects
are used to recruit more subjects into the sample.

Qns: 24

Define the different types of error in sampling.

Ans:

Types of Error

i) Sampling error
If researchers are not careful in planning and defining the sampling process, it
can lead to faulty research findings. Sampling error is the error that occurs
because a sample, rather than the entire population, is observed.
In statistical terminology, it is the difference between the statistic
you measure and the parameter you would find if you took a census of the
entire population. Sampling error cannot be eliminated, but it can be reduced: in
general, the larger the sample, the smaller the margin of error.
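The idea of sampling error as "statistic minus parameter" can be illustrated with a small simulation in Python. The population of 1,000 hypothetical wage figures, the sample size of 50, the seed and the function name are all assumptions for this sketch; the mean is used as the statistic.

```python
import random
from statistics import mean

def sampling_error(population, sample):
    """Sampling error of the mean = sample statistic - population parameter."""
    return mean(sample) - mean(population)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 daily wages between 100 and 500
population = [rng.randint(100, 500) for _ in range(1000)]

sample = rng.sample(population, 50)
error = sampling_error(population, sample)

# A census observes every unit, so there is no sampling error left
census_error = sampling_error(population, population)

print(error, census_error)
```

Repeating the draw with larger sample sizes would typically give errors that cluster more tightly around zero, which is the pattern the paragraph above describes.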

ii) Non Sampling error

This is due to poor data collection methods (such as faulty instruments or
inaccurate data recording), missing data, selection bias, non-response bias (where
individuals don’t want to or can’t respond to a survey), poorly conceived
concepts, vague definitions and defective questions. Increasing the sample
size will not reduce these errors. The key is to avoid making the errors in
the first place with a well-planned design for the survey or experiment.
Unit 2
Qns: 1

What do you mean by Measures of Central Tendency? Define Arithmetic Mean,
Median and Mode.

Ans. :
The central tendency of a variable means a typical value around which other values
tend to concentrate; hence this value representing the central tendency of the series is
called measures of central tendency or average.

According to Clark, “Average is an attempt to find one single figure to describe whole
of figures.”

Measures of Central Tendency:
● Arithmetic Mean (Simple, Weighted)
● Median
● Mode

Arithmetic Mean (X̄): The most popular and widely used measure for representing
the entire data by one value is the arithmetic mean. Its value is obtained by
adding together all the items and dividing this total by the number of items.
Arithmetic mean may be of two types:

1) Simple arithmetic mean

2) Weighted arithmetic mean

Arithmetic Mean
It is defined as the sum of the values of all observations divided by the number of
observations.
In general, if there are N observations X1, X2, X3, ..., XN, then the arithmetic mean is given by:

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$

For convenience, this is written in the simpler form:

$$\bar{X} = \frac{\sum X}{N}$$

where ΣX = sum of all observations and N = total number of observations.

Calculation of Mean for different series using different methods
(i) INDIVIDUAL SERIES

1) Direct Method

$$\bar{X} = \frac{\sum X}{N}$$

where ΣX = sum of all observations and N = total number of observations.

For example:

2) Assumed Mean / Shortcut Method

$$\bar{X} = A + \frac{\sum d}{N}$$

where d = (X – A); A = assumed mean and N = total number of observations.

For example:

(Take A = 250)
(Take A = 850)
(ii) DISCRETE SERIES

1) Direct Method

$$\bar{X} = \frac{\sum fX}{\sum f}$$

where ΣfX = sum of all observations multiplied by their
respective frequencies and Σf = N = total number of observations.

For example:

2) Assumed Mean / Shortcut Method

$$\bar{X} = A + \frac{\sum fd}{\sum f}$$

where d = (X – A); A = assumed mean and Σf = N = total number of observations.

For example:

(Take A = 200)

3) Step Deviation Method

$$\bar{X} = A + \frac{\sum fd'}{\sum f} \times C$$

where d' = (X – A)/C; C is the common factor in d;
A = assumed mean and Σf = N = total number of observations.
For example:

(Take A = 200)
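The three methods for a discrete series should always give the same mean, which the Python sketch below checks. The values, frequencies and function names are hypothetical and chosen only for illustration; A = 200 follows the "Take A = 200" hint, and C = 50 is the common factor of the deviations in this made-up data.

```python
def mean_direct(x, f):
    """Direct method: sum(fX) / sum(f)."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)

def mean_assumed(x, f, a):
    """Assumed mean method: A + sum(f*d) / sum(f), with d = X - A."""
    return a + sum(fi * (xi - a) for xi, fi in zip(x, f)) / sum(f)

def mean_step_deviation(x, f, a, c):
    """Step deviation method: A + (sum(f*d') / sum(f)) * C, with d' = (X - A) / C."""
    return a + c * sum(fi * ((xi - a) / c) for xi, fi in zip(x, f)) / sum(f)

# Hypothetical discrete series
values = [100, 150, 200, 250, 300]
freqs = [2, 6, 10, 8, 4]

direct = mean_direct(values, freqs)
assumed = mean_assumed(values, freqs, a=200)
step = mean_step_deviation(values, freqs, a=200, c=50)
print(direct, assumed, step)
```

The shortcut and step deviation methods only shrink the numbers being multiplied; they do not change the answer, whatever assumed mean is chosen.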

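(iii) CONTINUOUS SERIES

1) Direct Method

$$\bar{X} = \frac{\sum fm}{\sum f}$$

where m = mid-value of each class and Σf = N = total number of observations.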
Median
Median is defined as the middle value in the data set when its elements are
arranged in a sequential order, that is, in either ascending or descending
order.
It is a positional value. Positional average determines the position of variables in the series.

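(i) INDIVIDUAL SERIES

Steps for calculating median

Step 1: First arrange the data in ascending order.
Step 2: Use the given formula to calculate the median.

$$\text{Median} = \text{value of the } \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ item}$$

where N = total number of observations.

For example:

When the number of observations is an even number, the median is the mean of the two middle items.

For example: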

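(ii) DISCRETE SERIES

Steps for calculating median

For example:

(iii) CONTINUOUS SERIES

Steps for calculating median

Step 1: Locate the median class, where the (N/2)th item lies; N = Σf.

Step 2: Using the formula given below, calculate the median.

$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h$$

where L = lower limit of the median class; c.f. = cumulative frequency of the class
preceding the median class; f = frequency of the median class; h = width of the median class.

For example:

$$\text{Median} = 350 + \frac{80 - 75}{30} \times 50 = 350 + 8.33 = 358.33$$

The median daily wage is ₹358.33.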
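The arithmetic of the continuous-series example can be verified with a short Python sketch. The function and parameter names are assumptions; the figures are read off the worked line (lower limit 350, N/2 = 80 so N is taken as 160, cumulative frequency 75 before the median class, class frequency 30, class width 50).

```python
def grouped_median(lower_limit, n, cum_freq_before, class_freq, width):
    """Median = L + ((N/2 - c.f.) / f) * h for a continuous series."""
    return lower_limit + (n / 2 - cum_freq_before) / class_freq * width

median_wage = grouped_median(
    lower_limit=350,      # lower limit of the median class
    n=160,                # total frequency, inferred from N/2 = 80
    cum_freq_before=75,   # cumulative frequency before the median class
    class_freq=30,        # frequency of the median class
    width=50,             # width of the median class
)
print(round(median_wage, 2))
```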

MODE
Mode is defined as the value occurring most frequently in a given series and
around which other items of the set cluster most densely.
The word mode has been derived from the French word ‘la Mode’ which signifies
the most fashionable values of a distribution, because it is repeated the highest
number of times in the series.

(i) INDIVIDUAL SERIES

The value which occurs the maximum number of times is the mode.

For example:

Since the value 27 occurs the maximum number of times (thrice) in the series,
the modal marks = 27.

(ii) DISCRETE SERIES
There are two methods of calculating mode using grouped data:
a) Inspection or observation method b) Grouping method

a) Inspection or observation method: The value of the variable against the
highest frequency will give the mode.

For example:

b) Grouping method

The highest frequency total in each of the six columns is identified and analysed in
the Analysis Column to determine mode. The last column will be the analysis
column and the mode will be the value against the highest tally in the analysis
column.
For example: Calculate the mode from the following data using grouping method.

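Grouping Table

Age (yrs) | Column 1 (Frequency) | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Analysis Column
10 | 2 | 10 | -- | 30 | -- | -- | I
20 | 8 |  | 28 |  | 38 | -- | III
30 | 20 | 30 |  |  |  | 35 | IIIIII
40 | 10 |  | 15 |  |  |  | III
50 | 5 |  |  |  |  |  | I

Each total is entered against the first row of the group it sums: Column 2 adds the
frequencies in pairs from the top, Column 3 in pairs leaving out the first, Column 4 in
threes from the top, Column 5 in threes leaving out the first, and Column 6 in threes
leaving out the first two.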
The value 30 occurs maximum number of times (6 times) in the analysis column.
Therefore, the value of mode is 30.
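The grouping method can also be traced in code. The Python sketch below is one possible way to build the six columns and the analysis tally for the age data used here (ages 10 to 50 with frequencies 2, 8, 20, 10, 5); the function name and the way ties are handled (every group sharing the highest total is tallied) are assumptions of this sketch.

```python
from collections import Counter

def grouping_tally(values, freqs):
    """Analysis column of the grouping method: for each of the six
    columns, find the highest total and tally the values it covers."""
    tally = Counter()
    n = len(freqs)
    # (group size, rows skipped at the top) for columns 1 to 6
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, start in columns:
        groups = [(sum(freqs[i:i + size]), range(i, i + size))
                  for i in range(start, n - size + 1, size)]
        if not groups:
            continue
        best = max(total for total, _ in groups)
        for total, rows in groups:
            if total == best:
                for j in rows:
                    tally[values[j]] += 1
    return tally

ages = [10, 20, 30, 40, 50]
frequencies = [2, 8, 20, 10, 5]

tally = grouping_tally(ages, frequencies)
mode_age = tally.most_common(1)[0][0]
print(dict(tally), mode_age)
```

Incomplete groups at the bottom of a column (for example the single leftover row in Column 2) are ignored, as in the hand-drawn table.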

(iii) CONTINUOUS SERIES

Step 1: Find the modal class using either the inspection or the grouping method.

a) Inspection/observation method: The modal class is the class with the highest frequency.

For example:

By the inspection method, the modal class is 15-20 since it has the highest frequency of 30.

b) Grouping method (steps same as in discrete series)


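Grouping Table

Marks      Frequency (Column 1)
0 - 5      7
5 - 10     18
10 - 15    25
15 - 20    30
20 - 25    20

Column 2 (in twos): 7 + 18 = 25; 25 + 30 = 55
Column 3 (in twos, leaving the first): 18 + 25 = 43; 30 + 20 = 50
Column 4 (in threes): 7 + 18 + 25 = 50
Column 5 (in threes, leaving the first): 18 + 25 + 30 = 73
Column 6 (in threes, leaving the first two): 25 + 30 + 20 = 75

Analysis Column
Column   Highest total   Classes in the highest group
1        30              15 - 20
2        55              10 - 15, 15 - 20
3        50              15 - 20, 20 - 25
4        50              0 - 5, 5 - 10, 10 - 15
5        73              5 - 10, 10 - 15, 15 - 20
6        75              10 - 15, 15 - 20, 20 - 25

Tally: 0 - 5 = 1; 5 - 10 = 2; 10 - 15 = 4; 15 - 20 = 5; 20 - 25 = 2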
By grouping method, the modal class is 15-20 since it has the highest frequency
(tally) in the analysis column.

Step 2: Using the modal class, the mode can be calculated with the formula:

Mo = L + [|f1 – f0| / (|f1 – f0| + |f1 – f2|)] x h

Where L = Lower limit of the modal class; h = width of the modal class
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class

Now, L = 15; f1 = 30; f0 = 25; f2 = 20; h = 5
|f1 – f0| = |30 – 25| = 5; |f1 – f2| = |30 – 20| = 10

Mo = 15 + [5 / (5 + 10)] x 5 = 15 + 1.67 = 16.67

Thus, the mode is 16.67 marks.
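As a quick check of the arithmetic, the mode formula for a continuous series can be written as a small Python function. The function name is hypothetical; the inputs are the values L, f1, f0, f2 and h used in the example.

```python
def grouped_mode(lower_limit, f1, f0, f2, class_width):
    """Mo = L + |f1 - f0| / (|f1 - f0| + |f1 - f2|) * h."""
    d1 = abs(f1 - f0)  # excess of modal frequency over the preceding class
    d2 = abs(f1 - f2)  # excess of modal frequency over the succeeding class
    return lower_limit + d1 / (d1 + d2) * class_width


mode_marks = grouped_mode(lower_limit=15, f1=30, f0=25, f2=20, class_width=5)
print(round(mode_marks, 2))
```

The sketch assumes the modal class differs from at least one neighbour; if f0 = f1 = f2 the denominator is zero and the formula cannot be applied.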

Qns 2
Define central tendency. Give the objectives, functions and essentials of averages. (V.Imp)
Ans:

A measure of central tendency is a single value that is used to represent an entire set of data.

A measure of central tendency is also known as an ‘Average’.

The three most commonly used measures of central tendency or ‘averages’ are:
1. Arithmetic Mean
2. Median
3. Mode

Objectives and functions of averages


(i) To present huge data in a summarised form: It is difficult to grasp a large amount of data or numerical figures. Averages summarise such data into a single figure which makes it easier to understand and remember.
(ii) To facilitate comparison: Averages are very helpful for making comparative studies as they reduce the mass of statistical data to a single figure or estimate.
(iii) To facilitate further statistical analysis: Various tools of statistical analysis like standard deviation, correlation etc. are based on averages.
(iv) To trace precise relationship: Averages are helpful and even essential when it comes to establishing relationships between different groups of data or variables.
(v) To help in decision-making: Averages provide values which act as a guideline for decision makers. Most of the decisions to be taken in research or planning are based on the average value of certain variables.

Essentials of a good average / measure of central tendency


1 It should be rigidly defined:
• An average should be clear and there should be only one form of interpretation.
• It should have a definite and fixed value irrespective of method of
calculations or formulae used.
2 It should be based on all observations:
• Average should be calculated by taking into consideration each and every item of the
series.
• If it is not based on all observations, it will not be representative of the whole group.
3 It should not be affected much by extreme values:
• One or two very small or very large values should not unduly affect the value of the average.
4 It should be least affected by fluctuations of sampling:
• An average should possess sampling stability, i.e. averages computed from different samples drawn from the same population should not differ much from one another.

Qns:3
Give the properties of averages.

Ans:

Properties of Arithmetic Mean


 The sum of deviations of observations from their arithmetic mean is always equal
to zero.
Symbolically, Σ(X – X̄) = 0
When we calculate the deviations of all the items from their arithmetic mean (say X̄ = 30), we find that the sum of these deviations, i.e. Σ(X – X̄), comes out to be zero.

 Arithmetic mean is NOT independent of change of origin


If each observation of a series is increased (or decreased) by a constant, then the mean of
these observations is also increased (or decreased) by that constant.

 Arithmetic mean is NOT independent of change of scale


If each observation of a series is multiplied (or divided) by a constant, then the mean of
these observations is also multiplied (or divided) by that constant.

4. The sum of squares of deviations of observations from their arithmetic mean is minimum, i.e. Σ(X – X̄)² is smaller than the sum of squared deviations taken from any other value.

5. If the arithmetic mean and number of items of two or more related groups are given, then we can compute the combined mean using the formula given below.

Combined Mean
If we have the arithmetic means and numbers of items of two groups, we can compute the combined mean of these two groups by applying the following formula:

X̄12 = (N1 X̄1 + N2 X̄2) / (N1 + N2)

where X̄1 and X̄2 are the means and N1 and N2 the numbers of items of the two groups.
For example:
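The properties of the arithmetic mean can be verified numerically. The short Python sketch below uses made-up marks whose mean is 30; `combined_mean` and `sum_sq_dev` are helper names chosen for this illustration, and the combined mean is taken as the weighted average of the two group means.

```python
from statistics import mean


def combined_mean(n1, mean1, n2, mean2):
    """Weighted average of two group means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def sum_sq_dev(data, about):
    """Sum of squared deviations of the data from a chosen value."""
    return sum((v - about) ** 2 for v in data)


marks = [10, 20, 30, 40, 50]  # illustrative data; the mean is 30
m = mean(marks)

print(sum(v - m for v in marks))                  # property 1: sum of deviations
print(mean([v + 5 for v in marks]) - m)           # change of origin shifts the mean
print(mean([v * 2 for v in marks]) / m)           # change of scale multiplies the mean
print(sum_sq_dev(marks, m) < sum_sq_dev(marks, m + 1))   # property 4
print(combined_mean(3, mean(marks[:3]), 2, mean(marks[3:])))
```

Splitting the marks into a group of three and a group of two and recombining them should give back the mean of the whole series.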
Qns 4

What is Geometric Mean? Give the algebraic characteristics of Geometric Mean and state when Geometric Mean is useful. (V.Imp)

Ans.:
Geometric mean is the Nth root of the product of N items or values.

Calculation of Geometric Mean (G):

Individual Series: G = Antilog (Σ log X / N)
Discrete & Continuous Series: G = Antilog (Σ f log X / N)

Algebraic Characteristics of Geometric Mean:
(i) The product of the items remains unchanged if each item is replaced by the geometric mean.
(ii) Geometric mean cannot be found if the value of some item in the series is negative or zero.
(iii) The product of corresponding ratios on either side of the geometric mean is always equal.
(iv) It is not affected by changing the sequence of items.

Geometric mean is appropriate or useful:
- When ratios or percentages are to be averaged.
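The logarithmic calculation of the geometric mean can be sketched as below. This is a minimal illustration, not a prescribed method: `geometric_mean` is a hypothetical helper, base-10 logs stand in for the log tables, and the values 2 and 8 are invented.

```python
import math


def geometric_mean(values, freqs=None):
    """G = antilog(sum(f * log X) / N); frequencies default to 1."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    n = sum(freqs)
    log_sum = sum(f * math.log10(v) for v, f in zip(values, freqs))
    return 10 ** (log_sum / n)


g = geometric_mean([2, 8])
print(round(g, 4))
# Replacing each item by G leaves the product unchanged: 2 * 8 vs G * G.
print(round(g * g, 4))
```

The check for zero or negative items mirrors the characteristic that the geometric mean cannot be found in such a series.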

Qns 5
What is Harmonic Mean? In which circumstances is Harmonic Mean used? (V.Imp)
Ans.:
Harmonic mean of a series is the reciprocal of the arithmetic mean of the reciprocals of the values of its items.

Calculation of Harmonic Mean (H.M.):

Individual Series: H.M. = Reciprocal of (Σ(1/X) / N) = N / Σ(1/X)
Discrete & Continuous Series: H.M. = Reciprocal of (Σ(f/X) / N) = N / Σ(f/X), where N = Σf

Harmonic Mean is used in the following cases:
- For determining average speed or velocity.
- To find out average price.
- If the item given as a variable in the question is to be kept constant in the answer, or vice versa, then the harmonic mean is calculated.
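The average-speed use of the harmonic mean can be illustrated with a short Python sketch. The speeds below (40 and 60 km/h over equal distances) are invented for illustration, and `harmonic_mean` is a helper written for this note.

```python
def harmonic_mean(values, freqs=None):
    """H.M. = N / sum(f / X); frequencies default to 1."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return n / sum(f / v for v, f in zip(values, freqs))


# A journey covered at 40 km/h one way and 60 km/h on the way back.
average_speed = harmonic_mean([40, 60])
print(round(average_speed, 2))
```

The arithmetic mean of the two speeds would be 50 km/h, which overstates the true average because more time is spent travelling at the slower speed.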

QNS 6

What is a Partition Value? Give the formulae for calculating different Partition Values.
Ans.:
Values of the items that divide the series into a number of equal parts are known as partition values. A series may be divided into four, five, eight, ten or hundred equal parts, and the dividing values are known as Quartiles, Quintiles, Octiles, Deciles and Percentiles respectively. These partition values give an idea of the formation of the series and are used in the calculation of dispersion and skewness.

Measure              Individual & Discrete Series       Continuous Series
Quartiles:    Q1     Size of ((N + 1)/4)th item         q1 = (N/4)th item & Q1 = l1 + (i/f)(q1 – c)
              Q3     Size of 3((N + 1)/4)th item        q3 = 3(N/4)th item & Q3 = l1 + (i/f)(q3 – c)
Quintiles:    Qn4    Size of 4((N + 1)/5)th item        qn4 = 4(N/5)th item & Qn4 = l1 + (i/f)(qn4 – c)
Octiles:      O2     Size of 2((N + 1)/8)th item        o2 = 2(N/8)th item & O2 = l1 + (i/f)(o2 – c)
Deciles:      D7     Size of 7((N + 1)/10)th item       d7 = 7(N/10)th item & D7 = l1 + (i/f)(d7 – c)
Percentiles:  P75    Size of 75((N + 1)/100)th item     p75 = 75(N/100)th item & P75 = l1 + (i/f)(p75 – c)

Where l1 = lower limit of the class containing the partition value, i = class interval, f = frequency of that class and c = cumulative frequency of the preceding class.
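For an individual series, the "size of k(N + 1)/parts th item" rule can be sketched in Python. The helper `partition_value` is hypothetical, and the linear interpolation used when the position is fractional is an assumption of this sketch (a common textbook convention, not stated in these notes).

```python
def partition_value(data, k, parts):
    """k-th partition value when the series is cut into `parts` equal parts."""
    items = sorted(data)
    n = len(items)
    position = k * (n + 1) / parts           # 1-based position in the ordered series
    position = min(max(position, 1), n)      # keep the position inside the series
    lower = int(position)
    fraction = position - lower
    if lower >= n:
        return items[-1]
    # Interpolate between the two neighbouring items for fractional positions.
    return items[lower - 1] + fraction * (items[lower] - items[lower - 1])


series = list(range(1, 12))                  # illustrative data: 1, 2, ..., 11
print(partition_value(series, 1, 4))         # Q1
print(partition_value(series, 3, 4))         # Q3
print(partition_value(series, 7, 10))        # D7
```

Calling the same function with parts = 5, 8 or 100 gives quintiles, octiles and percentiles.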

Qns:7

Define the relationship between Mean, Median and Mode. (V.Imp)

Ans:

Relationship between Mean, Median and Mode

In a symmetrical distribution: Mean = Median = Mode
In a moderately asymmetrical distribution (empirical relationship): Mode = 3 Median – 2 Mean

Comparison of Mean, Median and Mode

Sl. No. 1
Mean: The average taken for a set of numbers is called the mean.
Median: The middle value in the data set is called the median.
Mode: The number that occurs the most in a given list of numbers is called the mode.

Sl. No. 2
Mean: Add all of the numbers together and divide the sum by the total number of values.
Median: Place all the given numbers in ascending order.
Mode: It shows the frequency of occurrence.

Sl. No. 3
Mean: The result is the mean or average score.
Median: The next step is to find the middle number on the list. It is called the median.
Mode: We can have more than one mode or no mode at all.

Sl. No. 4
Mean: Example: To find the average of the four numbers 2, 4, 6 and 8, we first add the numbers.
• 2 + 4 + 6 + 8 = 20
• Divide the sum by the total number of numbers, i.e. 4.
• 20/4 = 5 is the average or mean.
Median: Example: 4, 2, 8, 10, 19.
• Arrange the numbers in ascending order, i.e. 2, 4, 8, 10, 19.
• As there are 5 numbers in total, the middle number 8 is the median here.
Mode: Example: 3, 3, 5, 6, 7, 7, 8, 1, 1, 1, 4, 5, 6.
• Find the frequency of each number.
• For number 3, it is 2. For 5, it is 2. For 6, it is 2. For 7, it is 2. For 8, it is 1. For 1, it is 3. For 4, it is 1.
• The number with the highest frequency is the mode. Hence, the mode of the given sequence of numbers is 1.
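The three worked examples above can be reproduced with Python's standard `statistics` module. The `empirical_mode` helper is a name invented here for the relation Mode = 3 Median – 2 Mean, which holds only approximately and only for moderately asymmetrical distributions.

```python
from statistics import mean, median, multimode

print(mean([2, 4, 6, 8]))                                    # 5
print(median([4, 2, 8, 10, 19]))                             # 8
print(multimode([3, 3, 5, 6, 7, 7, 8, 1, 1, 1, 4, 5, 6]))    # [1]


def empirical_mode(mean_value, median_value):
    """Approximate mode from the empirical relation Mode = 3 Median - 2 Mean."""
    return 3 * median_value - 2 * mean_value
```

`multimode` returns a list because a series can have more than one mode; an empty input gives an empty list.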

Qns: 8

What do you mean by Dispersion? Give the meaning of Absolute Measure and Relative Measure with examples. (V.Imp)

Ans.:

Dispersion is a measure of the extent to which the individual items vary from a central value. Dispersion is used in two senses: (i) the difference between the extreme items of the series and (ii) the average of the deviations of items from the mean.

Absolute Measure: The figure showing the limit or magnitude of dispersion is known as an absolute measure, and it is expressed in the same unit as the original data, for example measures of dispersion in the age of students, their height, weight etc.

Relative Measure: For comparative study, the absolute measure concerned is divided by the corresponding mean or some other characteristic value to obtain a ratio or percentage, which is known as the relative measure, for example the coefficient of range or the coefficient of variation.

Qns : 9

Explain the various methods for measuring Dispersion. Also give their merits and
demerits? .(V.Imp)

Ans.:
Following are the important methods of studying dispersion -

(1) Numerical Methods:
a) Methods of limits:
   i) Range
   ii) Inter-quartile range
   iii) Percentile Range
b) Methods of average deviation:
   i) Quartile Deviation
   ii) Mean Deviation
   iii) Standard Deviation

(2) Graphic Method - Lorenz Curve

i) Range: The difference between the value of the smallest item and the value of the largest item of the series is called range.
Range = Largest item (L) – Smallest item (S)
Co-efficient of Range = (L – S) / (L + S)

Qns: 10

Explain the various methods for measuring Dispersion. Also give their merits and
demerits?

Ans:

Absolute Measures of Dispersion are


a) Range
b) Quartile Deviation
c) Mean Deviation
d) Standard Deviation

Relative Measures of Dispersion are

a) Coefficient of Range
b) Coefficient of Quartile Deviation
c) Coefficient of Mean Deviation
d) Coefficient of Variation

Range
It is the difference between the largest and smallest value of distribution.
Computation of Range
Range = L – S
Coefficient of Range = (L – S) / (L + S)


Merits of Range
1. It is simple to understand and easy to calculate.
2. It is widely used in statistical quality control.
Demerits of Range
1. It is affected by extreme values in the series.
2. It cannot be calculated in case of open end series.
3. It is not based on all items.

Inter quartile range and quartile deviation


Inter quartile range is the difference between the Upper Quartile (Q3) and the Lower Quartile (Q1).
Quartile deviation is half of the inter quartile range.
Computation of Inter quartile range and quartile deviation

Inter quartile Range = Q3 – Q1

Quartile Deviation (Q.D.) = (Q3 – Q1) / 2

Merits of Q.D

1. Easy to compute.
2. Less affected by extreme values.
3. Can be computed in open ended series.

Demerits of Q.D

1. Not based on all observations.
2. It is influenced by change in sample.

Mean Deviation

Mean Deviation is defined as the arithmetic average of the absolute deviations (ignoring signs) of the various items from the Mean or Median.

Computation of Mean Deviation

Individual Series

M.D. = Σ|D| / N

Discrete/Continuous Series

M.D. = Σf|D| / Σf

where |D| is the absolute deviation of an item from the Mean or Median.

Merits of Mean Deviation

1. Based on all observations.
2. It is less affected by extreme values.
3. Simple to understand and easy to calculate.

Demerits of Mean Deviation

1. It ignores ± signs in deviations.
2. It is difficult to compute when deviations come in fractions.
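The range, coefficient of range and mean deviation for an individual series can be sketched as follows. The heights are invented illustration data and the function names are this sketch's own.

```python
from statistics import mean, median


def value_range(data):
    """Range = L - S."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Coefficient of Range = (L - S) / (L + S)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)


def mean_deviation(data, about="mean"):
    """Average of absolute deviations taken from the mean or the median."""
    centre = mean(data) if about == "mean" else median(data)
    return sum(abs(v - centre) for v in data) / len(data)


heights = [150, 155, 160, 165, 170]   # illustrative heights in cm
print(value_range(heights))
print(coefficient_of_range(heights))
print(mean_deviation(heights))
```

The range and mean deviation are absolute measures (in cm here), while the coefficient of range is a pure ratio and so can be compared across series measured in different units.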

Standard Deviation (σ)

It is defined as the root mean square deviation.

Features of Standard Deviation:

Value of its deviation is taken from Arithmetic Mean.

+ and – signs of deviations taken from mean are not ignored.


Related Measures of Standard Deviation

Individual Series

Direct Method:
σ = √(Σd² / N)
Where –
d² = (X – X̄)²
N = No. of Items

Shortcut Method:
σ = √[Σdx² / N – (Σdx / N)²]
Where –
dx² = (X – A)²
A = Assumed Mean

Discrete & Continuous Series

Direct Method:
σ = √(Σfd² / N)

Shortcut Method:
σ = √[Σfdx² / N – (Σfdx / N)²]

Merits of Standard Deviation

Rigidly defined
Based on all observations
Takes Algebraic signs in consideration
Amenable to further Algebraic treatment

Demerits of Standard Deviation

Difficult to understand and compute.
Affected by extreme items.
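The direct and shortcut methods should give the same standard deviation whatever assumed mean is chosen. A minimal Python sketch for an individual series, with invented data and hypothetical function names:

```python
import math


def sd_direct(data):
    """Square root of the mean of squared deviations from the actual mean."""
    n = len(data)
    actual_mean = sum(data) / n
    return math.sqrt(sum((x - actual_mean) ** 2 for x in data) / n)


def sd_shortcut(data, assumed_mean):
    """sqrt(sum(dx^2)/N - (sum(dx)/N)^2) with dx = X - A."""
    n = len(data)
    dx = [x - assumed_mean for x in data]
    return math.sqrt(sum(d * d for d in dx) / n - (sum(dx) / n) ** 2)


values = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative data with mean 5
print(sd_direct(values))
print(sd_shortcut(values, assumed_mean=4))
```

The shortcut method is convenient by hand because deviations from a round assumed mean avoid fractions; the correction term (Σdx / N)² removes the effect of the wrong centre.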

Qns: 10
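What is the best method of measuring Dispersion? Write the formula for calculating combined S.D.

Ans.:
Standard Deviation is the best method of measuring dispersion because the deviations are taken from the mean, algebraic signs are not ignored and it is amenable to further algebraic treatment.

Combined S.D. of two groups:
σ12 = √[(N1σ1² + N2σ2² + N1d1² + N2d2²) / (N1 + N2)]
where d1 = X̄1 – X̄12, d2 = X̄2 – X̄12 and X̄12 is the combined mean of the two groups.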
Question 11 (V.Imp)
What is a Sampling Distribution?
Ans. 1. A sampling distribution of a statistic is a type of probability distribution created by drawing
many random samples of a given size from the same population. These distributions help to
understand how a sample statistic varies from sample to sample.

2.Sampling distributions describe the assortment of values for all manner of sample statistics.
While the sampling distribution of the mean is the most common type, they can characterize other
statistics, such as the median, standard deviation, range, correlation, and test statistics
in hypothesis tests.
3. When the parent distribution is normal, the sampling distribution of the mean is also normal (symmetrical) and has specific properties for the central tendency and variability.

                         Mean    Standard Deviation
Parent Distribution      µ       σ
Sampling Distribution    µ       σ/√n

Where,
o µ and σ are the population parameters for the mean and standard deviation, respectively.
o n is the sample size.
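The behaviour of the sampling distribution of the mean can be seen in a small simulation. The sketch below is only illustrative: the population values (µ = 50, σ = 10), the sample size and the number of trials are all assumptions chosen for the demonstration.

```python
import random
from statistics import mean, pstdev


def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` random samples of size n from a normal population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        mean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(trials)
    ]


sample_means = simulate_sample_means(mu=50, sigma=10, n=25, trials=2000)
print(round(mean(sample_means), 2))     # should be close to mu = 50
print(round(pstdev(sample_means), 2))   # should be close to sigma / sqrt(n) = 2
```

Increasing n makes the sample means cluster more tightly around µ, which is what the σ/√n entry in the table expresses.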
UNIT 3
Qns: 1
What is Correlation? State the different types and degrees of Correlation. (V.Imp)

Ans:
If two series vary in such a way that fluctuations in one are accompanied by fluctuations in the other, these variables are said to be correlated. For example, a rise in the price of a commodity reduces its demand and vice versa.

Some relationship exists between the ages of husband and wife, or between rainfall and production. Two variables are said to be correlated if a change in one variable is accompanied by a corresponding change in the other variable.

According to A. M. Tuttle, “Analysis of co-variation of two or more variables is usually called correlation”.

Types of Correlation : Correlation can be of following types -

(i) Positive and Negative Correlation: If the changes in two connected series are in the same direction, i.e. an increase in one variable is associated with an increase in the other variable, the correlation is said to be positive. For example, an increase in the father's age is accompanied by an increase in the son's age.
If the two related series change in opposite directions, i.e. an increase in one variable is associated with a corresponding decrease in the other variable, the correlation is said to be negative.

(ii) Linear and Non-Linear: If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. We get a straight line if the values of these series are plotted on graph paper.
Correlation is called non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. For example, if we double the amount of rainfall, the production would not necessarily be doubled.

(iii) Simple, Partial and Multiple Correlation: When only two variables are studied, it is called simple correlation.
If the common effect of two or more independent variables on one dependent data series is studied, it is called multiple correlation. For example, if the effect of rain, soil and temperature on potato production per acre is studied, it is multiple correlation.
On the other hand, in partial correlation we recognize more than two variables, but consider only two variables to be influencing each other, the effect of the other influencing variables being kept constant.

Degree of Correlation: The interpretation of the co-efficient of correlation is based on the degree of correlation. The coefficient may be in the following degrees -

(i) Perfect Correlation:
a) Perfect positive correlation: r = +1
b) Perfect negative correlation: r = -1

(ii) Absence of Correlation or No Correlation: r = 0

(iii) Limited Degree of Correlation:
(a) High degree positive or negative: ±0.75 to ±1.00
(b) Moderate degree positive or negative: ±0.25 to ±0.75
(c) Low degree positive or negative: 0 to ±0.25

Question 1(b) (V.Imp)
What is the difference between correlation and causation? What are two things that are highly correlated (linear relationship) but do not have a causal relationship?

Ans. Correlation and Causation:

Correlation refers to the relationship between two or more variables in a given dataset or case study. Causation, on the other hand, means that the change in one variable (the dependent variable) is due to the change in the other variable (the independent variable). Correlation alone does not establish causation.

For example, ice-cream sales and the number of drowning accidents tend to rise and fall together, yet neither causes the other; both increase in hot weather.
Qns: 2

Explain the Mathematical Methods of finding out Correlation.


Ans.: Correlation coefficient can be determined by the following methods -

(i) Karl Pearson’s Coefficient of Correlation (r): Karl Pearson’s Coefficient of Correlation is widely used in practice. It assumes that a linear relationship exists between the two series. This method is considered the best measure because it shows the direction of change in the data, i.e. positive or negative, and also the degree of correlation, which always lies between +1 and -1.

Properties of the Pearson’s Correlation Coefficient (V.Imp)

• r lies between -1 and +1, i.e. –1 ≤ r ≤ 1; the numerical value of r cannot exceed one (unity).
• The correlation coefficient is independent of the change of origin and scale.
• Two independent variables are uncorrelated but the converse is not true.

Example 1: calculate correlation coefficient for the following data: .(V.Imp)

X 2 4 5 6 8 11
Y 18 12 10 8 7 5

Solution:

X      Y      X²     Y²     XY
2      18     4      324    36
4      12     16     144    48
5      10     25     100    50
6      8      36     64     48
8      7      64     49     56
11     5      121    25     55
ΣX = 36   ΣY = 60   ΣX² = 266   ΣY² = 706   ΣXY = 293

Substituting the values in the formula
r = [NΣXY – ΣX ΣY] / [√(NΣX² – (ΣX)²) x √(NΣY² – (ΣY)²)], with N = 6, we have:

r = (6 x 293 – 36 x 60) / [√(6 x 266 – 36²) x √(6 x 706 – 60²)]
  = (1758 – 2160) / [√(1596 – 1296) x √(4236 – 3600)]
  = –402 / (17.32 x 25.22)
  = –402 / 436.81
  = –0.920

(Note: there is a high degree of negative correlation.)

Example 2: Calculate correlation between X and Y

(X) 2 3 4 5 6 7 8
(Y) 4 5 6 12 9 5 4

Solution:

X      Y      X²     Y²     XY
2      4      4      16     8
3      5      9      25     15
4      6      16     36     24
5      12     25     144    60
6      9      36     81     54
7      5      49     25     35
8      4      64     16     32
ΣX = 35   ΣY = 45   ΣX² = 203   ΣY² = 343   ΣXY = 228

r = (7 x 228 – 35 x 45) / [√(7 x 203 – 35²) x √(7 x 343 – 45²)]
  = (1596 – 1575) / [√(1421 – 1225) x √(2401 – 2025)]
  = 21 / (14 x 19.39)
  = 21 / 271.46
  = 0.077
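Both worked examples can be checked with a few lines of Python. The function below is a sketch of the raw-sums form of Karl Pearson's formula used in the solutions; the name `pearson_r` is chosen for this illustration.

```python
import math


def pearson_r(x, y):
    """r = (N*SXY - SX*SY) / (sqrt(N*SXX - SX^2) * sqrt(N*SYY - SY^2))."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_xx - sum_x ** 2)
                   * math.sqrt(n * sum_yy - sum_y ** 2))
    return numerator / denominator


r1 = pearson_r([2, 4, 5, 6, 8, 11], [18, 12, 10, 8, 7, 5])
r2 = pearson_r([2, 3, 4, 5, 6, 7, 8], [4, 5, 6, 12, 9, 5, 4])
print(round(r1, 3), round(r2, 3))
```

The function fails with a division by zero if either series is constant, since correlation is undefined in that case.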
2) SPEARMAN’S RANK CORRELATION COEFFICIENT.(V.Imp)

In 1904, C. Spearman introduced a new method of measuring the correlation between two variables. Instead of taking the values of the variables, he considered the ranks (or order) of the observations and calculated Pearson’s coefficient of correlation for the ranks. The correlation coefficient so obtained is called the rank correlation coefficient. This measure is useful in dealing with qualitative characteristics such as intelligence, beauty, morality, honesty etc.
The formula for Spearman’s rank correlation coefficient is:

rs = 1 – 6ΣD² / [N(N² – 1)]

Where, rs = Spearman’s rank correlation coefficient
D = difference in ranks between paired items (R1 – R2)
N = number of pairs of observations

Two types of situations may arise here. One is where we are given ranks and the other is where we are not given any ranks.

1. When ranks are given:

When actual ranks are given, we can follow these steps: (i) compute the difference between the two ranks (R1 and R2) and denote it as ‘d’, (ii) square ‘d’ and obtain Σd², and (iii) substitute the values in the formula.

Example: From the following data, calculate Spearman’s rank correlation.

Rank in Economics    1   2   3   4   5   6   7   8   9   10
Rank in Statistics   4   8   2   3   5   7   6   9   10  1

Solution:

R1      R2      d       d²
1       4       -3      9
2       8       -6      36
3       2       1       1
4       3       1       1
5       5       0       0
6       7       -1      1
7       6       1       1
8       9       -1      1
9       10      -1      1
10      1       9       81
Total           --      Σd² = 132

r = 1 – 6Σd² / [n(n² – 1)]
  = 1 – 6(132) / [10(100 – 1)]
  = 1 – 0.8
  = 0.2

Interpretation: The result indicates that there is a low positive correlation.

2. When ranks are not given:

In case we are given actual data, we must assign ranks to them. We can assign ranks by taking either the largest value or the lowest value as one, the next value as two, and so on.

Example: Find the rank correlation coefficient from the following data:

X   17  13  15  16  6   11  14  9   7   12
Y   36  46  35  24  12  18  27  22  2   8

Solution:

X     Y     Rank X (R1)   Rank Y (R2)   d (R1 – R2)   d²
17    36    1             2             -1            1
13    46    5             1             4             16
15    35    3             3             0             0
16    24    2             5             -3            9
6     12    10            8             2             4
11    18    7             7             0             0
14    27    4             4             0             0
9     22    8             6             2             4
7     2     9             10            -1            1
12    8     6             9             -3            9
Total                                                 Σd² = 44

r = 1 – 6Σd² / [n(n² – 1)]
  = 1 – 6(44) / [10(100 – 1)]
  = 1 – 0.267
  = 0.733

Note: the correlation is highly positive.

Calculation of Spearman’s Rank correlation when equal or repeated ranks occur

While assigning ranks, if two or more items have equal values (i.e. if a tie occurs), they may be given the mid rank. Thus, if two items are on the fifth rank, each may be ranked as (5 + 6)/2 = 5.5 and the next item in the order of size would be ranked seventh. When two or more ranks are equal, the following formula is used for computing rank correlation.

Rank correlation coefficient = 1 – 6[Σd² + (m³ – m)/12 + ...] / [n(n² – 1)]

Where m stands for the number of items sharing an equal rank. The term (m³ – m)/12 is to be added in the numerator for each group of equal ranks, in both the x and y series.
Example: Calculate the rank correlation coefficient for the following data:

X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70
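The ranking steps, including mid ranks and the (m³ – m)/12 correction for ties, can be sketched in Python as below. This is an illustrative helper (`mid_ranks` and `spearman_rank` are names invented here) that ranks the largest value as 1; applied to the data above it should reproduce a hand calculation.

```python
from collections import Counter


def mid_ranks(values):
    """Rank the largest value 1; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def spearman_rank(x, y):
    """1 - 6 * (sum d^2 + tie corrections) / (n * (n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = mid_ranks(x), mid_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # One (m^3 - m) / 12 term for every group of m tied items in either series.
    correction = sum(
        (m ** 3 - m) / 12
        for series in (x, y)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(round(spearman_rank(x, y), 3))
```

For this data Σd² = 72 and the tie corrections add up to 3 (0.5 for the two 75s, 2 for the three 64s and 0.5 for the two 68s), so the coefficient works out to about 0.545. Without ties the correction is zero and the function reduces to the ordinary formula.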

Merits and demerits of Rank Correlation (V.Imp)

Merits
1. It is easy to compute and understand.
2. It is highly useful when the data are of a qualitative nature like intelligence, beauty etc.
3. When the ranks of different item-values are given, this is the only method for finding the degree of correlation.
Demerits
1. This method cannot be employed for finding out correlation in a grouped frequency distribution.
2. It is difficult to calculate rank correlation if we have more than 30 items of observation, as ranking them requires much labour.
3. Compared to the Pearson method, rank correlation is not precise.
X 35 36 40 38 37 39 41 40 36 38
Y 65 72 78 77 76 77 80 79 76 75
(vii) Two judges in a beauty competition rank the 12 entries
as follows. What degree of agreement is there between the judges?

Judge 1 1 2 3 4 5 6 7 8 9 10 11 12
Judge 2 12 9 6 10 3 5 4 7 8 2 11 1

(viii) Below are given the heights of fathers (X), and those of
their sons (Y) in centimeters. Calculate Spearman‘s rank
Correlation coefficient.

X 180 155 170 174 160 172 166 170 170


Y 170 165 180 180 164 169 172 170 174
3) CONCURRENT DEVIATION METHOD

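The calculation of the correlation coefficient by this method is based on the direction of change or variation in the two paired variables. It is denoted by rc and varies between +1 and -1. It is calculated by the following formula:

rc = ±√[±(2C – n) / n]

where C = number of concurrent deviations (pairs in which both variables change in the same direction) and n = number of pairs of deviations (one less than the number of observations). The sign used inside and outside the root is the sign of (2C – n).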

Example: The following are the marks obtained by a group of 10 students in Economics and Statistics. Calculate correlation by the method of Concurrent Deviation.

ECO    8   36  98  25  75  82  90  62  65  39
STAT   84  51  91  60  68  62  86  58  53  47

Solution:

Marks in Economics (X)   Marks in Statistics (Y)   Dx   Dy   Dx . Dy
8                        84
36                       51                        +    -    -
98                       91                        +    +    +
25                       60                        -    -    +
75                       68                        +    +    +
82                       62                        +    -    -
90                       86                        +    +    +
62                       58                        -    -    +
65                       53                        +    -    -
39                       47                        -    -    +
                                                   n = 9     C = 6

rc = ±√[±(2C – n)/n] = +√[+(2 x 6 – 9)/9] = √(3/9) = 0.58
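The sign-counting in the table can be automated. The following Python sketch is a hedged illustration: `concurrent_deviation` is a made-up name, and treating a pair with no change in either variable as non-concurrent is an assumption of this sketch.

```python
import math


def sign(value):
    """Return +1, -1 or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def concurrent_deviation(x, y):
    """rc = +/- sqrt(+/- (2C - n) / n) from the directions of change."""
    dx = [sign(b - a) for a, b in zip(x, x[1:])]
    dy = [sign(b - a) for a, b in zip(y, y[1:])]
    n = len(dx)                                        # pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)    # concurrent deviations
    inner = (2 * c - n) / n
    # The sign of (2C - n) is used both inside and outside the root.
    return math.copysign(math.sqrt(abs(inner)), inner)


economics = [8, 36, 98, 25, 75, 82, 90, 62, 65, 39]
statistics_marks = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
print(round(concurrent_deviation(economics, statistics_marks), 2))
```

Because only directions are counted, doubling every mark or adding a constant leaves the result unchanged, which is also why the method cannot distinguish small variations from big ones.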

Merits and Demerits of Concurrent Deviation Method

Merits:
(1) It is simple to compute.
(2) It is easy to understand.

Demerits:
(1) It is not useful if long-term changes are to be considered.
(2) The method does not differentiate between small and big variations.
(3) It indicates the direction of change only.
Qns : 3
Define Regression. Why are there two Regression Lines? Under what conditions
can there be only one Regression Line? .(V.Imp)

Ans.:
Regression literally means “return” or “go back”. In the 19th century, Francis Galton first used the term regression in his paper “Regression towards Mediocrity in Hereditary Stature” for the study of hereditary characteristics.
The use of regression in modern times is not limited to hereditary characteristics; it is widely used for the study of the expected dependence of one variable on another. Therefore, the method by which the best probable values of unknown data of a variable are calculated from the known values of the other variable is called regression.
Regression helps in forecasting, decision making and in studying two or more variables in the economic field. It also shows the direction, quality and degree of correlation.

Regression Lines :
Regression line is that line which gives the best estimate of dependent variable for
any given value of independent variable. If we take the case of two variables X and
Y, we shall have two regression lines as the regression of X on Y and the regression of
Y on X.
Regression Line X on Y: In this formation, Y is the independent and X the dependent variable, and the best expected value of X is calculated corresponding to the given value of Y.
Regression Line Y on X: Here Y is the dependent and X the independent variable, and the best expected value of Y is estimated corresponding to the given value of X.
An important reason for having two regression lines is that they are drawn on the least squares assumption, which stipulates that the sum of squares of the deviations of the different points from the line is minimum. The deviations of the points from the line of best fit can be measured in two ways: vertical, i.e. parallel to the Y-axis, and horizontal, i.e. parallel to the X-axis.

For minimizing the total of the squares separately, it is essential to have two
regression lines.
Single line of Regression : When there is perfect positive or perfect negative
correlation between the two variables (r = ±1) the regression lines will coincide or
overlap and will form a single regression line in that case.

Qns:4

What is the difference between Regression and Correlation? .(V.Imp)

Ans.:

Difference between Correlation & Regression :

Basis: Relationship
Correlation: Correlation tells the degree of average relationship between two or more variables.
Regression: Using the relationship between known and unknown variables, the unknown variable is estimated.

Basis: Cause & Effect
Correlation: Unable to tell which series is the cause and which is the effect, despite a high degree of correlation.
Regression: The given independent variable is the ‘cause’ and the dependent variable is the ‘effect’.

Basis: Application
Correlation: Limited application.
Regression: Wider application.

Qns :5

How are Regression Equations derived? Explain.

Ans.:
Computation of Regression Equations: The algebraic expressions of the regression lines are called regression equations. Like the lines, the equations are also two:

Regression Equation of X on Y
Original Form: X = a + bY
Formula: X – X̄ = bxy (Y – Ȳ)

Regression Equation of Y on X
Original Form: Y = a + bX
Formula: Y – Ȳ = byx (X – X̄)

Here -
X̄ = Mean of X series
Ȳ = Mean of Y series
bxy = Regression coefficient of X on Y
byx = Regression coefficient of Y on X

Computation of Regression Coefficients:

1. Direct Method:
bxy = Σxy / Σy²  or  bxy = r (σx / σy)
byx = Σxy / Σx²  or  byx = r (σy / σx)
where x = X – X̄ and y = Y – Ȳ

2. Short Cut Method:
bxy = [N.Σdxdy – (Σdx . Σdy)] / [N.Σd²y – (Σdy)²]
byx = [N.Σdxdy – (Σdx . Σdy)] / [N.Σd²x – (Σdx)²]
Here - dx = X – A and dy = Y – A, where A is the assumed mean of the respective series.

Q.4
Give the interpretation of Regression Coefficients. How Correlation is calculated
from Regression Coefficients?

Ans.:
Interpretation of Regression Coefficients:

(i) If both the coefficients are positive, the correlation coefficient will be positive, and if both the coefficients are negative, the coefficient of correlation will also be negative.

(ii) Both the regression coefficients will have the same sign.

(iii) The product of both the regression coefficients cannot be more than 1.

Calculation of Correlation Coefficient from Regression Coefficients:

r = ±√(bxy x byx)

r takes the same sign as the regression coefficients.

bxy = Regression coefficient of x on y
byx = Regression coefficient of y on x
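The link between the two regression coefficients and r can be checked numerically. The Python sketch below is an assumption-labelled illustration: it computes bxy and byx from raw sums (the short-cut formula with the assumed mean taken as zero), reuses the X and Y values of Example 1 from the correlation section, and uses function names invented here.

```python
import math


def regression_coefficients(x, y):
    """Return (bxy, byx) computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    common = n * sum_xy - sum_x * sum_y
    bxy = common / (n * sum(b * b for b in y) - sum_y ** 2)   # X on Y
    byx = common / (n * sum(a * a for a in x) - sum_x ** 2)   # Y on X
    return bxy, byx


def r_from_coefficients(bxy, byx):
    """r = +/- sqrt(bxy * byx), taking the sign of the coefficients."""
    if bxy * byx < 0:
        raise ValueError("regression coefficients must have the same sign")
    return math.copysign(math.sqrt(bxy * byx), bxy)


x = [2, 4, 5, 6, 8, 11]
y = [18, 12, 10, 8, 7, 5]
bxy, byx = regression_coefficients(x, y)
print(round(bxy, 3), round(byx, 2), round(r_from_coefficients(bxy, byx), 3))
```

Both coefficients come out negative here and their product is below 1, in line with the interpretation rules, and the recovered r should agree with the value found by Karl Pearson's formula for the same data.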


UNIT-4

Q1. What is Probability? Explain the laws of probability. (V.Imp)

Ans. Probability is a numerical measure of the likelihood that an event will occur; it always lies between 0 and 1. The probability that two events will both occur can never be greater than the probability that each will occur individually. If two possible events, A and B, are independent, then the probability that both A and B will occur is equal to the product of their individual probabilities.

Probability Rules
There are three main rules associated with basic probability: the addition rule, the multiplication rule, and the complement rule. You can think of the complement rule as the 'subtraction rule' if it helps you to remember it.
1.) The Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
If A and B are mutually exclusive events, or those that cannot occur together, then the third term is 0, and the rule
reduces to P(A or B) = P(A) + P(B). For example, you can't flip a coin and have it come up both heads and tails on
one toss.
2.) The Multiplication Rule: P(A and B) = P(A) * P(B|A) or P(B) * P(A|B)
If A and B are independent events, we can reduce the formula to P(A and B) = P(A) * P(B). The term independent
refers to any event whose outcome is not affected by the outcome of another event. For instance, consider the
second of two coin flips, which still has a .50 (50%) probability of landing heads, regardless of what came up on the
first flip. What is the probability that, during the two coin flips, you come up with tails on the first flip and heads on
the second flip?
Let's perform the calculations: P = P(tails) * P(heads) = (0.5) * (0.5) = 0.25
3.) The Complement Rule: P(not A) = 1 - P(A)
Do you see why the complement rule can also be thought of as the subtraction rule? This rule builds upon the
mutually exclusive nature of P(A) and P(not A). These two events can never occur together, but one of them always
has to occur. Therefore P(A) + P(not A) = 1. For example, if the weatherman says there is a 0.3 chance of rain
tomorrow, what are the chances of no rain?
Let's do the math: P(no rain) = 1 - P(rain) = 1 - 0.3 = 0.7
Law of Total Probability
Law of Total Probability: P(A) = P(A|B) * P(B) + P(A|not B) * P(not B)
For example, what is the probability of a person's favorite color being blue if you know the following:

 Left-handed people have blue as a favorite color 30% of the time


 Right-handed people like blue 40% of the time
 Left-handed people make up 10% of the population

Let's complete the equation:


1.) P(Blue) = P(left handed) * P(like blue|left handed) + P(not left handed) * P(like blue|not left handed)
2.) P(Blue) = (0.1)(0.3) + (0.9)(0.4)
3.) P(Blue) = .03 + .36 = 0.39
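The rules above translate directly into small Python functions. The sketch below is illustrative only (the function names are this note's own) and reuses the coin, rain and favourite-colour figures from the examples.

```python
def addition_rule(p_a, p_b, p_a_and_b=0.0):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


def multiplication_rule(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a


def complement_rule(p_a):
    """P(not A) = 1 - P(A)."""
    return 1 - p_a


def total_probability(p_a_given_b, p_b, p_a_given_not_b):
    """P(A) = P(A|B) * P(B) + P(A|not B) * P(not B)."""
    return p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)


print(multiplication_rule(0.5, 0.5))                # tails then heads
print(round(complement_rule(0.3), 2))               # no rain tomorrow
print(round(total_probability(0.3, 0.1, 0.4), 2))   # favourite colour is blue
```

For independent events P(B|A) equals P(B), so `multiplication_rule` covers both the general and the independent case.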

Q2. What is binomial distribution? Explain with an example. Compare the normal distribution and the Poisson distribution. (V.V.Imp)
ANS. The binomial is a type of distribution in which each trial has two possible outcomes (the prefix “bi” means two, or twice). For example, a coin toss has only two possible outcomes: heads or tails, and taking a test could have two possible outcomes: pass or fail. In a Binomial Distribution each trial results in either (S)uccess or (F)ailure.

Comparison of Normal Distribution and Poisson Distribution

The Poisson(λ) Distribution can be approximated by the Normal distribution when λ is large. For sufficiently large values of λ (say λ > 1,000), the Normal(μ = λ, σ² = λ) Distribution is an excellent approximation to the Poisson(λ) Distribution.

Poisson Distribution Characteristics:


 An event can happen any amount of times throughout a period.

 Events occurring don't affect the probability of another event occurring within the same period.

 Occurrence rate is constant and doesn't change based on time.

 The likelihood of an occurring event corresponds to the time length.

Formula:
P(x) = (e^(–λ) x λ^x) / x!

Formula Values:
x: Actual number of occurring successes
e: 2.71828 (e = mathematical constant)
λ: Average number of successes within a specified region
Binomial Distribution
Binomial Distribution is considered the likelihood of a pass or fail outcome in a survey or experiment that is
replicated numerous times. There are only two potential outcomes for this type of distribution, like a True or False,
or Heads or Tails, for example.
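The two distributions described above can be compared with a short Python sketch. The binomial probability formula used here, C(n, k) p^k (1 – p)^(n – k), is the standard textbook form and is not spelled out in these notes; the function names and the numerical inputs are illustrative assumptions.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the average is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Two heads in two fair coin tosses.
print(binomial_pmf(2, 2, 0.5))

# With many trials and a small success probability the binomial is close
# to a Poisson distribution with lam = n * p.
print(round(binomial_pmf(2, 10000, 0.0003), 4), round(poisson_pmf(2, 3), 4))
```

The probabilities of all possible outcomes add up to 1 in both cases, which is a useful check on any hand calculation.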

Characteristics of Binomial Distribution:

 First variable: The number of times an experiment is conducted

 Second variable: Probability of a single, particular outcome

 The probability of an occurrence can only be determined if it's done a number of times

 None of the performed trials have any effect on the probability of the following trial

 Likelihood of success is the same from one trial to the following trial.

a) DIFFERENCE BETWEEN:-

b) Difference between:-
Question 3. Difference between mutually exclusive and exhaustive events.

Ans.

.
b)Difference between independent and dependent Events.
Ans.

ALL THEORY NOTES ARE VERY VERY IMPORTANT AS PER PTU


LEARN AND PRACTICE BOTH ARE IMPORTANT
PRACTICE MAKES YOU PERFECT.