Notes of Business Statistics Unit-1 To 4 (QUESTION ANSWERS Type)
Question 1
Define statistics.
Answer:
Statistics can be defined as the collection, presentation, classification, analysis, and
interpretation of quantitative data.
Question 2
What are the stages of statistical study?
Answer:
The stages of a statistical study are:
Collection of data
Organisation of data
Presentation of data
Analysis of data
Interpretation of data
Question 3
What are the tools used, related to statistical study?
Answer:
The tools used, related to statistical study are:
Question 4
What are the scopes of statistics?
Answer:
The scopes of statistics include:
Nature of statistics
Subject matter of statistics
Limitation of statistics
Question 5
Define statistics as a singular noun.
Answer:
In the singular sense, statistics means the science of statistics or statistical methods. It
refers to the techniques or methods relating to the analysis, collection, presentation,
classification, and interpretation of quantitative data.
Question 6
Define statistics as a plural noun.
Answer:
In the plural sense, statistics is defined as the information in terms of numerical data or
numbers such as employment statistics, statistics concerning public expenditure,
population statistics, etc.
Question 7(V.IMP)
What is inferential statistics? Differentiate between descriptive and inferential statistics.
Answer:
Inferential statistics refers to the methods by which conclusions are drawn relating to the
universe based on a given sample.
Question 8
What are the two components of the subject matter in statistics?
Answer:
The two components of the subject matter in statistics are:
Descriptive statistics
Inferential statistics
Question 9
What is descriptive statistics?
Answer: Descriptive statistics refers to those methods which are used for the collection,
presentation as well as analysis of data. These methods relate to such estimations as a
measurement of central tendencies, measurement of dispersion, measurement of
correlation, etc.
Question 10 (V.IMP)
Define ‘Statistics’ and give characteristics of ‘Statistics’.
Ans.:
‘Statistics’ means the numerical presentation of facts. Its meaning is divided into two forms:
the plural form and the singular form. In the plural form, ‘Statistics’ means a collection of
numerical facts or data, for example price statistics, agricultural statistics, production
statistics, etc. In the singular form, the word means the statistical methods with the help of
which collection, analysis and interpretation of data are accomplished.
Characteristics of Statistics -
a) Aggregate of facts/data
b) Numerically expressed
c) Affected by different factors
d) Collected or estimated
e) Reasonable standard of accuracy
f) Predetermined purpose
g) Comparable
h) Systematic collection.
Therefore, the process of collecting, classifying, presenting,
analyzing and interpreting numerical facts that are comparable
and gathered for some predetermined purpose is collectively known
as “Statistics”.
Question- 11 (V.IMP)
Discuss the functions and importance/utility of Statistics.
Ans.
Statistical methods are used not only in the social, economic and political
fields but in every field of science and knowledge. Statistical analysis has
become more significant in global relations and in the age of fast developing
information technology.
According to Prof. Bowley, “The proper function of statistics is to enlarge
individual experiences”.
Following are some of the important functions of Statistics :
a) To provide numerical facts.
b) To simplify complex facts.
c) To enlarge human knowledge and experience.
d) Helps in formulation of policies.
e) To provide comparison.
f) To establish mutual relations.
g) Helps in forecasting.
h) Test the accuracy of scientific theories.
i) To study extensively and intensively.
The use of statistics has become almost essential in order to clearly understand
and solve a problem. Statistics proves very useful in unfamiliar fields of
application and in complex situations such as :-
a) Planning
b) Administration
c) Economics
d) Trade & Commerce
e) Production management
f) Quality control
g) Helpful in inspection
h) Insurance business
i) Railways & transport Co
j) Banking Institutions
k) Speculation and Gambling
l) Underwriters and Investors
m) Politicians & social workers.
Question: 12(V.IMP)
Discuss the Scope of Statistics.
Ans:
In the old days the use of statistics was restricted to deal with the affairs of the state. But
now-a-days the scope of statistics has spread to all those areas where numerical facts are
used such as economics, business industry, medicine, physics, chemistry and numerous
other fields of knowledge.
The scope of statistics is much extensive. It can be divided into two parts –
(i) Statistical Methods such as Collection, Classification, Tabulation,
Presentation, Analysis, Interpretation and Forecasting.
(ii) Applied Statistics – It is further divided into three parts:
a) Descriptive Applied Statistics : Purpose of this analysis is
to provide descriptive information.
b) Scientific Applied Statistics : Data are collected with the
purpose of some scientific research and with the help of
these data some particular theory or principle is propounded.
c) Business Applied Statistics : Under this branch statistical
methods are used for the study, analysis and solution of
various problems in the field of business.
Question: 13(V.IMP)
State the limitation of statistics?
Ans.
The scope of statistics is very wide. In any area where problems can be
expressed in quantitative form, statistical methods can be used. But
statistics has some limitations:
1. Statistics can study only numerical or quantitative aspects of a
problem.
2. Statistics deals with aggregates not with individuals.
3. Statistical results are true only on an average.
4. Statistical laws are not exact.
5. Statistics does not reveal the entire story.
6. Statistical relations do not necessarily bring out the cause and
effect relationship between phenomena.
7. Statistics is collected with a given purpose.
8. Statistics can be used only by experts.
Question: 14
Question: 15(V.IMP)
Define different sources of data collection.
Ans:
Types of Data
A) Primary data
B) Secondary data
5. Mailed questionnaire
A) Meaning of secondary data
● Secondary data refers to the data that has already been collected by
some other person or agency and is used by us.
(1) Published sources
Published sources mean data available in printed form. They include the
following:
1. Magazines, journals, and periodicals published by various
government, semi-government, and private organisations; Data
related to birth, death, education, etc., by the government at various
levels; data regarding prices, production, etc., published by
Economic Times, Financial Express, etc.
2. Reports of various committees or commissions, such as the pay
commission report, the finance commission report, etc.
Question: 16
Questions should be in sequence.
Directly related questions.
Test of accuracy.
No restricted questions affecting personal whims.
In our day to day life, recording information is very crucial. A piece of information
or representation of facts or ideas which can be further processed is known as data.
The weather forecast, maintenance of records, dates, time, and everything is related
to data collection.
The collected statistical data can be represented by various methods such as tables,
bar graphs, pie charts, histograms, frequency polygons, etc.
Q 19. What is the meaning of Classification? Give objectives of
Classification and essentials of an ideal classification.
Ans.: Classification is the process of arranging data into various groups, classes
and sub-classes according to some common characteristics, or of separating
them into different but related parts.
Main objectives of Classification :-
(i) To make the data easy and precise
(ii) To facilitate comparison
(iii) Classified facts expose the cause-effect relationship.
(iv) To arrange the data in proper and systematic way
(v) The data can be presented in a proper tabular form only.
Essentials of an Ideal Classification :-
(i) Classification should be so exhaustive and complete that every
individual unit is included in one or the other class.
(ii) Classification should be suitable according to the objectives of
investigation.
(iii) There should be stability in the basis of classification so that
comparison can be made.
(iv) The facts should be arranged in proper and systematic way.
(v) Data should be classified according to homogeneity.
(vi) It should be arithmetically accurate.
Qns20
Give Formula for determining Magnitude of Classes?
Ans.
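According to Prof. A. H. Sturges, the class interval can be found using the
following formula.
$$ I = \frac{L - S}{1 + 3.322 \log N} $$
Where -
I = class interval
L = largest value
S = smallest value
N = number of observations (log taken to base 10)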
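The Sturges formula above can be tried out with a short Python sketch. The function name and the sample figures (100 observations ranging from 10 to 90) are hypothetical and used only for illustration; the logarithm is assumed to be base 10, which is what the constant 3.322 implies.

```python
import math

def sturges_class_interval(largest, smallest, n):
    """Class interval I = (L - S) / (1 + 3.322 * log10(N))."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (largest - smallest) / (1 + 3.322 * math.log10(n))

# Hypothetical data: 100 observations, smallest 10, largest 90.
interval = sturges_class_interval(largest=90, smallest=10, n=100)
print(round(interval, 2))  # 10.47
```

In practice the result is usually rounded to a convenient width (here, about 10).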
Qns 21
Ans.:
According to Blair, “Tabulation in its broad sense is an orderly arrangement
of data in columns and rows.”
Tabulation is a process of presenting the collected and classified data in
proper order and systematic way in columns and rows so that it can be
easily compared and its characteristics can be elucidated.
Objects of Tabulation :
Orderly and systematic presentation of data.
Making data precise and stable.
To facilitate comparison.
To make the problem clear and self-evident.
Qns 22
What is Diagrammatic Representation? State the importance of Diagrams.
Ans.:
Qns:23
Qns: 24
Define sampling? Give different methods of sampling.(V.imp)
Ans:
Population is the entire group that you want to draw conclusions about.
A sample is the specific group that you will collect data from. The size of the sample
is always less than the total size of the population.
2.1 Sampling
2.2 Population
A population is any complete group (i.e., people, sales territories, stores, etc.)
sharing some common set of characteristics. It can be defined as including all
people or items with the characteristic one wishes to understand and about which
one wishes to draw inferences.
2.4 Census
A census is an investigation of all the individual elements making up the
population—a total listing rather than a sample.
2.5 Sample
A sample design is a definite plan for obtaining a sample from a given population
(Kothari, 1998). It helps to decide the number of items to be selected in the
sample i.e. the size of the sample. Purpose of sampling is to estimate an
unknown characteristic of a population. It is all about selecting a random sample
which is true representative of the population under study. The idea is to
compute a suitable value from the sample data relating to test statistic by using
the appropriate distribution. It constitutes a certain portion of the population or
universe.
b. Law of Inertia of Large Numbers – This law states that, other things
being equal, the larger the size of the sample, the more accurate the results
are likely to be.
3. Characteristics of Sampling
There are several reasons to go for sampling. These might be (1)
lower cost (2) time saved (3) better accuracy (4) greater reliability (5) greater speed
of data collection (6) precision. The reasons why one must avoid sampling are
(1) lack of representative samples
2. Systematic Sampling
3. Stratified Random Sampling
a. Proportionate
b. Disproportionate
5. Multistage sampling
1. Convenience sampling
2. Judgment sampling
3. Snowball sampling
4. Quota sampling
$$ k = \frac{N}{n} $$
where:
n = sample size
N = population size
k = size of selection interval
For example, one wishes to take a sample of 50 from a list consisting of
10,000 purchase orders. Purchase orders for the previous fiscal year are
serialized 1 to 10,000 (N = 10,000). A sample of fifty (n = 50) purchase
orders is needed for an audit, so k = 10,000/50 = 200. The first sample
element is randomly selected from the first 200 purchase orders. Assume the
45th purchase order was selected. Subsequent sample elements: 245, 445, 645 . . .
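The purchase-order example can be reproduced with a minimal Python sketch. The function name and its `start` parameter are illustrative assumptions; when `start` is omitted, the first element is drawn at random from the first k items, as the text describes.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return the serial numbers picked by systematic sampling."""
    k = population_size // sample_size      # selection interval k = N / n
    if start is None:
        start = random.randint(1, k)        # random start within the first k items
    return [start + j * k for j in range(sample_size)]

# N = 10,000 purchase orders, n = 50, random start assumed to be 45.
orders = systematic_sample(10_000, 50, start=45)
print(orders[:4])   # [45, 245, 445, 645]
print(orders[-1])   # 9845
```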
Qns: 24
Ans:
Types of Error
i) Sampling error
If researchers are not careful in planning and defining the sampling process, it
can lead to faulty research findings. Sampling error is the error that occurs
because a sample, rather than the entire population, is observed. In statistical
terminology, it is the difference between the statistic
you measure and the parameter you would find if you took a census of the
entire population. Sampling error can’t be eliminated, but it can be reduced. In
general, the larger the sample, the smaller the margin of error.
Ans. :
The central tendency of a variable means a typical value around which other values
tend to concentrate; hence this value representing the central tendency of the series is
called a measure of central tendency or an average.
According to Clark, “Average is an attempt to find one single figure to describe whole
of figures.”
Arithmetic Mean (X̄) : The most popular and widely used measure for representing
the entire data by one value is the arithmetic mean. Its value is obtained by
adding together all the items and dividing this total by the number of items.
Arithmetic mean may be of two types : simple and weighted.
Arithmetic Mean
It is defined as the sum of the values of all observations divided by the number of
observations.
In general, if there are N observations X1, X2, X3, ..., XN, then the Arithmetic Mean is given by:
$$ \bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum X}{N} $$
1) Direct Method
For example:
(Take A = 250)
(Take A = 850)
j) DISCRETE SERIES
e) Direct Method
For example:
For example:
(Take A = 200)
(Take A = 200)
k) CONTINUOUS SERIES
1) Direct Method
:
Median
Median is defined as the middle value in the data set when its elements are
arranged in a sequential order, that is, in either ascending or descending
order.
It is a positional value. Positional average determines the position of variables in the series.
For example:
l) DISCRETE SERIES
MODE
Mode is defined as the value occurring most frequently in a given series and
around which other items of the set cluster most densely.
The word mode has been derived from the French word ‘la Mode’ which signifies
the most fashionable values of a distribution, because it is repeated the highest
number of times in the series.
INDIVIDUAL SERIES
Since the value 27 occurs the maximum number of times (thrice) in the series,
hence the modal marks = 27
m)DISCRETE SERIES
There are two methods of calculating mode using grouped data:
a) Inspection or Observation method b) Grouping method
For example:
b) Grouping method
The highest frequency total in each of the six columns is identified and analysed in
the Analysis Column to determine mode. The last column will be the analysis
column and the mode will be the value against the highest tally in the analysis
column.
For example: Calculate the mode from the following data using grouping method.
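Grouping Table
(Each grouped total is shown against the first row of its group.)
Age in yrs. | Column 1 (Frequency) | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Analysis Column
10 | 2 | 10 (ages 10-20) | -- | 30 (ages 10-30) | -- | -- | I (1)
20 | 8 | | 28 (ages 20-30) | | 38 (ages 20-40) | -- | III (3)
30 | 20 | 30 (ages 30-40) | | | | 35 (ages 30-50) | IIIIII (6)
40 | 10 | | 15 (ages 40-50) | | | | III (3)
50 | 5 | -- | | | | | I (1)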
The value 30 occurs maximum number of times (6 times) in the analysis column.
Therefore, the value of mode is 30.
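The grouping method can be checked mechanically. The sketch below is one possible implementation, assuming the usual six-column layout (single frequencies, pairs from the top, pairs leaving the first item, triples from the top, triples leaving one item, triples leaving two items); the function name is hypothetical.

```python
def grouping_mode(values, freqs):
    """Return (modal values, tally per value) using the six-column grouping method."""
    n = len(values)
    tally = [0] * n
    # Each column is described by (group size, number of leading items left out).
    for size, offset in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = []
        for start in range(offset, n - size + 1, size):
            idx = list(range(start, start + size))
            groups.append((sum(freqs[i] for i in idx), idx))
        if not groups:
            continue
        best = max(total for total, _ in groups)
        # Every value belonging to a group with the highest total gets one tally mark.
        for total, idx in groups:
            if total == best:
                for i in idx:
                    tally[i] += 1
    top = max(tally)
    return [values[i] for i in range(n) if tally[i] == top], tally

ages = [10, 20, 30, 40, 50]
frequencies = [2, 8, 20, 10, 5]
modes, analysis = grouping_mode(ages, frequencies)
print(modes)     # [30]
print(analysis)  # [1, 3, 6, 3, 1]
```

The tally for age 30 is 6, which agrees with the analysis column described above.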
n) CONTINUOUS SERIES
Step1: Find the modal class using either inspection or grouping method.
a) Inspection/ observation method : The modal class is the class with highest frequency.
For example:
By inspection method, the modal class is 15-20 since it has the highest frequency of 30.
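(Each grouped total is shown against the first row of its group.)
Marks | Column 1 (Frequency) | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Analysis Column
0 - 5 | 7 | 25 (0-10) | -- | 50 (0-15) | -- | -- | I (1)
5 - 10 | 18 | | 43 (5-15) | | 73 (5-20) | -- | II (2)
10 - 15 | 25 | 55 (10-20) | | | | 75 (10-25) | IIII (4)
15 - 20 | 30 | | 50 (15-25) | | | | IIIII (5)
20 - 25 | 20 | -- | | | | | II (2)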
By grouping method, the modal class is 15-20 since it has the highest frequency
(tally) in the analysis column.
Step2: Using the modal class, mode can be calculated by using the formula:
$$ Mo = L + \frac{|f_1 - f_0|}{|f_1 - f_0| + |f_1 - f_2|} \times h $$
Where L = Lower limit of the modal class ; h = width of the modal class
f1 = frequency of the modal class
f0 = frequency of the class preceding modal class.
f2 = frequency of the class succeeding modal class.
Now, L = 15, f1 = 30 ; f0 = 25 ; f2 = 20 ; h = 5
|f1 – f0| = |30 – 25| = 5; |f1 – f2| = |30 – 20| = 10
$$ Mo = 15 + \frac{5}{5 + 10} \times 5 = 15 + 1.67 = 16.67 $$
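As a quick check of Step 2, here is a small Python sketch of the standard grouped-data mode formula, Mo = L + |f1 - f0| / (|f1 - f0| + |f1 - f2|) x h. The function and parameter names are illustrative only.

```python
def grouped_mode(lower, width, f1, f0, f2):
    """Mode of a continuous series from the modal class and its two neighbours."""
    d1 = abs(f1 - f0)   # modal frequency vs. preceding class
    d2 = abs(f1 - f2)   # modal frequency vs. succeeding class
    if d1 + d2 == 0:
        raise ValueError("mode is not defined when all three frequencies are equal")
    return lower + d1 / (d1 + d2) * width

# Values from the worked example: L = 15, h = 5, f1 = 30, f0 = 25, f2 = 20.
mo = grouped_mode(lower=15, width=5, f1=30, f0=25, f2=20)
print(round(mo, 2))  # 16.67
```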
Qns2
Define central tendency. Give the objectives, functions and essentials of averages.(V.Imp)
Ans:
Qns:3
Give properties of averages.
Ans:
4. The sum of squares of deviations of observations from their arithmetic mean is minimum.
$\sum (X - \bar{X})^2$ is always minimum.
5. If arithmetic mean and number of items of two or more related groups are given, then we
can compute the combined mean using the formula given below.
Combined Mean
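If we have the arithmetic mean and number of items of two groups, we can compute
combined mean of these two groups by applying the following formula:
$$ \bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} $$
where N1 and N2 are the numbers of items and X̄1 and X̄2 the arithmetic means of the two groups.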
For example:
Qns 4
Ans.:
Geometric mean is the Nth root of the product of N items or values.
Qns 5
What is Harmonic Mean? In which circumstances is Harmonic Mean used?(V.Imp)
Ans.:
Harmonic mean of a series is the reciprocal of the arithmetic mean of the reciprocal of
the values of its items.
Calculation of Harmonic Mean (H.M.) :
Individual Series
$$ H.M. = \text{Reciprocal of } \frac{\sum (1/X)}{N} = \frac{N}{\sum (1/X)} $$
Discrete & Continuous Series
$$ H.M. = \text{Reciprocal of } \frac{\sum (f \cdot 1/X)}{N} = \frac{N}{\sum (f/X)} $$
Harmonic Mean is used in the following cases :-
- For determining average speed or velocity.
- To find out average price.
- If the item given in the question which is variable is to be kept as constant in
the answer, or vice versa, then harmonic mean will be calculated.
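To illustrate the average-speed case, the sketch below computes the harmonic mean as N divided by the sum of reciprocals (weighted by frequency when frequencies are supplied). The function name and the trip figures are hypothetical.

```python
def harmonic_mean(values, freqs=None):
    """H.M. = N / sum(f / X); with no frequencies, every f is taken as 1."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    n = sum(freqs)
    return n / sum(f / v for v, f in zip(values, freqs))

# Hypothetical trip: the same distance covered at 40 km/h and back at 60 km/h.
print(round(harmonic_mean([40, 60]), 2))  # 48.0
```

The arithmetic mean of the two speeds would give 50 km/h, which overstates the true average speed of 48 km/h for equal distances.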
QNS 6
What is Partition Value. Give formula for calculating different Partition Values?
Ans.:
Values of the items that divide the series into many parts are known as partition
values. A variable may be divided into four, five, eight, ten and hundred equal parts
known as Quartiles, Quintiles, Octiles, Deciles and Percentiles. The aforesaid
partition values give an idea of the formation of the series and are used in the
calculation of dispersion and skewness.
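Measures | Individual & Discrete Series | Continuous Series
Quartiles : Q1 | Size of $\frac{N + 1}{4}$ th item | q1 = (N/4) th item & $Q_1 = l_1 + \frac{i}{f}(q_1 - c)$
Quartiles : Q3 | Size of $\frac{3(N + 1)}{4}$ th item | q3 = 3(N/4) th item & $Q_3 = l_1 + \frac{i}{f}(q_3 - c)$
Quintiles : Qn4 | Size of $\frac{4(N + 1)}{5}$ th item | qn4 = 4(N/5) th item & $Qn_4 = l_1 + \frac{i}{f}(qn_4 - c)$
Octiles : O2 | Size of $\frac{2(N + 1)}{8}$ th item | o2 = 2(N/8) th item & $O_2 = l_1 + \frac{i}{f}(o_2 - c)$
Deciles : D7 | Size of $\frac{7(N + 1)}{10}$ th item | d7 = 7(N/10) th item & $D_7 = l_1 + \frac{i}{f}(d_7 - c)$
Percentiles : P75 | Size of $\frac{75(N + 1)}{100}$ th item | p75 = 75(N/100) th item & $P_{75} = l_1 + \frac{i}{f}(p_{75} - c)$
Where l1 = lower limit of the class containing the partition value ; i = class interval ;
f = frequency of that class ; c = cumulative frequency of the preceding class.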
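For an individual series, the "size of k(N + 1)/parts th item" rule can be sketched in Python as follows. The function name and the data are hypothetical, and the linear interpolation used when the position is fractional is an assumption on our part (a common convention that the notes do not spell out).

```python
def partition_value(data, k, parts):
    """Value at position k * (N + 1) / parts of the sorted series (1-based)."""
    items = sorted(data)
    n = len(items)
    pos = k * (n + 1) / parts
    pos = min(max(pos, 1.0), float(n))   # keep the position inside the series
    lower = int(pos)                     # whole part of the position
    frac = pos - lower                   # fractional part, used to interpolate
    if lower == n:
        return float(items[-1])
    return items[lower - 1] + frac * (items[lower] - items[lower - 1])

marks = [12, 15, 18, 20, 25, 30, 35]          # hypothetical series, N = 7
print(partition_value(marks, 1, 4))           # Q1: 2nd item
print(partition_value(marks, 3, 4))           # Q3: 6th item
print(round(partition_value(marks, 7, 10), 2))  # D7: position 5.6
```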
Qns:7 (V.Imp)
Ans:
S.No. | Mean | Median | Mode
1. | The average taken for a set of numbers is called a mean. | The middle value in the data set is called the median. | The number that occurs the most in a given list of numbers is called a mode.
2. | Add all of the numbers together and divide the sum by the total number of values. | Place all the given numbers in ascending order. | It shows the frequency of occurrence.
3. | The result is the mean or average score. | The next step is to find the middle number on the list. It is called the median. | We can have more than one mode or no mode at all.
Qns: 8
What do you mean by Dispersion. Give the meaning of Absolute Measure and
Relative Measure with example. .(V.Imp)
Ans. :
Dispersion is a measure of the extent to which the individual items vary from a central
value. Dispersion is used in two senses: (i) the difference between the extreme items of
the series and (ii) the average of deviations of items from the mean.
Qns : 9
Explain the various methods for measuring Dispersion. Also give their merits and
demerits? .(V.Imp)
Ans.:
Following are the important methods of studying dispersion -
1) Numerical Methods:
a) Methods of limits
i) Range
ii) Inter-quartile range
iii) Percentile Range
b) Methods of average deviation:
i) Quartile Deviation
ii) Mean Deviation
iii) Standard Deviation
Qns: 10
Explain the various methods for measuring Dispersion. Also give their merits and
demerits?
Ans:
a) Coefficient of Range
b) Coefficient of Quartile Deviation
c) Coefficient of Mean Deviation
d) Coefficient of Variation
Range
It is the difference between the largest and smallest value of distribution.
Computation of Range
Range = L – S, where L = largest value and S = smallest value
Coefficient of Range = (L – S) / (L + S)
Merits of Range
1. It is simple to understand and easy to calculate.
2. It is widely used in statistical quality control.
Demerits of Range
1. It is affected by extreme values in the series.
2. It cannot be calculated in case of open end series.
3. It is not based on all items.
Merits of Q.D
1. Easy to compute.
2. Less affected by extreme values.
3. Can be computed in open ended series.
Demerits of Q.D
Mean Deviation
Mean Deviation is defined as the arithmetic average of the absolute deviations [ignoring
signs] of various items from Mean or Median.
Individual Series
$$ M.D. = \frac{\sum |D|}{N} $$
Discrete/Continuous Series
$$ M.D. = \frac{\sum f|D|}{\sum f} $$
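Standard Deviation (σ)
Individual Series
Direct Method :
$$ \sigma = \sqrt{\frac{\sum d^2}{N}} $$
Where –
d² = (X – X̄)²
N = No. of Items
Shortcut Method :
$$ \sigma = \sqrt{\frac{\sum dx^2}{N} - \left(\frac{\sum dx}{N}\right)^2} $$
Where –
dx² = (X – A)²
A = Assumed Mean
Discrete/Continuous Series (here N = ∑f)
Direct Method :
$$ \sigma = \sqrt{\frac{\sum f d^2}{N}} $$
Shortcut Method :
$$ \sigma = \sqrt{\frac{\sum f\,dx^2}{N} - \left(\frac{\sum f\,dx}{N}\right)^2} $$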
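The direct and shortcut methods for an individual series should give the same standard deviation whatever assumed mean is chosen. The following sketch, with hypothetical function names and data, shows this numerically.

```python
import math

def sd_direct(values):
    """sigma = sqrt(sum((X - mean)^2) / N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_shortcut(values, assumed_mean):
    """sigma = sqrt(sum(dx^2)/N - (sum(dx)/N)^2), with dx = X - A."""
    n = len(values)
    dx = [x - assumed_mean for x in values]
    return math.sqrt(sum(d * d for d in dx) / n - (sum(dx) / n) ** 2)

data = [4, 8, 12, 16, 20]                   # hypothetical series
print(round(sd_direct(data), 3))            # 5.657
print(round(sd_shortcut(data, 10), 3))      # 5.657, with assumed mean A = 10
```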
Rigidly defined
Based on all observations
Takes Algebraic signs in consideration
Amenable to further Algebraic treatment
Demerits
Qns: 10
What is the best method of measuring Dispersion? Write the formula for
calculating combined S.D.
Ans.:
Standard Deviation is the best method of measuring dispersion, as deviations are taken
from the mean, algebraic signs are not ignored, and it is algebraically correct.
Question 11.(V.Imp)What is a Sampling Distribution?
Ans. 1. A sampling distribution of a statistic is a type of probability distribution created by drawing
many random samples of a given size from the same population. These distributions help to
understand how a sample statistic varies from sample to sample.
2.Sampling distributions describe the assortment of values for all manner of sample statistics.
While the sampling distribution of the mean is the most common type, they can characterize other
statistics, such as the median, standard deviation, range, correlation, and test statistics
in hypothesis tests.
3. When the parent distribution is normal, the sampling distribution of the mean will also be normal
(symmetrical) and have specific properties for the central tendency and variability.
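Distribution | Mean | Standard Deviation
Parent Distribution | µ | σ
Sampling Distribution (of the mean) | µ | σ/√n
Where,
µ = mean of the parent distribution ; σ = standard deviation of the parent distribution ; n = sample size.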
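The relationship between the parent distribution and the sampling distribution of the mean can be verified on a tiny population. The sketch below is only an illustration: the population values are hypothetical, and it assumes sampling with replacement, enumerating every possible sample of size n rather than drawing random ones.

```python
import itertools
import math

def mean(xs):
    return sum(xs) / len(xs)

def population_sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

population = [2, 4, 6, 8]   # hypothetical parent population
n = 2                       # sample size

# Every ordered sample of size n drawn with replacement (4^2 = 16 samples).
sample_means = [mean(s) for s in itertools.product(population, repeat=n)]

print(mean(population), mean(sample_means))       # both equal mu
print(round(population_sd(population) / math.sqrt(n), 4))
print(round(population_sd(sample_means), 4))      # equals sigma / sqrt(n)
```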
UNIT 3
Qns: 1
What is Correlation? State the different types and degrees of Correlation.(V.Imp)
Ans:
If two series vary in such a way that fluctuations in one are accompanied by
fluctuations in the other, these variables are said to be correlated. For example, a rise in the price of
a commodity reduces its demand, and vice versa.
Some relationship exists between age of husband and wife, rainfall and production.
Two variables are said to be correlated if the change in one variable results in a
corresponding change in the other variable.
(ii) Linear and Non-Linear : If the amount of change in one variable tends to bear a
constant ratio to the amount of change in the other variable, the correlation is said to be
linear. We get a straight line if the variables of these series are marked on
graph paper.
Correlation would be called non-linear or curvilinear if the amount of change
in one variable does not bear a constant ratio to the amount of change in the
other variable. For example if we double the amount of rainfall the
production would not necessarily be doubled.
(iii) Simple, Partial and Multiple Correlation : When only two variables are
studied, it is called simple correlation.
If the combined effect of two or more independent variables on one dependent
variable is studied, it is called multiple correlation. For example, if the
effect of rain, soil and temperature on potato production per acre is studied, then it
is multiple correlation.
On the other hand, in partial correlation we recognize more than two
variables, but consider only two variables to be influencing each other, the
effect of other influencing variables being kept constant.
Question:1(b)(V.Imp)
What is the difference between correlation and causation. What are two things that are
highly correlated (linear relationship) but do not have a causal relationship?
For example-
Qns: 2
X 2 4 5 6 8 11
Y 18 12 10 8 7 5
Solution:
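X | Y | X² | Y² | XY
2 | 18 | 4 | 324 | 36
4 | 12 | 16 | 144 | 48
5 | 10 | 25 | 100 | 50
6 | 8 | 36 | 64 | 48
8 | 7 | 64 | 49 | 56
11 | 5 | 121 | 25 | 55
∑X = 36 | ∑Y = 60 | ∑X² = 266 | ∑Y² = 706 | ∑XY = 293
Substituting the values in the formula
$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} $$
we have:
$$ r = \frac{6 \times 293 - 36 \times 60}{\sqrt{6 \times 266 - 36^2}\,\sqrt{6 \times 706 - 60^2}} $$
$$ = \frac{1758 - 2160}{\sqrt{1596 - 1296}\,\sqrt{4236 - 3600}} $$
$$ = \frac{-402}{17.32 \times 25.22} = \frac{-402}{436.81} = -0.920 $$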
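The hand calculation can be cross-checked with a few lines of Python. This is a minimal sketch of the same sums-based formula; the function name is ours, and the data are the X and Y values from the example above.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient from raw sums (N, sum X, sum Y, sum X^2, sum Y^2, sum XY)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

x = [2, 4, 5, 6, 8, 11]
y = [18, 12, 10, 8, 7, 5]
print(round(pearson_r(x, y), 3))  # -0.92
```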
(X) 2 3 4 5 6 7 8
(Y) 4 5 6 12 9 5 4
Solution:
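X | Y | X² | Y² | XY
2 | 4 | 4 | 16 | 8
3 | 5 | 9 | 25 | 15
4 | 6 | 16 | 36 | 24
5 | 12 | 25 | 144 | 60
6 | 9 | 36 | 81 | 54
7 | 5 | 49 | 25 | 35
8 | 4 | 64 | 16 | 32
∑X = 35 | ∑Y = 45 | ∑X² = 203 | ∑Y² = 343 | ∑XY = 228
$$ r = \frac{7 \times 228 - 35 \times 45}{\sqrt{7 \times 203 - 35^2}\,\sqrt{7 \times 343 - 45^2}} $$
$$ = \frac{1596 - 1575}{\sqrt{1421 - 1225}\,\sqrt{2401 - 2025}} $$
$$ = \frac{21}{14 \times 19.39} = \frac{21}{271.46} = 0.077 $$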
2) SPEARMAN’S RANK CORRELATION COEFFICIENT.(V.Imp)
Two types of situations may happen here. One is where we are given ranks
and the other is where we are not given any ranks.
4. When ranks are given:
When actual ranks are given, we can follow the steps as: (i) compute the
difference between two ranks (R1 and R2) and denote it as ‘d’, (ii) square the
‘d’ and obtain ∑d², and (iii) substitute the values in the formula
$$ r_s = 1 - \frac{6\sum d^2}{N(N^2 - 1)} $$
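Rank in Economics : 1 2 3 4 5 6 7 8 9 10
Rank in Statistics : 4 8 2 3 5 7 6 9 10 1
Solution:
R1 | R2 | d = R1 – R2 | d²
1 | 4 | -3 | 9
2 | 8 | -6 | 36
3 | 2 | 1 | 1
4 | 3 | 1 | 1
5 | 5 | 0 | 0
6 | 7 | -1 | 1
7 | 6 | 1 | 1
8 | 9 | -1 | 1
9 | 10 | -1 | 1
10 | 1 | 9 | 81
Total | -- | -- | ∑d² = 132
$$ r_s = 1 - \frac{6 \times 132}{10(10^2 - 1)} = 1 - \frac{792}{990} = 1 - 0.8 = 0.2 $$
Interpretation: The result indicates that there is low positive correlation.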
Example : Find the rank correlation coefficient from the following data:
X 17 13 15 16 6 11 14 9 7 12
Y 36 46 35 24 12 18 27 22 2 8
Solution:
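X | Y | R1 | R2 | d = R1 – R2 | d²
17 | 36 | 1 | 2 | -1 | 1
13 | 46 | 5 | 1 | 4 | 16
15 | 35 | 3 | 3 | 0 | 0
16 | 24 | 2 | 5 | -3 | 9
6 | 12 | 10 | 8 | 2 | 4
11 | 18 | 7 | 7 | 0 | 0
14 | 27 | 4 | 4 | 0 | 0
9 | 22 | 8 | 6 | 2 | 4
7 | 2 | 9 | 10 | -1 | 1
12 | 8 | 6 | 9 | -3 | 9
Total | -- | -- | -- | -- | ∑d² = 44
$$ r_s = 1 - \frac{6 \times 44}{10(10^2 - 1)} = 1 - \frac{264}{990} = 0.733 $$
Note: correlation is highly positive.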
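Both rank-correlation examples can be reproduced with the sketch below. The helper names are hypothetical; `ranks_descending` assumes there are no tied values (rank 1 goes to the largest value), so it does not cover the tie correction.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_from_ranks(r1, r2):
    """r_s = 1 - 6 * sum(d^2) / (N * (N^2 - 1))."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Example with ranks given (Economics vs. Statistics).
economics = list(range(1, 11))
statistics_ranks = [4, 8, 2, 3, 5, 7, 6, 9, 10, 1]
print(round(spearman_from_ranks(economics, statistics_ranks), 3))  # 0.2

# Example with ranks not given: rank the raw values first.
x = [17, 13, 15, 16, 6, 11, 14, 9, 7, 12]
y = [36, 46, 35, 24, 12, 18, 27, 22, 2, 8]
print(round(spearman_from_ranks(ranks_descending(x), ranks_descending(y)), 3))  # 0.733
```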
While assigning ranks, if two or more items have equal values (i.e., if there
is a tie), they may be given the mid rank. Thus, if two items are on the fifth
rank, each may be ranked as (5 + 6)/2 = 5.5 and the next item in the order of
size would be ranked seventh. When two or more ranks are equal, the following
formula is used for computing rank correlation.
$$ r_s = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{N^3 - N} $$
Where m stands for the number of equal ranks. The term $\frac{1}{12}(m^3 - m)$ is to be
added in the numerator for each group of equal ranks, both in the x and y series.
Example: Calculate the rank correlation coefficient for the following data:
X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70
Judge 1 1 2 3 4 5 6 7 8 9 10 11 12
Judge 2 12 9 6 10 3 5 4 7 8 2 11 1
(viii) Below are given the heights of fathers (X), and those of
their sons (Y) in centimeters. Calculate Spearman‘s rank
Correlation coefficient.
Solution:
Marks in Marks in Dx Dy Dx . Dy
Economics (X) Statistics (Y)
8 84
36 51 + - -
98 91 + + +
25 60 - - +
75 68 + + +
82 62 + - -
90 86 + + +
62 58 - - +
65 53 + - -
39 47 - - +
N=9 N =9 C=6
Demerits:
2) It is not useful if long-term changes are to be considered.
3) The method does not differentiate between small and big variations.
4) It indicates the direction of change only
Qns : 3
Define Regression. Why are there two Regression Lines? Under what conditions
can there be only one Regression Line? .(V.Imp)
Ans.:
Regression literally means “return” or “go back”. In the 19th century, Francis Galton
first used regression in his paper “Regression towards Mediocrity in Hereditary
Stature” for the study of hereditary characteristics.
Use of regression in modern times is not limited to hereditary characteristics only but
it is widely used for the study of expected dependence of one variable on the other.
Therefore, the method by which best probable values of unknown data of a variable
are calculated for the known values of the other variable is called regression.
Regression helps in forecasting, decision making and in studying two or more
variables in economic field. It also shows the direction, quality and degree of
correlation.
Regression Lines :
Regression line is that line which gives the best estimate of dependent variable for
any given value of independent variable. If we take the case of two variables X and
Y, we shall have two regression lines as the regression of X on Y and the regression of
Y on X.
Regression Line X on Y : In this formation, Y is the independent and X the dependent
variable, and the best expected value of X is calculated corresponding to the given value
of Y.
Regression Line Y on X : Here Y is the dependent and X the independent variable, and the best
expected value of Y is estimated corresponding to the given value of X.
An important reason for having two regression lines is that they are drawn on the least
squares assumption, which stipulates that the sum of squares of the deviations of the
different points from the line is minimum. The deviations of the points from the line
of best fit can be measured in two ways – vertical, i.e. parallel to the Y axis, and
horizontal, i.e. parallel to the X axis.
For minimizing the total of the squares separately, it is essential to have two
regression lines.
Single line of Regression : When there is perfect positive or perfect negative
correlation between the two variables (r = ±1) the regression lines will coincide or
overlap and will form a single regression line in that case.
Qns:4
Ans.:
Qns :5
Ans.:
Computation of Regression Equations : The algebraic expressions of the regression lines are
called regression equations. Like the lines, the equations are also two :
Regression Equation X on Y
Original Form :
$$ X = a + bY $$
Formula :
$$ X - \bar{X} = b_{xy}(Y - \bar{Y}) $$
Regression Equation Y on X
Original Form :
$$ Y = a + bX $$
Formula :
$$ Y - \bar{Y} = b_{yx}(X - \bar{X}) $$
Here -
X̄ = Mean of x series.
Ȳ = Mean of y series.
bxy = Regression coefficient X on Y
byx = Regression coefficient Y on X
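A minimal Python sketch of the two regression equations follows. It assumes the usual least-squares expressions for the coefficients, byx = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and bxy = Σ(X - X̄)(Y - Ȳ) / Σ(Y - Ȳ)², which the notes do not state explicitly; the function name is ours, and the data are reused from the first correlation example.

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, bxy, byx) for the two least-squares regression lines."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy / syy, sxy / sxx

x = [2, 4, 5, 6, 8, 11]
y = [18, 12, 10, 8, 7, 5]
mean_x, mean_y, bxy, byx = regression_coefficients(x, y)

# Y on X: Y - mean_y = byx * (X - mean_x); estimate Y when X = 7.
print(round(mean_y + byx * (7 - mean_x), 2))   # 8.66
# X on Y: X - mean_x = bxy * (Y - mean_y); estimate X when Y = 9.
print(round(mean_x + bxy * (9 - mean_y), 2))   # 6.63
# The product of the coefficients equals r squared.
print(round(bxy * byx, 3))                     # 0.847
```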
Q.4
Give the interpretation of Regression Coefficients. How Correlation is calculated
from Regression Coefficients?
Ans.:
Interpretation of Regression Coefficients :
(i) If both the coefficients are positive correlation coefficient will be positive, and
if both the coefficients are negative, the coefficient of correlation will also be
negative.
(iii) Both the regression coefficients will have the same sign.
(iv) The product of both the regression coefficients cannot be more than 1.
$$ r = \pm\sqrt{b_{xy} \times b_{yx}} $$
r takes the sign that is common to both regression coefficients.
Probability Rules
There are three main rules associated with basic probability: the addition rule, the multiplication rule, and the
complement rule. You can think of the complement rule as the 'subtraction rule' if it helps you to remember it.
1.) The Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
If A and B are mutually exclusive events, or those that cannot occur together, then the third term is 0, and the rule
reduces to P(A or B) = P(A) + P(B). For example, you can't flip a coin and have it come up both heads and tails on
one toss.
2.) The Multiplication Rule: P(A and B) = P(A) * P(B|A) or P(B) * P(A|B)
If A and B are independent events, we can reduce the formula to P(A and B) = P(A) * P(B). The term independent
refers to any event whose outcome is not affected by the outcome of another event. For instance, consider the
second of two coin flips, which still has a .50 (50%) probability of landing heads, regardless of what came up on the
first flip. What is the probability that, during the two coin flips, you come up with tails on the first flip and heads on
the second flip?
Let's perform the calculations: P = P(tails) * P(heads) = (0.5) * (0.5) = 0.25
3.) The Complement Rule: P(not A) = 1 - P(A)
Do you see why the complement rule can also be thought of as the subtraction rule? This rule builds upon the
mutually exclusive nature of P(A) and P(not A). These two events can never occur together, but one of them always
has to occur. Therefore P(A) + P(not A) = 1. For example, if the weatherman says there is a 0.3 chance of rain
tomorrow, what are the chances of no rain?
Let's do the math: P(no rain) = 1 - P(rain) = 1 - 0.3 = 0.7
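The three rules can be written as tiny Python helpers to replay the coin and rain examples. The function names are hypothetical, and `p_and_independent` only applies when the two events are independent.

```python
def p_or(p_a, p_b, p_a_and_b=0.0):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def p_and_independent(p_a, p_b):
    """Multiplication rule for independent events: P(A and B) = P(A) * P(B)."""
    return p_a * p_b

def p_not(p_a):
    """Complement rule: P(not A) = 1 - P(A)."""
    return 1 - p_a

print(p_or(0.5, 0.5))               # heads or tails on one toss (mutually exclusive): 1.0
print(p_and_independent(0.5, 0.5))  # tails then heads on two tosses: 0.25
print(round(p_not(0.3), 2))         # no rain when P(rain) = 0.3: 0.7
```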
Law of Total Probability
Law of Total Probability: P(A) = P(A|B) * P(B) + P(A|not B) * P(not B)
For example, what is the probability of a person's favorite color being blue if you know the following:
Q2. What is the binomial distribution, with an example? Compare the normal and Poisson distributions.(V.V.Imp)
ANS. The binomial is a type of distribution that has two possible outcomes (the prefix “bi” means two, or twice).
For example, a coin toss has only two possible outcomes: heads or tails and taking a test could have two possible
outcomes: pass or fail. A Binomial Distribution shows either (S)uccess or (F)ailure.
Comparison of normal distribution and Poisson
The Poisson(λ) Distribution can be approximated with the Normal when λ is large. For sufficiently large values of λ
(say λ > 1,000), the Normal(μ = λ, σ² = λ) Distribution is an excellent approximation to the Poisson(λ) Distribution.
Events occurring don't affect the probability of another event occurring within the same period.
Formula:
$$ P(x) = \frac{e^{-\lambda} \lambda^x}{x!} $$
Formula Values:
x: Actual number of successes occurring
e: 2.71828 (e = mathematical constant)
λ: Average number of successes within a specified region
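Using the values x, e and λ listed above, the standard Poisson probability P(x) = e^(-λ) λ^x / x! can be sketched in Python. The function name and the choice λ = 2 are illustrative assumptions.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Hypothetical case: on average 2 successes per region (lambda = 2).
for successes in range(4):
    print(successes, round(poisson_pmf(successes, 2), 4))
```

Summing the probabilities over all possible x gives 1, which is a convenient sanity check on the function.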
Binomial Distribution
Binomial Distribution is considered the likelihood of a pass or fail outcome in a survey or experiment that is
replicated numerous times. There are only two potential outcomes for this type of distribution, like a True or False,
or Heads or Tails, for example.
The probability of an occurrence can only be determined if it's done a number of times
None of the performed trials have any effect on the probability of the following trial
Likelihood of success is the same from one trial to the following trial.
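Under the conditions listed above (a fixed number of independent trials with the same chance of success), the standard binomial probability is P(k successes) = C(n, k) p^k (1 - p)^(n - k). The sketch below uses our own names n, k and p, and a hypothetical coin-toss case.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical case: exactly 2 heads in 3 tosses of a fair coin.
print(binomial_pmf(2, 3, 0.5))  # 0.375
```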
a) DIFFERENCE BETWEEN:-
b) Difference between:-
Question 3. Difference between mutually Exclusive and Exhaustive events.
Ans.
.
b)Difference between independent and dependent Events.
Ans.