DMBA103 Gumbo
1. Mr. Vijay, a retired government servant, is considering investing his money in two
proposals. He wants to choose the one that has the higher average net present value and
the lower standard deviation. The relevant data are given below. Can you help him
choose the proposal?
Answer:
Proposal A
NPV (A)   Probability   NPV × Probability   NPV − 5485 = (B)   (B) × (B) × Probability
1559      0.3           467.7               -3926              4624042.8
Total                   5485                                   18252674
Proposal B
Total: sum of NPV × Probability = 5485; sum of (B) × (B) × Probability = 73127579
Decision: The expected NPV of both proposals is the same (5485), but the standard deviation
of Proposal A (√18252674 ≈ 4272) is lower than that of Proposal B (√73127579 ≈ 8551).
So Proposal A should be chosen.
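The decision rule can be sketched in a few lines of Python. This is only an illustration: it takes the expected NPV and variance totals reported for the two proposals as given, and the names `proposals` and `best` are hypothetical.

```python
import math

# Totals reported for each proposal: expected NPV and variance
# (the sum of squared deviations from 5485 weighted by probability).
proposals = {
    "A": {"expected_npv": 5485, "variance": 18252674},
    "B": {"expected_npv": 5485, "variance": 73127579},
}

# Standard deviation is the square root of the variance.
sd = {name: math.sqrt(p["variance"]) for name, p in proposals.items()}

# Prefer the higher expected NPV; break ties with the lower standard deviation.
best = min(proposals, key=lambda name: (-proposals[name]["expected_npv"], sd[name]))

for name in proposals:
    print(f"Proposal {name}: expected NPV = {proposals[name]['expected_npv']}, "
          f"SD = {sd[name]:.1f}")
print("Chosen proposal:", best)
```

Because the expected NPVs are equal, the choice rests entirely on the standard deviation.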
Q2
A manufacturing firm produces steel pipes in three plants with daily production
volumes of 500, 1000 and 2000 units respectively. According to past experience, it is
known that the fractions of defective output produced by the three plants are
respectively 0.005, 0.008 and 0.010. If a pipe is selected from a day’s total production
and found to be defective, find out (a) from which plant the pipe comes (b) what is the
probability that it comes from first plant?
Answer:
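a) Plant 3 has by far the largest production and the highest defective fraction, so a pipe
selected from the day's total production and found defective most likely comes from plant 3.
The expected daily defectives are 500 × 0.005 = 2.5, 1000 × 0.008 = 8 and 2000 × 0.010 = 20,
a total of 30.5, so the probability that the defective pipe comes from plant 3 is
20/30.5 = 65.6%.
b) The probability that it comes from the first plant is 2.5/30.5 = 8.2%.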
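The figures in this answer follow from Bayes' theorem. A minimal Python sketch, assuming each plant's prior probability is its share of the day's total production (the function name `posterior_given_defective` is illustrative):

```python
# Daily production and defective fractions for plants 1, 2 and 3.
production = [500, 1000, 2000]
defect_rate = [0.005, 0.008, 0.010]

def posterior_given_defective(production, defect_rate):
    """Return P(plant | defective) for each plant using Bayes' theorem."""
    total = sum(production)
    priors = [units / total for units in production]           # P(plant)
    joint = [p * d for p, d in zip(priors, defect_rate)]       # P(plant and defective)
    evidence = sum(joint)                                      # P(defective)
    return [j / evidence for j in joint]

posterior = posterior_given_defective(production, defect_rate)
most_likely_plant = posterior.index(max(posterior)) + 1

for plant, prob in enumerate(posterior, start=1):
    print(f"P(plant {plant} | defective) = {prob:.3f}")
print("Most likely source: plant", most_likely_plant)
```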
Non-Probability Sampling
Depending upon the object of enquiry and other considerations a predetermined number of
sample units is selected purposely so that they represent the true characteristics of the
population.
A serious drawback of this sampling design is that it is highly subjective in nature. The
selection of sample units depends entirely upon the personal convenience, biases,
prejudices and beliefs of the investigator. This method will be more successful if the
investigator is thoroughly skilled and experienced.
Cluster Sampling
The total population is divided into recognisable sub-divisions, known as clusters such that
within each cluster, units are more heterogeneous and between clusters they are
homogenous. The units are selected from each cluster by suitable sampling techniques.
Secondary data
Any information, that is used for the current investigation but is obtained from some data,
which has been collected and used by some other agency or person in a separate
investigation, or survey, is known as secondary data. They are available in a published or
unpublished form.
SET-II
4. A report says that 80% of India’s females aged 15-59 are not currently engaged in
the workforce. A national agency has an opinion that this percentage may be even
more. To validate its opinion, the agency did a survey of randomly chosen 1200
females of the age group 15-59 from the different parts of the country and found 228
females working. Do the figures of the survey help the agency in validation of its
opinion?
Answer:
The survey found 228 of the 1200 females working (19%), so 81% are not working, which is
close to the reported 80% of women aged 15-59 not in the workforce.
But this alone does not validate the agency's opinion that the true percentage of
non-working women is higher than 80%.
The real problem lies in the age group chosen: in India, most people aged 15-21,
irrespective of gender, are not working anyway. The analysis also lacks
information on whether women are temporarily not working, for example due to pregnancy.
In addition, many men and women aged 55-59 are not working, either for
health reasons or because they have no financial need to.
The analysis needs to be done by age group, for example 21-30, 31-40, 41-50 and 51-59.
This would give a more logical analysis and help to find the real number of women
who are not working.
Also, a sample of 1200 is really not enough to represent the total population of the
country. The analysts should try to survey more people so that the results can be
analysed reliably.
The location of the respondents also matters, because the mindset in some places is not
supportive of women working.
For example, the ratio of working women in metropolitan cities would be much higher than in
rural areas of states such as Punjab and Bihar.
These qualitative factors affect how people choose to run their households.
Moreover, a distinction can be made between women who are educated and those who are
not. Even if uneducated women are doing odd jobs, their work is much less likely to be
reported.
Therefore, we can conclude that the figures of the survey do not help in validating the
opinion of the agency.
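The comparison between the survey proportion and the reported 80% can also be checked with a one-sample z-test for a proportion. The sketch below is an illustration under assumptions the question does not state: a 5% significance level, a one-sided alternative (the true percentage exceeds 80%), and the normal approximation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_sided_proportion_z_test(successes, n, p0, alpha=0.05):
    """Test H0: p = p0 against H1: p > p0 using the normal approximation."""
    p_hat = successes / n
    std_error = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / std_error
    p_value = 1 - NormalDist().cdf(z)          # upper-tail probability
    critical = NormalDist().inv_cdf(1 - alpha)
    return p_hat, z, p_value, z > critical

not_working = 1200 - 228                        # 972 females not working
p_hat, z, p_value, reject_h0 = one_sided_proportion_z_test(not_working, 1200, 0.80)

print(f"sample proportion not working = {p_hat:.3f}")
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the z statistic falls below the one-sided critical value of about 1.645. The survey therefore gives no significant evidence that more than 80% are outside the workforce, which agrees with the conclusion above.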
Year 2011 2012 2013 2014 2015 2016 2017 2018 2019
Sale 2.3 5.3 5.1 3.5 3.4 2.7 2.8 4.1 2.9
Answer:
Regression analysis is the most mathematically demanding method, which is usually why people
shy away from it. This technique is meant for those companies that need in-depth, granular,
quantitative knowledge of what might be impacting sales and how it can be changed in
one direction or the other, as necessary.
The regression model equation might be as simple as Y = a + bX, in which case Y is
your Sales, X is the explanatory variable (here the year), ‘a’ is the intercept and ‘b’ is
the slope. You would need regression software to run an effective analysis. You are trying
to find the best fit in order to uncover the relationship between these variables.
Regression analysis is an analytical process whose end goal is to understand the
inter-relationships in the data and find as much useful information as possible. According to the
book, there are a number of steps, which are loosely detailed below.
1. Problem definition
The very first step is, of course, to define the problem we are trying to solve: perhaps a
business question that needs to be answered, or simply a prediction we want to make based
on some set of data. At this stage we must know the target variable and the attributes we
presume affect it. These presumptions are analysed later to judge their credibility.
2. Analyse Data
The key is to have visual representations of our data so we can better understand the
‘inter-relationships’ of the variables; the book referred to earlier highly
recommends using visual tools to make the EDA (Exploratory Data Analysis) process easier.
Finding correlations is an important step, as it allows us to roughly pick the attributes that
have a relation with the response variable. We are most likely to pick the attributes/variables
that show a strong correlation, positive or negative, with the target variable.
3. Model Selection
Based on the data, we pick a suitable model or regression equation. You may be
familiar with many such models, like Linear Regression, Support Vector Machine, Random
Forest, etc. The task in this step is to pick one that we assume will express the relationships
in our data in the best way possible. This assumption can later be accepted or refuted based
on analysis after fitting the model.
4. Model Fitting
5. Model evaluation
The final step is model evaluation: measuring and critically assessing how well the model
fits the data points. We run the model on the test data and check how accurately it
was able to predict the output values.
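As an illustration of the correlation check in step 2, the sketch below computes the Pearson correlation between the year index and the sales figures from the table above. Coding the years 2011 to 2019 as X = 1, ..., 9 is an assumption, and the function name `pearson_r` is illustrative.

```python
import math

# Sales for 2011-2019; the years are coded as X = 1..9 (an assumed coding).
sales = [2.3, 5.3, 5.1, 3.5, 3.4, 2.7, 2.8, 4.1, 2.9]
x = list(range(1, len(sales) + 1))

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, sales)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

For a straight-line model with one explanatory variable, r squared is the share of the variation in Y explained by the fitted line, so it also gives a rough first measure for the evaluation in step 5.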
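Coding the years 2011 to 2019 as X = 1, 2, ..., 9 and taking Y as the sale:
X       Y      XY      X*X
1       2.3    2.3     1
2       5.3    10.6    4
3       5.1    15.3    9
4       3.5    14.0    16
5       3.4    17.0    25
6       2.7    16.2    36
7       2.8    19.6    49
8       4.1    32.8    64
9       2.9    26.1    81
Total:  ΣX = 45, ΣY = 32.1, ΣXY = 153.9, ΣX*X = 285

\[ \bar{X} = \frac{\sum X}{n} = \frac{45}{9} = 5 \]
\[ \bar{Y} = \frac{\sum Y}{n} = \frac{32.1}{9} = 3.567 \]
\[ b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{153.9 - 9(5)(3.567)}{285 - 9(5)(5)} \approx \frac{-6.6}{60} = -0.11 \]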
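A minimal Python sketch of the least-squares computation for Y = a + bX on the sales series. It assumes the years 2011 to 2019 are coded as X = 1, ..., 9 (consistent with ΣX = 45 and n = 9), and it uses ΣX² in the denominator of the slope. The function name `fit_line` is illustrative.

```python
# Sales for 2011-2019, with years coded as X = 1..9 (assumed coding).
sales = [2.3, 5.3, 5.1, 3.5, 3.4, 2.7, 2.8, 4.1, 2.9]
x = list(range(1, len(sales) + 1))

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_xx = sum(xi * xi for xi in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_xx - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(x, sales)
print(f"Y = {a:.3f} + ({b:.3f}) X")

# Trend value for the next period (X = 10, i.e. 2020 under the assumed coding).
forecast_next = a + b * 10
print(f"Trend value at X = 10: {forecast_next:.3f}")
```

The intercept follows from a = Ȳ − bX̄, which completes the equation Y = a + bX once the slope is known.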
6. Four observers determine the moisture content of samples of a powder, each man
taking a sample from each of six consignments. Their assessments are given below:
Observer    Consignment
            1    2    3    4    5    6
1           9   10    9   10   11   11
2          12   11    9   11   10   10
3          11   10   10   12   11   10
4          12   11   11   14   12   10
Discuss whether there is any significant difference between consignments. (Useful
data: Ftab(5,15):2.96, Ftab(3,15):3.29)
Answer:
Solution
P=4
Q=6
N=p*q=24
p q 2
G
TSS=∑ ∑
2
y −
ij
i=1 j=1 n
2
257
=[ 92 +122 +… … … … .+102 +102 ] −
24
=30.958
p 2
1 2 G
SSA= ∑ y i −
q i=1 n
1[ 2 2 2 2
] 2572
= 60 +63 +64 +70 −
6 24
=8.79
q 2
1 G
SSB= ∑
p j=1
2
y j−
n
2
1[ 2 257
44 + 42 +39 + 47 +44 +41 ] −
2 2 2 2 2
=
6 24
=9.708
SSE= TSA-SSA-SSB=12.46
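F(3,15) = (SSA/3)/(SSE/15) = 2.931/0.831 = 3.53 > Ftab(3,15) = 3.29, so we conclude that the 4 observers are not
equal and there is a significant difference in their results.
F(5,15) = (SSB/5)/(SSE/15) = 1.942/0.831 = 2.34 < Ftab(5,15) = 2.96, so there is no significant difference
between consignments.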
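The sums of squares and F ratios can be checked with a short Python sketch of a two-way ANOVA with one observation per cell. The function name `two_way_anova` and the dictionary keys are illustrative. The error degrees of freedom are taken as (p − 1)(q − 1) = 15, matching the tabulated values Ftab(3,15) and Ftab(5,15).

```python
# Moisture readings: rows are observers 1-4, columns are consignments 1-6.
data = [
    [9, 10, 9, 10, 11, 11],
    [12, 11, 9, 11, 10, 10],
    [11, 10, 10, 12, 11, 10],
    [12, 11, 11, 14, 12, 10],
]

def two_way_anova(table):
    """Two-way ANOVA without replication; returns sums of squares and F ratios."""
    p = len(table)            # number of rows (observers)
    q = len(table[0])         # number of columns (consignments)
    n = p * q
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n
    tss = sum(y * y for row in table for y in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / q - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / p - correction
    sse = tss - ss_rows - ss_cols
    df_rows, df_cols = p - 1, q - 1
    df_error = df_rows * df_cols
    mse = sse / df_error
    return {
        "TSS": tss, "SSA": ss_rows, "SSB": ss_cols, "SSE": sse,
        "F_observers": (ss_rows / df_rows) / mse,
        "F_consignments": (ss_cols / df_cols) / mse,
    }

result = two_way_anova(data)
for key, value in result.items():
    print(f"{key} = {value:.3f}")

# Compare with the tabulated critical values given in the question.
print("Observers differ:", result["F_observers"] > 3.29)
print("Consignments differ:", result["F_consignments"] > 2.96)
```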