Decision Science Assignment

Uploaded by

ashigoelg16

DECISION SCIENCE

ASSIGNMENT
ANSWER 1

INTRODUCTION:
Probability refers to the chance that a random event occurs. It is defined as the ratio of the
number of favorable outcomes to the total number of possible outcomes of an experiment, and it
is denoted by the symbol “P”. The probabilities of all possible outcomes of an experiment sum to
1. The probability of any event lies between 0 and 1: it cannot be negative, and it cannot exceed
1 because the number of favorable outcomes cannot exceed the total number of outcomes.
Probability is applied in situations such as selecting a card from a deck of cards, winning a
lottery, tossing a coin or rolling a die. Beyond these, companies and industry use it widely for
risk evaluation, the government can forecast the weather by examining changes in the weather,
and the likelihood of a hike in share prices can also be measured with the help of Probability.
The formula of Probability:
Probability (P) = No. of Favorable Outcomes / Total No. of Outcomes

CONCEPT:
According to the question given, BAYES’ THEOREM is applicable.
Bayes’ Theorem, also known as Bayes’ Rule, gives the probability of an event on the basis of
prior knowledge of conditions that could be related to that event. It is an extension of the law
of conditional probability and is the mathematical technique used to calculate conditional
probabilities. The rule is widely used in fields such as medicine, sports, philosophy, law and
engineering.

The formula of Bayes’ Theorem is as follows:

P (X|Y) = [P (Y|X) * P (X)] / P (Y)

Where,
P (X|Y) = Conditional Probability of event X given that event Y has already occurred
P (Y|X) = Conditional Probability of event Y given that event X has already occurred
P (X) = Probability of event X
P (Y) = Probability of event Y
For solving Bayes’ Rule problems, a Tree Diagram is one of the best ways to determine the
probability.

So, according to the question, it is given that:

P (B) = P (Bad Mood) = Probability of a Bad Mood = 0.10

P (D|B) = P (Periodontal Disease | Bad Mood) = Probability of Periodontal Disease given a Bad
Mood = 0.85

P (D|B’) = P (Periodontal Disease | No Bad Mood) = Probability of Periodontal Disease given No
Bad Mood = 0.29

EVENT
  |-- BAD MOOD (0.10)
  |     |-- HAVING DISEASE (0.85)
  |     |-- NOT HAVING DISEASE (0.15)
  |
  |-- HEALTHY MOOD (0.90)
        |-- HAVING DISEASE (0.29)
        |-- NOT HAVING DISEASE (0.71)

Tree Diagram of an Event Occurring Bad Mood and Healthy Mood

With the help of the Tree Diagram, we can easily calculate the other probabilities, because the
probabilities on the branches leaving any node sum to 1. The first layer of the tree diagram
denotes the sub division of the event i.e. Bad Mood and Healthy Mood.
P (B) = 0.10, therefore, P (H) = 0.90
where P (H) = P (B’) = Probability of having a Healthy Mood, i.e. Probability of having No Bad
Mood.
P (H) = P (B’) = 1 – P (B)
= 1 – 0.10
= 0.90

The second layer of the tree diagram divides each of Bad Mood and Healthy Mood into two further
branches i.e. Having Disease and Not Having Disease. Accordingly, the probability of not having
the disease given a bad mood is 1 – 0.85 = 0.15 and the probability of not having the disease
given no bad mood is 1 – 0.29 = 0.71.
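
As a quick check of the tree diagram, the joint probability of each of its four end branches can be computed by multiplying along the path. The sketch below is only illustrative: the function name `tree_leaves` and the variable names are assumptions, and exact fractions are used so that the total comes out as exactly 1.

```python
from fractions import Fraction

# Branch probabilities taken from the tree diagram (exact fractions avoid float rounding).
p_bad = Fraction(10, 100)                # P(B)
p_disease_given_bad = Fraction(85, 100)  # P(D|B)
p_disease_given_ok = Fraction(29, 100)   # P(D|B')


def tree_leaves(p_b, p_d_b, p_d_nb):
    """Return the joint probability of each of the four end branches of the tree."""
    p_nb = 1 - p_b  # second first-layer branch: healthy mood
    return {
        ("bad mood", "disease"): p_b * p_d_b,
        ("bad mood", "no disease"): p_b * (1 - p_d_b),
        ("healthy mood", "disease"): p_nb * p_d_nb,
        ("healthy mood", "no disease"): p_nb * (1 - p_d_nb),
    }


leaves = tree_leaves(p_bad, p_disease_given_bad, p_disease_given_ok)
for path, prob in leaves.items():
    print(path, float(prob))
print("total:", float(sum(leaves.values())))
```

The two "disease" branches (0.085 and 0.261) are the terms that are added together later to obtain P (D).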

We have to find:

P (B|D) = Probability of a Bad Mood given Periodontal Disease

According to BAYES’ THEOREM,

P (B|D) = [P (D|B) * P (B)] / P (D)

We are already given the probabilities P (D|B), P (D|B’) and P (B).

But we still need to find P (D) = Probability of Periodontal Disease, which follows from the law
of total probability:

P (D) = P (D|B) * P (B) + P (D|B’) * P (B’)
= 0.85 * 0.10 + 0.29 * 0.90
= 0.085 + 0.261
= 0.346
Thus, the Probability of Periodontal Disease is 0.346.

So, according to the Bayes’ Rule formula:

P (B|D) = [P (D|B) * P (B)] / P (D)
= (0.85 * 0.10) / 0.346
= 0.085 / 0.346
= 0.24566
= Approx. 0.2457

Thus, the Probability of having a Bad Mood given Periodontal Disease is 0.2457. In other words,
there is a 24.57% chance that a person who has periodontal disease is in a bad mood.
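
The calculation above can be reproduced in a few lines of Python. This is a minimal sketch under the figures given in the question; the function name `posterior` and its parameter names are illustrative assumptions, not part of the assignment.

```python
def posterior(prior, likelihood, likelihood_complement):
    """Bayes' rule for a two-branch tree.

    prior                 = P(B)
    likelihood            = P(D|B)
    likelihood_complement = P(D|B')
    Returns (P(D), P(B|D)).
    """
    # Law of total probability: P(D) = P(D|B) P(B) + P(D|B') P(B')
    evidence = likelihood * prior + likelihood_complement * (1 - prior)
    return evidence, likelihood * prior / evidence


p_d, p_b_given_d = posterior(prior=0.10, likelihood=0.85, likelihood_complement=0.29)
print(round(p_d, 3), round(p_b_given_d, 4))
```

Keeping P (D) as a separate return value mirrors the two-step working in the answer: first the total probability of the disease, then the conditional probability of a bad mood.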
CONCLUSION:
With the help of the Tree Diagram and Bayes’ Theorem, we are able to determine that there is a
24.57% chance of a bad mood when someone is suffering from periodontal disease. This is well
above the overall 10% chance of a bad mood, which indicates a relation between the mood of people
and the disease from which they are suffering.

ANSWER 2

INTRODUCTION:
REGRESSION ANALYSIS is a statistical approach, used across finance, investing and other relevant
areas, that seeks to determine the nature and magnitude of the relationship between a dependent
variable and one or more independent variables. The dependent variable is represented by “y”,
and the independent variable, also known as the Predictor, is represented by “x”. It is a very
beneficial technique that helps an organization measure the extent to which the independent
variables affect the dependent variable. Many companies apply regression analysis to their data
to gain insights that went unnoticed in the past and to assist in optimizing resources,
manufacturing processes and logistics. In addition to this, it aids in predicting future events
and resolving errors, which can increase the productivity and efficiency of the firm.

CONCEPT:
Regression Analysis is a mathematical and statistical technique that examines the relationship
between a dependent variable and one or more independent or predictor variables. The analysis
can be done with different methods such as Simple Regression Analysis and Multiple Regression
Analysis.
▪ SIMPLE REGRESSION ANALYSIS refers to the analysis of a set of data which has a single
dependent variable and a single independent variable.
The equation of line is: y = a + bx + u

▪ MULTIPLE REGRESSION ANALYSIS refers to the analysis of a set of data which has one
dependent variable and multiple independent variables i.e. more than 1 independent variable.
The equation is: y = a + b1x1 + b2x2 + b3x3 + … + bnxn + u
Where, y = Dependent Variable
x = Independent Variable
a = Y Intercept
b = Slope (change in y for a one-unit change in x)
u = Error of Prediction
It is very hard to examine the raw data and draw a conclusion from it directly. So, we use
graphs (scatter plot or line chart) to judge the slope and intercept of the given data and to
ascertain whether the data is useful to the company. Excel provides a set of tools that run a
regression on our data. With the regression model in Excel, we can predict future values for the
company, provided the model fits the data well.

The Regression Model run in Excel on the given data is as follows:

Given:
No of Posts per Day: Independent Variable
No of Followers: Dependent Variable

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.046617964
R Square              0.002173235
Adjusted R Square     -0.080978996
Standard Error        62.9409903
Observations          14

ANOVA
              df    SS             MS             F              Significance F
Regression    1     103.538016     103.538016     0.026135613    0.874259616
Residual      12    47538.81913    3961.568261
Total         13    47642.35714

                        Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept               377.2058212     38.39613846       9.824056178    4.33703E-07    293.5478221     460.8638203
No. of Posts Per Day    1.735966736     10.73804084       0.161665127    0.874259616    -21.66021441    25.13214789
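
The figures in the summary output are linked by standard formulas, so they can be cross-checked against one another. The sketch below recomputes the main statistics from the sums of squares and the slope reported above; it is an illustrative check with assumed variable names, not a re-run of the regression, because the 14 underlying observations are not reproduced in this answer.

```python
import math

# Values copied from the Excel summary output above.
n = 14                       # Observations
ss_regression = 103.538016   # SS, Regression row
ss_residual = 47538.81913    # SS, Residual row
ss_total = 47642.35714       # SS, Total row
slope = 1.735966736          # coefficient of No. of Posts Per Day
slope_se = 10.73804084       # its standard error

df_regression = 1                      # one predictor
df_residual = n - df_regression - 1    # 12

r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
ms_residual = ss_residual / df_residual
standard_error = math.sqrt(ms_residual)   # standard error of the regression
f_stat = (ss_regression / df_regression) / ms_residual
t_stat = slope / slope_se                 # with one predictor, t squared equals F

print(f"R Square          {r_square:.9f}")
print(f"Multiple R        {multiple_r:.9f}")
print(f"Adjusted R Square {adjusted_r_square:.9f}")
print(f"Standard Error    {standard_error:.7f}")
print(f"F                 {f_stat:.9f}")
print(f"t Stat (slope)    {t_stat:.9f}")
```

With a single predictor the F statistic is the square of the slope's t statistic, which is why Significance F and the slope's P-value are the same number (0.874259616) in the output.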

INTERPRETATION OF EXCEL TABLE

❖ REGRESSION STATISTICS:
These summarize how well the regression line fits the given data.

o MULTIPLE R: It is the correlation coefficient, which measures the strength of the linear
relationship between no. of followers and no. of posts per day. The higher the value of
Multiple R, the stronger the relationship. In this case, the Multiple R value of 0.0466
indicates a very weak relationship between the variables.

o R SQUARE: It represents the proportion of the variation in the no. of followers that is
explained by the no. of posts per day. In this model only about 0.2% of the variation in
no. of followers is explained by the no. of posts per day. This value is very low, so the
model is of little use to the company.

o ADJUSTED R SQUARE: It is the value of R Square adjusted for the number of predictors in
the model. It is never greater than the value of R Square. Here, the value is negative,
which shows that this model does not fit the company records.

o STANDARD ERROR: This shows the accuracy of the regression model. It is the typical
distance between the actual and estimated values. In our regression model, the standard
error is 62.94.

o OBSERVATIONS: The sample size of the data on which the regression model is built, here
14.

❖ ANOVA:
ANOVA stands for Analysis of Variance. It splits the total variability of the dependent
variable into the part explained by the regression and the unexplained (residual) part.

o F STATISTICS: The F-test measures the overall significance of the regression model. The
value 0.026 is very small, which indicates that the model is not a good fit for the data.

o SIGNIFICANCE F: It is the P-value of the F-test for the overall regression model. The
usual threshold for Significance F is 0.05 i.e. 5%, and a model is considered significant
only when the value is below it. But in the above regression model, the value of
Significance F is 0.874 i.e. 87.4%, which is far higher than 5%. Thus, this model is not
statistically significant for the given data.

o P VALUE: It is the probability of observing a slope at least as far from zero as the
estimated one if the No. of Posts per Day had no linear effect on the No. of Followers.
The P value is 0.874 i.e. 87.4%, which is higher than 5%. This shows that there is not
enough evidence of a linear relationship between the no of followers and no of posts per
day.
o COEFFICIENTS: The coefficient of the independent variable (no of posts per day) gives the
average estimated change in the dependent variable (no of followers) for one additional
post per day. With the help of the coefficients of the intercept and no of posts per day,
the regression equation can be created.

The Regression Equation: y = a + bx

Where, y = No of Followers (Dependent Variable)
a = Intercept of y
b = Slope
x = No of Posts per Day (Independent Variable)

From the Regression Model, the equation of regression is:

y = 377.21 + 1.736x
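
To illustrate how the fitted equation would be used, the sketch below predicts the number of followers for a few posting rates and rebuilds the 95% interval of the slope. The posting rates 0, 1 and 5 are hypothetical inputs chosen only for illustration, and the critical t value 2.178813 for 12 degrees of freedom is assumed from a standard t table rather than taken from the Excel output.

```python
INTERCEPT = 377.2058212    # a, from the Coefficients table
SLOPE = 1.735966736        # b, from the Coefficients table
SLOPE_SE = 10.73804084     # standard error of b
T_CRIT_95_DF12 = 2.178813  # two-sided 95% critical t for 12 df (assumed, from a t table)


def predict_followers(posts_per_day):
    """Predicted followers from the fitted line y = a + b * x."""
    return INTERCEPT + SLOPE * posts_per_day


# Hypothetical posting rates, for illustration only.
for posts in (0, 1, 5):
    print(posts, round(predict_followers(posts), 2))

# 95% confidence interval of the slope: b +/- t * SE(b)
ci_low = SLOPE - T_CRIT_95_DF12 * SLOPE_SE
ci_high = SLOPE + T_CRIT_95_DF12 * SLOPE_SE
print(round(ci_low, 2), round(ci_high, 2))
```

Because the interval of the slope runs from a negative to a positive value, a slope of zero cannot be ruled out, so these predictions carry very little information beyond the intercept.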

CONCLUSION:
In conclusion, the regression model is not a good fit for the company’s past records: the values
of Multiple R, R Square, Adjusted R Square and the F Statistic are very low, and Significance F
(0.874) is far above 0.05 at the 95% confidence level. Thus, it suggests that there is no
significant linear relationship between the variables i.e. no of followers and no of posts per
day.

ANSWER 3 (A)

INTRODUCTION:
NORMAL DISTRIBUTION is one of the fundamental and most significant continuous probability
distributions of a random variable. It is also known as the bell-shaped curve and is uni-modal.
The total area under the bell-shaped curve is the summation of all probabilities and is equal to
1. The normal distribution has several features: it is a continuous distribution; it is
symmetrical about its mean i.e. one half of the distribution is the mirror image of the other
half; and the curve is asymptotic to the horizontal axis, which means the curve approaches but
never touches the x-axis. In addition to this, it is a family of curves, each of which is
determined by two values – Mean and Standard Deviation.
CONCEPT:
A special type of normal distribution in which the mean is 0 and the standard deviation is 1 is
known as the STANDARD NORMAL DISTRIBUTION. It is also known as the Z DISTRIBUTION. In a normal
distribution an individual value is referred to as “x”, whereas in the standard normal
distribution it is referred to as “z”. The probability that corresponds to a z-score in the z
table is the probability that a value is less than that z-score. The z-score can be calculated
as follows:

z = (x – μ) / σ

According to the question given, we need to determine the interval that must be allowed between
replacements to make sure that not more than 10% of the bulbs expire before replacement; this can
be done with the help of the Standard Normal Distribution and z-scores.

It is given that;

No. of Light Bulbs = n = 1000


Mean life of Light Bulbs = μ = 120 days
Standard Deviation = σ = 20 days

We need to find:
Value of x = the replacement interval (in days) such that not more than 10% of the light bulbs
expire before replacement.
That is, P (X < x) = 0.10, where X is the lifetime of a bulb. This means at most 10% or 0.10 of
the bulbs should expire before replacement.

We search the z distribution table for the z-score with a cumulative area of 0.10. The z-score
that has an area of 10% to its left is approximately -1.28.
Thus, z – score = -1.28

Now, by using the z Formula:

z = (x – μ) / σ
-1.28 = (x – 120) / 20
-1.28 * 20 = x – 120
-25.6 = x – 120
-25.6 + 120 = x
94.4 = x

So, the interval between replacements of the 1000 light bulbs that makes sure that not more
than 10% of the light bulbs expire before replacement is 94.4 days.
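
The table lookup can be cross-checked with the `NormalDist` class in Python's standard `statistics` module (Python 3.8 or later). This is only a sketch of the same calculation; the variable names are illustrative.

```python
from statistics import NormalDist

bulb_life = NormalDist(mu=120, sigma=20)  # lifetime of a bulb in days

# z-score with 10% of the area to its left (unrounded table value)
z = NormalDist().inv_cdf(0.10)

# Replacement interval from the unrounded z-score
x_exact = bulb_life.inv_cdf(0.10)

# Replacement interval using the rounded table value z = -1.28, as in the working above
x_table = 120 + (-1.28) * 20

# Share of bulbs expected to fail before the table-based interval
share_failed = bulb_life.cdf(x_table)

print(round(z, 4), round(x_exact, 2), round(x_table, 1), round(share_failed, 4))
```

Rounding z to -1.28 moves the interval from about 94.37 days to 94.4 days, at which point roughly 10.03% of the bulbs would be expected to have failed, so an interval of 94 days stays safely within the 10% limit.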

CONCLUSION:
In conclusion, the bulbs should be replaced every 94.4 days if the firm wants no more than 10%
of them to expire before replacement. Thus, for the 1000 light bulbs with a mean life of 120
days and a standard deviation of 20 days, the interval allowed between replacements should be
94.4 days.

ANSWER 3 (B)

INTRODUCTION:
Measures of Central Tendency are a statistical concept which enables an individual to use a
single number to represent a whole distribution or set of information. Each measure of central
tendency summarizes the typical value of the data, although no single measure captures every
aspect of the distribution. There are three measures: Mean, Median and Mode. Mean is the average
of the observations recorded. Median is the middle value of the recorded data when arranged in
order. Mode is the value which occurs most often in the data set. Of all these measures, Mean is
the most commonly used measure of central tendency.

CONCEPT:
Mean is also referred to as Average. It is defined as the average of all the observations that are
recorded. It is computed by dividing the sum of all observations by the total number of
observations. Mean is calculated for grouped as well as ungrouped data. It is denoted by x̄.

For Ungrouped Data, it is calculated as:

MEAN = Sum of All Observations / Total No of Observations

For Grouped Data, it is calculated as:

MEAN = ∑fx / ∑f
Where; f = frequency of the data
x = mid value of the interval
∑ = summation

According to the question,


We have to find the Mean of Male and Female Migrants.

AGE GROUP    MALE MIGRANTS (M)    FEMALE MIGRANTS (F)    MID VALUE (X)    MX                FX
0 – 4        98,34,738            91,27,975              2                1,96,69,476       1,82,55,950
5 – 9        1,09,59,506          99,58,059              7                7,67,16,542       6,97,06,413
10 – 14      1,24,25,108          1,14,51,227            12               14,91,01,296      13,74,14,724
15 – 19      1,26,83,733          1,65,18,666            17               21,56,23,461      28,08,17,322
20 – 24      1,31,97,283          3,36,58,466            22               29,03,40,226      74,04,86,252
25 – 29      1,30,45,214          3,75,22,017            27               35,22,20,778      1,01,30,94,459
30 – 34      1,21,34,009          3,42,86,096            32               38,82,88,288      1,09,71,55,072
35 – 39      1,20,60,030          3,30,54,887            37               44,62,21,110      1,22,30,30,819
40 – 44      1,09,00,143          2,72,61,236            42               45,78,06,006      1,14,49,71,912
45 – 49      97,04,026            2,34,47,716            47               45,60,89,222      1,10,20,42,652
50 – 54      79,40,152            1,78,42,986            52               41,28,87,904      92,78,35,272
55 – 59      61,61,754            1,51,92,910            57               35,12,19,978      86,59,95,870
60 – 64      54,01,736            1,43,47,372            62               33,49,07,632      88,95,37,064
65 – 69      36,87,082            1,01,41,196            67               24,70,34,494      67,94,60,132
70 – 74      26,62,421            70,33,728              72               19,16,94,312      50,64,28,416
75 – 79      13,41,572            34,93,001              77               10,33,01,044      26,89,61,077
80 – 85      14,61,296            42,53,695              82.5             12,05,56,920      35,09,29,838
TOTAL        14,55,99,803         30,85,91,233                            4,61,36,78,689    11,31,61,23,244

The given data is a continuous (grouped) series, so the formula of Mean is as follows:
MEAN (x̄) = ∑fx / ∑f

Where; f = frequency of the data
x = mid value of the interval
∑ = summation

So, with the given dataset, firstly we need to calculate the Mid Value of each Age Group.
Calculation of Mid Value is as follows:

Mid Value = (Last Term + First Term) / 2

So, Age Group: 0 – 4 = (4 + 0) / 2 = 4 / 2 = 2
Age Group: 5 – 9 = (9 + 5) / 2 = 14 / 2 = 7
Similarly, we can calculate for all age groups.

Now, we have to calculate the product of frequency of migrants each of male and female with
the mid value.

As shown in the given table above,

MX = Product of frequency of Male Migrants and Mid Value
FX = Product of frequency of Female Migrants and Mid Value

The columns Male Migrants, Female Migrants, MX and FX have to be summed separately. The totals
are:

Frequency of Male Migrants (M) = 14,55,99,803
Frequency of Female Migrants (F) = 30,85,91,233
Summation of MX = 4,61,36,78,689
Summation of FX = 11,31,61,23,244

Thus, Average Age of Male Migrants = ∑MX / ∑M
= 4,61,36,78,689 / 14,55,99,803
= 31.687 Years

The Average age of Male Migrants is 31.69 years.

Average Age of Female Migrants = ∑FX / ∑F
= 11,31,61,23,244 / 30,85,91,233
= 36.67 Years
The Average age of Female Migrants is 36.67 years.
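
The grouped-mean working can be reproduced with a short script. The sketch below copies the frequencies from the table (written without the Indian digit grouping) and applies MEAN = ∑fx / ∑f; the names `ROWS` and `grouped_mean` are illustrative assumptions.

```python
# (age_from, age_to, male_migrants, female_migrants), copied from the table above
ROWS = [
    (0, 4, 9834738, 9127975),
    (5, 9, 10959506, 9958059),
    (10, 14, 12425108, 11451227),
    (15, 19, 12683733, 16518666),
    (20, 24, 13197283, 33658466),
    (25, 29, 13045214, 37522017),
    (30, 34, 12134009, 34286096),
    (35, 39, 12060030, 33054887),
    (40, 44, 10900143, 27261236),
    (45, 49, 9704026, 23447716),
    (50, 54, 7940152, 17842986),
    (55, 59, 6161754, 15192910),
    (60, 64, 5401736, 14347372),
    (65, 69, 3687082, 10141196),
    (70, 74, 2662421, 7033728),
    (75, 79, 1341572, 3493001),
    (80, 85, 1461296, 4253695),
]


def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(f * x) / sum(f)."""
    return sum(x * f for x, f in zip(midpoints, frequencies)) / sum(frequencies)


# Mid Value = (Last Term + First Term) / 2 for every age group
mids = [(low + high) / 2 for low, high, _, _ in ROWS]
males = [m for _, _, m, _ in ROWS]
females = [f for _, _, _, f in ROWS]

male_mean = grouped_mean(mids, males)
female_mean = grouped_mean(mids, females)
print(round(male_mean, 2), round(female_mean, 2))
```

Using one function for both columns keeps the male and female calculations identical apart from the frequencies, which is exactly how the MX and FX columns differ in the table.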

CONCLUSION:
In conclusion, the Average Age or Mean of Male Migrants and Female Migrants is 31.69 years and
36.67 years respectively. It indicates that female migrants are, on average, older than male
migrants. This could be due to many factors, such as patterns of migration and the life
expectancy of people.
