Decision Science Assignment
Answer 1:
Nemi Mehta owns 50 hectares of land on which he cultivates Giloy, a medicinal plant,
and prepares Giloy Vati with his team as per the Ayurvedic scriptures.
He spends on advertising to boost the sales of Giloy Vati and wants to analyze the
effect of advertising on sales.
He also wants to examine the relationship in depth by first measuring the correlation
between the two variables. If a correlation is present, he wants to run a regression
to help with prediction and to explore the cause-and-effect relationship.
• Graphical Representation
The analysis begins by plotting sales and advertising expenditure on a graph to
see whether there is a visible linear relationship between the two variables.
A scatter plot is the right choice for this, with Sales on the x-axis and
advertising expenditure on the y-axis.
Fig. 1: Scatter plot of advertising expenditure (y-axis) against Sales (INR '000) (x-axis)
The above figure shows a positive relationship between the variables: an increase in
one variable corresponds to an increase in the other, as can be seen from the
upward-sloping trend line.
• Karl Pearson’s Correlation Coefficient
This relationship can be measured by calculating the correlation coefficient between
the variables. One such measure is Karl Pearson's coefficient of correlation, which
takes a value between -1 and 1 depending on the strength and direction of the
relationship.
Sales (INR '000) (X) | Advertising Expenditure (TV Spots per month) (INR '000) (Y) | (X - ΣX/N) | (Y - ΣY/N) | (X - ΣX/N)^2 | (Y - ΣY/N)^2 | (X - ΣX/N)*(Y - ΣY/N)
Table 1
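Writing \bar{X} = \sum X / N and \bar{Y} = \sum Y / N for the two means:
$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} $$
$$ r = \frac{7665.74}{\sqrt{241429.9}\,\sqrt{308.8}} = \frac{7665.74}{8634.44} = 0.8878 $$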
The correlation coefficient is 0.89, which indicates a strong positive correlation:
higher advertising expenditure is associated with higher sales. This confirms that a
linear relationship exists between the variables.
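The calculation above can be cross-checked with a short script. This is a minimal sketch, assuming the 20 observations listed in Table 2 of this answer are the data behind Table 1; the function name `pearson_r` and the variable names are illustrative choices, not part of the assignment.

```python
import math

# Sales (INR '000) and advertising values for the 20 regions, copied from Table 2.
sales = [260.3, 286.1, 279.4, 410.8, 438.2, 315.3, 565.1, 570.0, 426.1, 315.0,
         403.6, 220.5, 343.6, 644.6, 520.4, 329.5, 426.0, 343.2, 450.4, 421.8]
adverts = [5, 7, 6, 9, 12, 8, 11, 16, 13, 7, 10, 4, 9, 17, 19, 9, 11, 8, 13, 14]


def pearson_r(x, y):
    """Karl Pearson's coefficient from sums of deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


r = pearson_r(sales, adverts)
print(f"r = {r:.4f}")
```

The coefficient is symmetric, so swapping the two arguments gives the same value.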
• Regression Analysis
Now we will do a regression analysis to build a prediction model.
The model has Sales as the dependent variable and advertising expense as the
independent variable.
The model includes an intercept and a slope. The intercept is necessary because the
relationship does not start at the origin: there will still be some sales without any
advertisement.
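Our regression equation will look like:
Sales (Y) = α + β × Advertisement (X)
where α is the intercept and β is the slope parameter. Note that in this equation X
denotes advertising and Y denotes sales, the reverse of the column labels in Table 1.
With \bar{X} = \sum X / N and \bar{Y} = \sum Y / N:
$$ \alpha = \bar{Y} - \beta \bar{X} $$
$$ \beta = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} $$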
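Therefore, with 308.8 being the sum of squared deviations of advertising:
$$ \beta = \frac{7665.74}{308.8} = 24.824 $$
Substituting the mean of sales (7969.9/20) and the mean of advertising (208/20) gives α:
$$ \alpha = \frac{7969.9}{20} - 24.824 \times \frac{208}{20} = 398.5 - 258.2 = 140.3 $$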
This means that, on average, Sales is INR 1,40,300 without any advertisement
expenditure, and each one-unit increase in advertising (as measured in Table 2) is
associated with an increase of 24.824 units in Sales, that is, about INR 24,824.
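The slope and intercept can be reproduced with a few lines of code. This is a minimal sketch of ordinary least squares for one predictor, assuming the Table 2 observations are the input data; the helper name `fit_line` is hypothetical.

```python
# Advertising is the predictor (x) and Sales (INR '000) the response (y), as in the
# regression equation. Values are copied from Table 2.
adverts = [5, 7, 6, 9, 12, 8, 11, 16, 13, 7, 10, 4, 9, 17, 19, 9, 11, 8, 13, 14]
sales = [260.3, 286.1, 279.4, 410.8, 438.2, 315.3, 565.1, 570.0, 426.1, 315.0,
         403.6, 220.5, 343.6, 644.6, 520.4, 329.5, 426.0, 343.2, 450.4, 421.8]


def fit_line(x, y):
    """Least-squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x  # the fitted line passes through the two means
    return alpha, beta


alpha, beta = fit_line(adverts, sales)
print(f"alpha = {alpha:.1f}, beta = {beta:.3f}")
```

Note that the intercept uses the means (sum divided by 20), not the raw sums.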
Now we will predict Sales using our regression equation:
Region  Sales (INR   Advertising Expenditure (TV Spots  Predicted Sales
Code    '000) (X)    per month) (INR '000) (Y)          (INR '000) (X)
1       260.3        5.0                                264.4
2       286.1        7.0                                314.1
3       279.4        6.0                                289.3
4       410.8        9.0                                363.7
5       438.2        12.0                               438.2
6       315.3        8.0                                338.9
7       565.1        11.0                               413.4
8       570.0        16.0                               537.5
9       426.1        13.0                               463.0
10      315.0        7.0                                314.1
11      403.6        10.0                               388.6
12      220.5        4.0                                239.6
13      343.6        9.0                                363.7
14      644.6        17.0                               562.3
15      520.4        19.0                               612.0
16      329.5        9.0                                363.7
17      426.0        11.0                               413.4
18      343.2        8.0                                338.9
19      450.4        13.0                               463.0
20      421.8        14.0                               487.9
Table 2
• Graph for Actual vs Predicted Sales
Now that we have fitted our regression model, we can plot the predicted and
actual Sales to see how well the model performs.
Fig 2
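The model predicts reasonably well: with r = 0.8878 it explains about 79% of the
variation in sales (r^2 ≈ 0.79), and most predictions are close to actual sales. The
largest gaps in Table 2 are for regions 7 and 15.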
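The fit can also be judged numerically rather than only by eye. The sketch below assumes the rounded coefficients reported above (140.3 and 24.824) and the Table 2 data; it recomputes the predicted values, the residuals and the share of variation explained. The variable names are illustrative.

```python
# Rounded coefficients as reported in the regression analysis above.
ALPHA = 140.3
BETA = 24.824

adverts = [5, 7, 6, 9, 12, 8, 11, 16, 13, 7, 10, 4, 9, 17, 19, 9, 11, 8, 13, 14]
sales = [260.3, 286.1, 279.4, 410.8, 438.2, 315.3, 565.1, 570.0, 426.1, 315.0,
         403.6, 220.5, 343.6, 644.6, 520.4, 329.5, 426.0, 343.2, 450.4, 421.8]

predicted = [ALPHA + BETA * a for a in adverts]
residuals = [s - p for s, p in zip(sales, predicted)]  # actual minus predicted

mean_sales = sum(sales) / len(sales)
sse = sum(e ** 2 for e in residuals)                # unexplained variation
sst = sum((s - mean_sales) ** 2 for s in sales)     # total variation
r_squared = 1 - sse / sst

# Index of the region where the prediction misses by the most (0-based).
worst = max(range(len(sales)), key=lambda i: abs(residuals[i]))

print(f"R^2 = {r_squared:.3f}")
print(f"largest gap: region {worst + 1}, residual {residuals[worst]:.1f}")
```

A residual plot of `residuals` against `adverts` would be a natural companion to the actual versus predicted chart.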
Answer 2:
The data in this problem contains the single-year age population from Census 2011.
Below we have arranged the data to make the table clearer:
Age in             Age in             Age in             Age in             Age in
Years  Population  Years  Population  Years  Population  Years  Population  Years  Population
1      1,958       11     1,998       21     2,594       31     2,844       41     1,802
2      1,725       12     1,916       22     2,839       32     2,684       42     1,751
3      1,814       13     2,138       23     2,935       33     2,696       43     1,659
4      1,768       14     2,139       24     3,601       34     2,781       44     1,652
5      1,871       15     2,096       25     4,110       35     2,799       45     1,806
6      1,888       16     2,044       26     4,089       36     2,450       46     1,460
7      1,768       17     2,027       27     3,716       37     2,142       47     1,226
8      1,712       18     2,065       28     3,702       38     2,114       48     1,225
9      1,780       19     2,013       29     3,084       39     1,725       49     1,006
10     1,862       20     2,459       30     3,475       40     2,218       50     1,454
Total  18,146      Total  20,895      Total  34,145      Total  24,453      Total  15,041
Table 1
Now we will divide the data into 5 groups of 10 years of age each and
summarize the results by calculating the mean, standard deviation and variance.
Age Groups  Population (X)  Mean   Standard Deviation  Variance
1-10        18,146          1,815  74                  6,164
11-20       20,895          2,090  139                 21,364
21-30       34,145          3,415  499                 2,77,138
31-40       24,453          2,445  359                 1,43,206
41-50       15,041          1,504  264                 77,168
Table 2
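For all 50 single-year ages taken together, the mean and standard deviation are
calculated as:
$$ \text{Mean} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 2{,}254 $$
$$ \text{Standard Deviation} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} = 726 $$
Note that in Table 2 the standard deviations divide by n as above, while the variances
divide by n - 1, so the variance column is not exactly the square of the standard
deviation column.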
From the above metrics we see that, on average, the population is 2,254 for each
single year of age.
The standard deviation of 726 shows that, on average, the population at each age
deviates by about 726 from the mean.
Variance shows the spread of a variable around its mean; it takes a high value when
there are large variations and is inflated by outliers.
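The group summaries can be recomputed with Python's standard `statistics` module. This is a minimal sketch using the single-year populations from Table 1; `pstdev` divides by n and `variance` divides by n - 1, a combination chosen here because it appears to reproduce the figures in Table 2 (an observation from the numbers, not something the assignment states).

```python
import statistics

# Population for ages 1 to 50, copied from Table 1 (one row per age group).
population = [
    1958, 1725, 1814, 1768, 1871, 1888, 1768, 1712, 1780, 1862,
    1998, 1916, 2138, 2139, 2096, 2044, 2027, 2065, 2013, 2459,
    2594, 2839, 2935, 3601, 4110, 4089, 3716, 3702, 3084, 3475,
    2844, 2684, 2696, 2781, 2799, 2450, 2142, 2114, 1725, 2218,
    1802, 1751, 1659, 1652, 1806, 1460, 1226, 1225, 1006, 1454,
]

# Split into five 10-year groups keyed by labels such as "1-10".
groups = {
    f"{start + 1}-{start + 10}": population[start:start + 10]
    for start in range(0, 50, 10)
}

for label, values in groups.items():
    total = sum(values)
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)      # population formula: divides by n
    var = statistics.variance(values)   # sample formula: divides by n - 1
    print(f"{label:6} total={total:6} mean={mean:7.1f} sd={sd:6.1f} var={var:9.1f}")

print("overall mean:", round(statistics.mean(population)))
print("overall sd:  ", round(statistics.pstdev(population)))
```

Using `statistics.pvariance` instead of `variance` would make the variance column the exact square of the standard deviation column.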
Now we move to the Ogive curves, which show the cumulative distribution:
Less than Ogive
Fig 1
More than Ogive:
Age Groups  Population  Cumulative frequency (More than)
1-10        18,146      1,12,680
11-20       20,895      94,534
21-30       34,145      73,639
31-40       24,453      39,494
41-50       15,041      15,041
Table 4
Fig 2
From the above Ogive curves we observe the cumulative trend and the concentration
of the population. The steepest change is across the 21-30 group, which has the largest
population (34,145); the remaining groups range from 15,041 to 24,453.
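Both sets of cumulative frequencies follow from running totals of the group populations. This is a minimal sketch using the five group totals from the tables above; the list names are illustrative.

```python
from itertools import accumulate

groups = ["1-10", "11-20", "21-30", "31-40", "41-50"]
population = [18146, 20895, 34145, 24453, 15041]

# "Less than" ogive: running total from the youngest group upward.
less_than = list(accumulate(population))

# "More than" ogive: running total from the oldest group downward,
# reversed back so it lines up with the group order.
more_than = list(accumulate(reversed(population)))[::-1]

for label, pop, lt, mt in zip(groups, population, less_than, more_than):
    print(f"{label:6} {pop:7} {lt:8} {mt:8}")
```

The two curves cross where the cumulative frequency equals half the total, which gives a graphical estimate of the median age group.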
Histogram:
Now we will prepare a histogram to look at the distribution of the variable under
consideration.
Age Groups  Population
1-10        18,146
11-20       20,895
21-30       34,145
31-40       24,453
41-50       15,041
Table 5
Now we will plot this table as a histogram:
[Histogram: Population (0 to 40,000) by age group: 1 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50]
The histogram shows that the largest share of the population is in the 21 to 30 age
group, implying that young adults form the biggest group as per the Census 2011 data.
Answer 3 a.
Exponential smoothing is used to forecast a time series variable.
It smooths out the data by taking a weighted average of present and past values
using a parameter alpha.
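In its simplest (single) form, which is assumed here because the assignment does not state the variant used, the forecast for the next period blends the latest observation with the previous forecast. The symbols F_t (forecast for period t) and x_t (actual value in period t) are notation introduced for this sketch:

```latex
F_{t+1} = \alpha \, x_t + (1 - \alpha) \, F_t, \qquad 0 < \alpha \le 1
```

A larger alpha puts more weight on the most recent observation, so the forecasts react faster but smooth less.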
Here we will use the above method to forecast the annual rainfall for the state of
Gujarat for 2017.
We run the model with three alpha values (0.2, 0.5 and 0.8) and compare the models on
their Mean Squared Error (MSE) and Mean Absolute Deviation (MAD).
Where,
$$ \text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - m_i| $$
$$ \text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (x_i - m_i)^2 $$
MSE:
Alpha (0.2) = 1555814/21 = 74,086
Alpha (0.5) = 1523621/21 = 72,553
Alpha (0.8) = 1589780/21 = 75,704
MAD:
From the results we see that the model with an alpha value of 0.5 has the lowest MSE,
while the model with an alpha value of 0.2 has the lowest MAD. Therefore, the two
measures do not agree on which model is better.
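The comparison across alpha values can be sketched in code. The example below is illustrative only: the rainfall figures are made-up placeholders, NOT the Gujarat data used in this answer, and the seeding rule (first forecast equals first observation) and the choice to exclude the seed period from the error measures are assumptions. The function names are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Single exponential smoothing.

    forecasts[t] is the forecast for series[t]; the list has one extra
    element at the end, which is the forecast for the next period.
    Assumption: the first forecast is seeded with the first observation.
    """
    forecasts = [series[0]]
    for actual in series:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


def mse(series, forecasts):
    """Mean squared error, skipping the seed period (its error is zero by construction)."""
    errors = [x - m for x, m in zip(series[1:], forecasts[1:])]
    return sum(e ** 2 for e in errors) / len(errors)


def mad(series, forecasts):
    """Mean absolute deviation, skipping the seed period."""
    errors = [x - m for x, m in zip(series[1:], forecasts[1:])]
    return sum(abs(e) for e in errors) / len(errors)


# Made-up values for illustration only; NOT the Gujarat rainfall data.
rainfall = [800.0, 650.0, 900.0, 700.0, 1000.0, 750.0]

for alpha in (0.2, 0.5, 0.8):
    f = exponential_smoothing(rainfall, alpha)
    print(f"alpha={alpha}: MSE={mse(rainfall, f):.0f}, "
          f"MAD={mad(rainfall, f):.1f}, next forecast={f[-1]:.1f}")
```

Because MSE squares the errors, it penalizes a few large misses more heavily than MAD does, which is one reason the two measures can rank the alpha values differently.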
Answer 3 b.
The distance traveled by the vehicle per gallon is normally distributed with a mean of
65 miles and a standard deviation of 4 miles.
We know that approximately:
• 68.27% of observations lie between Mean +-1*Standard Deviation
• 95.45% of observations lie between Mean +-2*Standard Deviation
• 99.73% of observations lie between Mean +-3*Standard Deviation
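These coverage figures can be verified for 1, 2 and 3 standard deviations using the error function from Python's standard library. This is a minimal sketch; the helper name `within_k_sd` is an illustrative choice.

```python
import math


def within_k_sd(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    low, high = 65 - 4 * k, 65 + 4 * k  # interval for mean 65 and standard deviation 4
    print(f"within {k} SD ({low} to {high} miles): {within_k_sd(k):.2%}")
```

The percentages do not depend on the mean or the standard deviation; only the interval endpoints do.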
Each chart shades the region corresponding to the respective probability.
Fig 1: Normal curve with 65 and 70 marked on the horizontal axis
2. The car travels less than 60 miles per gallon
The Z value lets us look up the probability of this event:
Z = (x - mean(x))/standard_deviation(x)
= (60 - 65)/4 = -5/4 = -1.25
Looking up Z = -1.25 in the Z table, we get 0.106.
Therefore, the probability that the car travels less than 60 miles per gallon is 0.106,
or 10.6%.
Fig 2: Normal curve with 60 and 65 marked on the horizontal axis
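The Z table lookup for the case of less than 60 miles per gallon can be reproduced without a table. This is a minimal sketch that builds the normal cumulative distribution function from `math.erf`; the function name `normal_cdf` is an illustrative choice.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal variable with the given mean and standard deviation."""
    z = (x - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


z_60 = (60 - 65) / 4
p_below_60 = normal_cdf(60, mean=65, sd=4)
print(f"Z = {z_60}, P(X < 60) = {p_below_60:.3f}")
```

The same function gives the probability of a range as the difference of two calls, one at each endpoint.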
3. The car travels more than 55 and less than 70 miles per gallon
The Z values for the two limits let us look up the probability of this event:
Z for 55 = (x – mean(x))/standard_deviation(x)
= (55-65)/4 = -10/4 = -2.5
Z for 70 = (x – mean(x))/standard_deviation(x)
= (70-65)/4 = 5/4 = 1.25