
Decision Science

Answer 1:
Nemi Mehta owns 50 hectares of land on which he cultivates Giloy, a medicinal plant, and prepares Giloy Vati with his team as per the Ayurvedic scriptures.
He spends on advertising to drive sales of the Giloy Vati he produces and wants to analyse the effect of advertising on sales.

He also wants to examine the relationship more deeply by finding the correlation between the variables under consideration. Then, if a correlation is present, he wants to run a regression to help him predict sales and establish a cause-and-effect relationship.
• Graphical Representation

The analysis begins by plotting sales and advertisement expenditure on a graph to see whether there is a visible linear relationship between the two variables.
A scatter plot is the right choice for this, with Sales on the x-axis and Advertisement Expenditure on the y-axis.

Fig. 1: Relationship between Sales and Advertisement Expenditure (scatter plot; x-axis: Sales (INR '000), y-axis: Advertisement Expenditure (INR '000))

The above figure shows a positive relationship between the variables: an increase in one variable corresponds to an increase in the other, as can be seen from the upward-sloping trend line.
• Karl Pearson's Correlation Coefficient
The strength of this relationship can be measured by calculating the correlation coefficient between the variables. One such measure is Karl Pearson's coefficient of correlation, which takes a value between -1 and 1 depending on the magnitude and direction of the relationship.

Karl Pearson's correlation coefficient is calculated as below:


r = ∑(X − X̄)(Y − Ȳ) / ( √∑(X − X̄)² · √∑(Y − Ȳ)² )

Where: X is an arbitrary variable and X̄ = ∑X/N is its mean

Y is an arbitrary variable and Ȳ = ∑Y/N is its mean

We will be applying the formula to our data

Sales (INR '000) (X)   Advertising Expenditure (TV Spots per month) (INR '000) (Y)   (X−X̄)   (Y−Ȳ)   (X−X̄)²   (Y−Ȳ)²   (X−X̄)(Y−Ȳ)

260.3 5.0 -138.2 -5.4 19,097.9 29.2 746.3

286.1 7.0 -112.4 -3.4 12,632.6 11.6 382.1

279.4 6.0 -119.1 -4.4 14,183.6 19.4 524.0

410.8 9.0 12.3 -1.4 151.4 2.0 -17.2

438.2 12.0 39.7 1.6 1,576.5 2.6 63.5

315.3 8.0 -83.2 -2.4 6,921.4 5.8 199.7

565.1 11.0 166.6 0.6 27,757.2 0.4 100.0

570.0 16.0 171.5 5.6 29,414.0 31.4 960.4

426.1 13.0 27.6 2.6 762.0 6.8 71.8

315.0 7.0 -83.5 -3.4 6,971.4 11.6 283.9

403.6 10.0 5.1 -0.4 26.1 0.2 -2.0

220.5 4.0 -178.0 -6.4 31,682.2 41.0 1,139.2

343.6 9.0 -54.9 -1.4 3,013.5 2.0 76.9

644.6 17.0 246.1 6.6 60,567.7 43.6 1,624.3

520.4 19.0 121.9 8.6 14,860.8 74.0 1,048.4


329.5 9.0 -69.0 -1.4 4,760.3 2.0 96.6

426.0 11.0 27.5 0.6 756.5 0.4 16.5

343.2 8.0 -55.3 -2.4 3,057.5 5.8 132.7

450.4 13.0 51.9 2.6 2,694.1 6.8 135.0

421.8 14.0 23.3 3.6 543.1 13.0 83.9

Table 1
r = ∑(X − X̄)(Y − Ȳ) / ( √∑(X − X̄)² · √∑(Y − Ȳ)² )

r = 7665.74 / (√241429.9 × √308.8)

r = 7665.74 / 8634.44

r = 0.8878

We see that the correlation coefficient is 0.89, which indicates a high positive correlation: an increase in advertisement expenditure is associated with a rise in sales. This confirms the existence of a relationship between the variables.
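As a cross-check, the coefficient can also be computed in a few lines of Python; a minimal sketch using the Table 1 data is shown below.

```python
# Minimal sketch: Karl Pearson's correlation coefficient for the Table 1 data.
sales = [260.3, 286.1, 279.4, 410.8, 438.2, 315.3, 565.1, 570.0, 426.1, 315.0,
         403.6, 220.5, 343.6, 644.6, 520.4, 329.5, 426.0, 343.2, 450.4, 421.8]
adv = [5, 7, 6, 9, 12, 8, 11, 16, 13, 7, 10, 4, 9, 17, 19, 9, 11, 8, 13, 14]

n = len(sales)
mean_sales = sum(sales) / n     # X-bar = 7969.9 / 20
mean_adv = sum(adv) / n         # Y-bar = 208 / 20

# Sum of cross-deviations and sums of squared deviations.
sxy = sum((x - mean_sales) * (y - mean_adv) for x, y in zip(sales, adv))
sxx = sum((x - mean_sales) ** 2 for x in sales)
syy = sum((y - mean_adv) ** 2 for y in adv)

r = sxy / (sxx ** 0.5 * syy ** 0.5)
print(round(r, 4))  # ~0.888, matching the hand calculation
```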

• Regression Analysis
Now we run a regression analysis to build a prediction model.
The model takes Sales as the dependent variable and advertising expenditure as the independent variable.

The model will include an intercept and a slope. Including an intercept is necessary because the starting point is not at the origin: there will still be some sales without any advertisement.
Our regression equation will look like:

Sales (Y) = α + β·Advertisement (X)

Where α is the intercept and β is the slope parameter:

α = ∑Y/N − β·∑X/N

β = ∑(X − X̄)(Y − Ȳ) / ∑(X − X̄)²

(Here X denotes advertisement expenditure and Y denotes sales.)

Therefore:

β = 7665.74 / 308.8

β = 24.824

Substituting to get α:

α = 7969.9/20 − 24.824 × (208/20) = 398.5 − 258.2

α = 140.3

So we get our regression equation:

Sales (Y) = 140.3 + 24.824 × Advertisement (X)

This means that, on average, Sales are INR 1,40,300 without any advertisement expenditure, and a one rupee increase in advertisement expenditure leads to an increase of 24.824 rupees in sales.
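The same estimates, and the predictions shown in Table 2 below, can be reproduced with a short Python sketch (the lists repeat the data from Table 1):

```python
# Minimal sketch: least-squares intercept and slope for Sales on Advertisement.
sales = [260.3, 286.1, 279.4, 410.8, 438.2, 315.3, 565.1, 570.0, 426.1, 315.0,
         403.6, 220.5, 343.6, 644.6, 520.4, 329.5, 426.0, 343.2, 450.4, 421.8]
adv = [5, 7, 6, 9, 12, 8, 11, 16, 13, 7, 10, 4, 9, 17, 19, 9, 11, 8, 13, 14]

n = len(sales)
mean_adv = sum(adv) / n          # 208 / 20 = 10.4
mean_sales = sum(sales) / n      # 7969.9 / 20 = 398.5

beta = (sum((x - mean_adv) * (y - mean_sales) for x, y in zip(adv, sales))
        / sum((x - mean_adv) ** 2 for x in adv))       # ~24.824
alpha = mean_sales - beta * mean_adv                    # ~140.3

# Predicted sales for each region, as in Table 2.
predicted = [round(alpha + beta * x, 1) for x in adv]
print(round(alpha, 1), round(beta, 3), predicted[:3])   # ~140.3  ~24.824  [264.4, 314.1, 289.3]
```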
Now we will predict Sales using our regression equation below:

Region Code   Sales (INR '000)   Advertising Expenditure (TV Spots per month) (INR '000)   Predicted Sales (INR '000)
1     260.3    5.0     264.4
2     286.1    7.0     314.1
3     279.4    6.0     289.3
4     410.8    9.0     363.7
5     438.2    12.0    438.2
6     315.3    8.0     338.9
7     565.1    11.0    413.4
8     570.0    16.0    537.5
9     426.1    13.0    463.0
10    315.0    7.0     314.1
11    403.6    10.0    388.6
12    220.5    4.0     239.6
13    343.6    9.0     363.7
14    644.6    17.0    562.3
15    520.4    19.0    612.0
16    329.5    9.0     363.7
17    426.0    11.0    413.4
18    343.2    8.0     338.9
19    450.4    13.0    463.0
20    421.8    14.0    487.9
Table 2
• Graph for Actual vs Predicted Sales

Now that we have computed our regression model, we can plot the predicted and actual sales to see how well the model performs.

Fig 2: Actual Sales vs Predicted Sales (INR '000) by region code (1 to 20)

The model predicts well: the predicted values are very close to the actual sales.

Answer 2:
The data in this problem contain single-year age populations from the 2011 census.
The data are arranged below for better clarity:
Age   Population   Age   Population   Age   Population   Age   Population   Age   Population
1     1,958        11    1,998        21    2,594        31    2,844        41    1,802
2     1,725        12    1,916        22    2,839        32    2,684        42    1,751
3     1,814        13    2,138        23    2,935        33    2,696        43    1,659
4     1,768        14    2,139        24    3,601        34    2,781        44    1,652
5     1,871        15    2,096        25    4,110        35    2,799        45    1,806
6     1,888        16    2,044        26    4,089        36    2,450        46    1,460
7     1,768        17    2,027        27    3,716        37    2,142        47    1,226
8     1,712        18    2,065        28    3,702        38    2,114        48    1,225
9     1,780        19    2,013        29    3,084        39    1,725        49    1,006
10    1,862        20    2,459        30    3,475        40    2,218        50    1,454
Total 18,146       Total 20,895       Total 34,145       Total 24,453       Total 15,041
Table 1

Now we divide the data into 5 groups of 10 years each and summarize the results by calculating the mean, standard deviation and variance.

Age Group   Population (X)   Mean    Standard Deviation   Variance
1-10        18,146           1,815   74                   6,164
11-20       20,895           2,090   139                  21,364
21-30       34,145           3,415   499                  2,77,138
31-40       24,453           2,445   359                  1,43,206
41-50       15,041           1,504   264                  77,168
Table 2
Where the mean, standard deviation and variance have been calculated as:

Mean = (1/n) ∑ xᵢ = 2,254

Standard Deviation = √( ∑(xᵢ − x̄)² / n ) = 726

Variance = (Standard Deviation)² = 5,27,675

From the above metrics we see that, on average, the population is 2,254 for each single year of age.

The standard deviation of 726 shows that, on average, the populations deviate by 726 from the mean.

The variance shows the spread of the variable around its mean; it takes a high value when there are large variations and is inflated in the presence of outliers.
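The grouped totals, means and standard deviations can be reproduced with a short Python sketch over the Table 1 counts. The population formula (dividing by n) is assumed here; values computed with the sample formula (dividing by n − 1) will differ slightly.

```python
import statistics

# Minimal sketch: single-year population counts for ages 1-50 (Table 1),
# grouped into tens and summarised per group.
population = [
    1958, 1725, 1814, 1768, 1871, 1888, 1768, 1712, 1780, 1862,   # ages 1-10
    1998, 1916, 2138, 2139, 2096, 2044, 2027, 2065, 2013, 2459,   # ages 11-20
    2594, 2839, 2935, 3601, 4110, 4089, 3716, 3702, 3084, 3475,   # ages 21-30
    2844, 2684, 2696, 2781, 2799, 2450, 2142, 2114, 1725, 2218,   # ages 31-40
    1802, 1751, 1659, 1652, 1806, 1460, 1226, 1225, 1006, 1454,   # ages 41-50
]

for start in range(0, 50, 10):
    group = population[start:start + 10]
    mean = statistics.mean(group)
    sd = statistics.pstdev(group)          # population standard deviation (divides by n)
    print(f"{start + 1}-{start + 10}: total={sum(group):,}, "
          f"mean={mean:,.0f}, sd={sd:,.0f}, variance={sd ** 2:,.0f}")
```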

Now we move on to the ogive curves, which show the cumulative distribution:
Less than Ogive

Age Group   Population   Cumulative frequency (less than)
1-10        18,146       18,146
11-20       20,895       39,041
21-30       34,145       73,186
31-40       24,453       97,639
41-50       15,041       1,12,680
Table 3

The corresponding graph is given below:

Fig 1: Less-than ogive of the cumulative population by age group
More than Ogive:

Age Group   Population   Cumulative frequency (more than)
1-10        18,146       1,12,680
11-20       20,895       94,534
21-30       34,145       73,639
31-40       24,453       39,494
41-50       15,041       15,041
Table 4

Below is the corresponding graph:

Fig 2: More-than ogive of the cumulative population by age group

From the above ogive curves we observe the cumulative trend and the concentration of the population; the group populations do not vary greatly from one another.
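The two cumulative-frequency columns can be generated with a small Python sketch using running totals over the grouped populations (values as in Tables 3 and 4):

```python
from itertools import accumulate

# Minimal sketch: less-than and more-than cumulative frequencies (Tables 3 and 4).
groups = ["1-10", "11-20", "21-30", "31-40", "41-50"]
pop = [18146, 20895, 34145, 24453, 15041]

less_than = list(accumulate(pop))                   # 18146, 39041, 73186, 97639, 112680
more_than = list(accumulate(pop[::-1]))[::-1]       # 112680, 94534, 73639, 39494, 15041

for g, lt, mt in zip(groups, less_than, more_than):
    print(f"{g}: less-than {lt:,}  more-than {mt:,}")
```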

Histogram:

Now we prepare a histogram to look at the distribution of the variable under consideration.
Age Group   Population
1-10 18,146
11-20 20,895
21-30 34,145
31-40 24,453
41-50 15,041
Table 5
Now we will plot this table into a histogram:

Histogram of Population by age group (x-axis: 1 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50; y-axis: Population)
The histogram shows that most of the population is in the age group of 21 to 30, implying that the young population group has the highest population as per the census 2011 data.
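A minimal matplotlib sketch for drawing this chart from the Table 5 figures is shown below; the group labels and counts come from the table, while the styling choices are illustrative only.

```python
import matplotlib.pyplot as plt

# Minimal sketch: bar chart of the grouped population from Table 5,
# with touching bars so it reads as a histogram.
groups = ["1 to 10", "11 to 20", "21 to 30", "31 to 40", "41 to 50"]
pop = [18146, 20895, 34145, 24453, 15041]

plt.bar(groups, pop, width=1.0, edgecolor="black")
plt.xlabel("Age group (years)")
plt.ylabel("Population")
plt.title("Population by age group")
plt.show()
```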

Answer 3 a.
Exponential smoothing is used for the prediction of a time-series variable.
It smooths the data by taking a weighted average of present and past values using the parameter alpha.

Here we use this method to forecast the annual rainfall for the state of Gujarat for 2017.
We run the model for three alpha values and compare them based on each model's Mean Squared Error (MSE) and Mean Absolute Deviation (MAD).

sₜ = α·xₜ₋₁ + (1 − α)·sₜ₋₁

Where:

sₜ is the smoothed statistic (forecast) for period t

α is the smoothing factor, 0 < α < 1

xₜ₋₁ is the observed value of the variable in period t−1

Below is the forecast table for α = 0.2, 0.5 and 0.8:

SUBDIVISION      YEAR   ANNUAL (in MM)   Forecast (0.2)   Forecast (0.5)   Forecast (0.8)
Gujarat Region   1997   1,069            1,069            1,069            1,069
Gujarat Region 1998 1,070 1,069 1,069 1,069
Gujarat Region 1999 568 1,069 1,069 1,070
Gujarat Region 2000 551 969 819 669
Gujarat Region 2001 849 885 685 574
Gujarat Region 2002 637 878 767 794
Gujarat Region 2003 1,160 830 702 669
Gujarat Region 2004 1,006 896 931 1,062
Gujarat Region 2005 1,316 918 968 1,017
Gujarat Region 2006 1,478 998 1,142 1,257
Gujarat Region 2007 1,179 1,094 1,310 1,434
Gujarat Region 2008 911 1,111 1,245 1,230
Gujarat Region 2009 642 1,071 1,078 975
Gujarat Region 2010 1,089 985 860 708
Gujarat Region 2011 891 1,006 974 1,013
Gujarat Region 2012 714 983 932 915
Gujarat Region 2013 1,119 929 823 754
Gujarat Region 2014 706 967 971 1,046
Gujarat Region 2015 623 915 838 774
Gujarat Region 2016 765 856 731 653
Gujarat Region 2017 -  838 748 743
Table 1

MAD = (1/n) ∑ |xᵢ − mᵢ|

MSE = (1/n) ∑ (xᵢ − mᵢ)²

Where mᵢ is the forecast for period i of the series

YEAR   ANNUAL (in MM)   Forecast (0.2)   Forecast (0.5)   Forecast (0.8)   |xᵢ−m| (0.2)   |xᵢ−m| (0.5)   |xᵢ−m| (0.8)   (xᵢ−m)² (0.2)   (xᵢ−m)² (0.5)   (xᵢ−m)² (0.8)
1997 1,069 1,069 1,069 1,069 0 0 0 0 0 0
1998 1,070 1,069 1,069 1,069 1 1 1 1 1 1
1999 568 1,069 1,069 1,070 501 501 501 2,50,721 2,51,051 2,51,382
2000 551 969 819 669 418 268 118 1,75,038 71,998 13,942
2001 849 885 685 574 36 164 275 1,318 26,974 75,507
2002 637 878 767 794 241 130 157 58,004 16,817 24,600
2003 1,160 830 702 669 330 458 492 1,09,182 2,10,002 2,41,800
2004 1,006 896 931 1,062 110 75 56 12,065 5,570 3,153
2005 1,316 918 968 1,017 398 348 299 1,58,781 1,21,045 89,622
2006 1,478 998 1,142 1,257 480 336 221 2,30,764 1,12,599 49,051
2007 1,179 1,094 1,310 1,434 85 131 255 7,260 17,245 64,926
2008 911 1,111 1,245 1,230 200 333 319 39,855 1,11,196 1,01,609
2009 642 1,071 1,078 975 429 436 333 1,84,221 1,90,297 1,11,057
2010 1,089 985 860 708 104 229 380 10,760 52,434 1,44,742
2011 891 1,006 974 1,013 115 84 122 13,274 7,007 14,911
2012 714 983 932 915 269 218 201 72,184 47,678 40,370
2013 1,119 929 823 754 190 295 364 35,972 87,275 1,32,799
2014 706 967 971 1,046 261 265 340 68,210 70,325 1,15,611
2015 623 915 838 774 292 215 151 85,110 46,395 22,742
2016 765 856 731 653 91 34 112 8,352 1,177 12,508
2017 1,024 838 748 743 186 277 282 34,741 76,536 79,449

SUM 4,738 4,800 4,980 15,55,814 15,23,621 15,89,780


Table 2

MSE:
Alpha (0.2) = 1555814/21 = 74,086
Alpha (0.5) = 1523621/21 = 72,553
Alpha (0.8) = 1589780/21 = 75,704

MAD:

Alpha (0.2) = 4738/21 = 226


Alpha (0.5) = 4800/21 = 229
Alpha (0.8) = 4980/21 = 237

From the results we see that the model with an alpha value of 0.5 has the lowest MSE, while the model with an alpha value of 0.2 has the lowest MAD. Therefore, there is no clear answer as to which model is better.
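A minimal Python sketch of the smoothing recursion and the two error measures is given below. Note that it averages the errors over the 20 in-sample years (1997-2016) only, so the MSE and MAD values differ slightly from the table above, which also includes the 2017 row.

```python
# Minimal sketch: simple exponential smoothing of the annual rainfall series
# with MSE and MAD for each alpha (rainfall values 1997-2016 from Table 1).
rainfall = [1069, 1070, 568, 551, 849, 637, 1160, 1006, 1316, 1478,
            1179, 911, 642, 1089, 891, 714, 1119, 706, 623, 765]

def smooth(series, alpha):
    """Forecasts: s_t = alpha * x_(t-1) + (1 - alpha) * s_(t-1)."""
    forecasts = [series[0]]             # seed the first forecast with the first observation
    for x in series:
        forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
    return forecasts                    # one extra value at the end: the 2017 forecast

for alpha in (0.2, 0.5, 0.8):
    f = smooth(rainfall, alpha)
    errors = [x - m for x, m in zip(rainfall, f)]     # in-sample errors, 1997-2016
    mse = sum(e ** 2 for e in errors) / len(errors)
    mad = sum(abs(e) for e in errors) / len(errors)
    print(f"alpha={alpha}: 2017 forecast={f[-1]:.0f}, MSE={mse:,.0f}, MAD={mad:.0f}")
```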

Answer 3 b.
The distance traveled by the vehicle is normally distributed with mean value of 65
miles and a standard deviation of 4 miles.
We know that approximately:
• 68.27% of observations lie between Mean +-1*Standard Deviation
• 95.45% of observations lie between Mean +-1*Standard Deviation
• 99.73% of observations lie between Mean +-1*Standard Deviation
The charts include the shaded region with respective probability.

1. The car travels more than 70 miles per gallon

The Z value gives us the probability of this event:
Z = (x − mean(x)) / standard_deviation(x)
  = (70 − 65)/4 = 5/4 = 1.25
Looking up the Z table we get P(Z < 1.25) = 0.894.
Therefore, the probability that the car travels more than 70 miles per gallon is 1 − 0.894 = 0.106, or 10.6%.

Fig 1: Normal curve with mean 65; shaded region above 70 miles per gallon
2. The car travels less than 60 miles per gallon
The Z value gives us the probability of this event:
Z = (x − mean(x)) / standard_deviation(x)
  = (60 − 65)/4 = −5/4 = −1.25
Looking up the Z table we get P(Z < −1.25) = 0.106.
Therefore, the probability that the car travels less than 60 miles per gallon is 0.106, or 10.6%.

Fig 2: Normal curve with mean 65; shaded region below 60 miles per gallon

3. The car travels less than 70 miles and more than 55 miles per gallon
The Z values give us the probability of this event:
Z for 55 = (x − mean(x)) / standard_deviation(x)
         = (55 − 65)/4 = −10/4 = −2.5
Z for 70 = (x − mean(x)) / standard_deviation(x)
         = (70 − 65)/4 = 5/4 = 1.25

Looking up the Z table we get 0.006 and 0.894.

Therefore, the probability that the car travels between 55 and 70 miles per gallon is (0.894 − 0.006) = 0.888, or 88.8%.
Fig 3: Normal curve with mean 65; shaded region between 55 and 70 miles per gallon
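All three probabilities can be checked with the standard normal CDF; a minimal Python sketch using the error function from the math module is shown below.

```python
from math import erf, sqrt

# Minimal sketch: the three probabilities via the standard normal CDF,
# Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), with mean 65 mpg and SD 4 mpg.
mean, sd = 65, 4

def cdf(x):
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

p_more_than_70 = 1 - cdf(70)          # ~0.106
p_less_than_60 = cdf(60)              # ~0.106
p_between_55_70 = cdf(70) - cdf(55)   # ~0.888
print(round(p_more_than_70, 3), round(p_less_than_60, 3), round(p_between_55_70, 3))
```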
