Assignment - 1 - El - Ajouz Correction
MSBA 310
Assignment I
Submission Date:
22 – Oct – 2022
Import the csv file into R and answer the following questions:
a. Compute the mean, quartiles, min, max, and standard deviation of net sales.
Interpret
summary(Net.Sales)
Min. 1st Qu. Median Mean 3rd Qu. Max.
sd(Net.Sales)
[1] 55.66494
Based on the output above, Pelican Stores’ net sales per customer range from 13.23$ to
287.59$. 25% of its customers spend up to 39.6$ (first quartile), and only 25% spend more
than 100.9$ (third quartile). On average, a customer spends 77.6$ (mean).
To study the dispersion of the given data, the standard deviation is compared with the mean:
sd(Net.Sales)/mean(Net.Sales)
[1] 0.7173271
The ratio (the coefficient of variation) is 71.7%, which indicates high dispersion. The mean
(77.6$) is also well above the median (59.7$), which suggests right skew; hence, the median is
more representative of the data than the mean.
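The dispersion check above can be sketched as follows. The vector `net_sales` is made-up placeholder data (an assumption, not the Pelican Stores file), used only so the snippet runs on its own; with the real data, `Net.Sales` would take its place.

```r
# Placeholder data (an assumption for illustration, NOT the Pelican Stores values)
net_sales <- c(15, 22, 31, 40, 48, 55, 63, 80, 120, 260)

# Coefficient of variation: standard deviation relative to the mean
cv <- sd(net_sales) / mean(net_sales)

# A mean well above the median points to right skew
right_skewed <- mean(net_sales) > median(net_sales)

cat("CV:", round(cv, 3), "\n")
cat("mean > median:", right_skewed, "\n")
```

Reporting the median alongside the mean is a common choice when both checks point the same way.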
b. Use an appropriate chart (justify its use) and compare net sales among regular and
promotional customers. What can this chart tell you? Explain.
The chart used is the side-by-side boxplot. It is appropriate because it compares a quantitative
variable (net sales) across the categories of a qualitative variable (customer type). For each
customer type, the boxplot shows the median, the first and third quartiles, the whiskers (the
most extreme values within 1.5 x IQR of the box), and any outliers beyond them.
net_sales_by_customer_type = boxplot(Net.Sales ~ Type.of.Customer, col = c("blue", "red"),
    main = "Net Sales by Customer Type", ylab = "Net Sales", medcol = "white")
By comparing the two boxplots, we realize that:
1- The median of the promotional customers is higher than that of the regular customers. This
shows that a typical promotional customer spends more than a typical regular customer.
2- The range of the net sales is wider for the promotional customers. This suggests that the
amounts spent by the promotional customers are more dispersed than those spent by the
regular customers.
3- The promotional customers’ data has more outliers, which also points to greater variation
and dispersion.
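The three observations above can also be checked numerically. The sketch below uses simulated placeholder data (an assumption, not the Pelican Stores file) with the same column names as the assignment, and counts the points that a boxplot would flag as outliers.

```r
# Placeholder data (an assumption for illustration, NOT the Pelican Stores file)
set.seed(1)
shoppers <- data.frame(
  Type.of.Customer = rep(c("Promotional", "Regular"), times = c(70, 30)),
  Net.Sales = round(c(rlnorm(70, meanlog = 4.2, sdlog = 0.7),
                      rlnorm(30, meanlog = 3.9, sdlog = 0.5)), 2)
)

# Summary (quartiles, median, mean, min, max) of net sales for each customer type
group_summary <- tapply(shoppers$Net.Sales, shoppers$Type.of.Customer, summary)
print(group_summary)

# Number of points each box would draw as outliers (beyond 1.5 * IQR from the box)
outlier_counts <- tapply(shoppers$Net.Sales, shoppers$Type.of.Customer,
                         function(x) length(boxplot.stats(x)$out))
print(outlier_counts)
```

Because the promotional group is larger, a higher raw outlier count is expected to some extent, so the counts are best read together with the group sizes.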
12/12
c. Produce Quantile-Quantile plot for the customer age. Comment on the shape of the
distribution.
qqnorm(Age)
qqline(Age,col="red")
The points lie close to the red reference line (the quantiles expected under a normal
distribution), so the age data can be approximated by the normal distribution. There are some
deviations at the two extremes of the data, possibly because of a few outlying values.
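A minimal sketch of the Q-Q plot, assuming simulated placeholder ages rather than the real `Age` column. The Shapiro-Wilk test is not part of the original answer; it is added here only as a numerical complement to the visual check.

```r
# Placeholder ages (an assumption for illustration, NOT the Pelican Stores file)
set.seed(2)
age <- round(rnorm(100, mean = 43, sd = 12))

# Q-Q plot: qqnorm() draws the points, qqline() adds the reference line
qq <- qqnorm(age, main = "Normal Q-Q Plot of Age")
qqline(age, col = "red")

# Shapiro-Wilk test as a numerical complement (H0: the data are normal)
sw <- shapiro.test(age)
print(sw)
```

A large p-value here would be consistent with the visual impression of approximate normality; it does not prove it.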
d. Compute a 95% confidence interval for the mean net sales generated by regular
customers. Interpret
regular_sales=subset(Net.Sales,Type.of.Customer=="Regular")
length(regular_sales)
Output 1:
[1] 30
Since the sample size is at least 30, the sampling distribution of the mean can be approximated
by the normal distribution. The population variance is unknown, so we use the t distribution
(t.test).
t.test(regular_sales)
Output 2:
Interpretation: We are 95% confident that the mean net sales per regular customer falls
between 48.897$ and 75.086$.
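The interval reported by `t.test` is the sample mean plus or minus the t critical value times the standard error, that is, mean(x) ± t(0.975, n − 1) · sd(x) / sqrt(n). The sketch below works this out by hand on a simulated placeholder sample (an assumption, not the regular-customer sales) and compares it with the built-in result.

```r
# Placeholder sample (an assumption for illustration, NOT the regular-customer sales)
set.seed(3)
x <- round(rlnorm(30, meanlog = 3.9, sdlog = 0.5), 2)

n      <- length(x)
se     <- sd(x) / sqrt(n)          # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value

manual_ci  <- mean(x) + c(-1, 1) * t_crit * se
builtin_ci <- as.numeric(t.test(x)$conf.int)

print(manual_ci)
print(builtin_ci)   # agrees with manual_ci
```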
7/7
e. Compute a 95% confidence interval for the mean net sales generated by
promotional customers. Interpret
promotional_sales=subset(Net.Sales,Type.of.Customer=="Promotional")
length(promotional_sales)
Output 1:
[1] 70
Since the sample size is above 30, the sampling distribution of the mean can be approximated
by the normal distribution. The population variance is unknown, so we use the t distribution
(t.test).
t.test(promotional_sales)
Output 2:
Interpretation: We are 95% confident that the mean net sales per promotional customer falls
between 69.635$ and 98.945$.
7/7
f. Compare the findings of parts (d) and (e). What conclusion can you provide?
Elaborate
As a first step, comparing the 95% confidence intervals shows that the estimated mean net
sales of promotional customers is higher than that of regular customers. The two intervals
overlap, however (between 69.635$ and 75.086$), so the intervals alone do not settle the
comparison.
Hypothesis Test:
var.test(Net.Sales~Type.of.Customer,data=shoppers)
Output:
The p-value is below 0.05. Hence, we reject H0 (the true variances are equal) and treat the
variances as not equal.
t.test(Net.Sales~relevel(factor(Type.of.Customer), ref="Promotional"),
data=shoppers,alternative="greater")
Output:
The p-value = 0.012 < 0.05. Therefore, we reject H0. At the 5% significance level, we have
enough evidence to say that the mean net sales of promotional customers is higher than that of
regular customers.
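The two-group comparison can also be summarized with a confidence interval for the difference in means. The sketch below uses simulated placeholder data (an assumption, not the Pelican Stores file); a 95% interval for the difference that excludes 0 corresponds to rejecting equality of means in the two-sided Welch test at the 5% level.

```r
# Placeholder data (an assumption for illustration, NOT the Pelican Stores file)
set.seed(4)
promotional <- round(rlnorm(70, meanlog = 4.2, sdlog = 0.7), 2)
regular     <- round(rlnorm(30, meanlog = 3.9, sdlog = 0.5), 2)

# One-sided Welch test (var.equal = FALSE is the default):
# H1 is mean(promotional) > mean(regular)
one_sided <- t.test(promotional, regular, alternative = "greater")

# Two-sided Welch test, used here only for its 95% interval on the difference
two_sided <- t.test(promotional, regular)

print(one_sided$p.value)
print(two_sided$conf.int)
```

Passing the two groups as separate vectors avoids relying on factor level order, which is what the `relevel` call handles in the formula version.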
6/8 he must say that they are not higher since the Confidence intervals overlap
g. What proportion of promotional customers should Pelican Stores expect in general?
Use a 95% confidence level and interpret.
prop.test(length(promotional_sales),length(Net.Sales))
Output:
This means that we are 95% confident that the percentage of promotional customers of Pelican
Stores lies between 59.9% and 78.5%.
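For reference, the interval can be reproduced from the two counts alone (70 promotional customers out of 100, as given by the `length()` outputs in parts d and e). `prop.test` applies a continuity correction by default.

```r
# Counts reported in the text: 70 promotional customers out of 100
res <- prop.test(x = 70, n = 100)   # continuity correction is on by default

ci <- as.numeric(res$conf.int)
print(round(ci, 3))   # 0.599 0.785
```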
8/8
n x p = 100 x 0.7 = 70 >5
The resulting p-value = 0.994 > 0.05. Hence, we do not have enough evidence to reject
H0 (the proportion is <= 0.8).
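The call that produced this p-value is not shown. As a sketch only, assuming the hypotheses are H0: p <= 0.8 against H1: p > 0.8 and the counts are 70 out of 100, the following one-sided test without continuity correction gives a p-value of about 0.994. The choice of `correct = FALSE` is a guess made to match the reported value, not something the text confirms.

```r
# Counts taken from the text: 70 promotional customers out of 100
# Assumed hypotheses (not shown in the text): H0: p <= 0.8 versus H1: p > 0.8
res <- prop.test(x = 70, n = 100, p = 0.8,
                 alternative = "greater", correct = FALSE)

print(res$p.value)   # about 0.994
```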
i. Use an appropriate chart and visualize the relationship between age and net sales. Do you
think there is an association between them? Justify
Since both age and net sales are quantitative variables, we use a scatterplot to visualize the
relationship between them.
Output:
There does not seem to be an association (correlation) between these two variables: the line
fitted to the scatterplot is nearly horizontal, showing that net sales do not change
systematically with age.
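The plotting code is not shown above. A minimal sketch of a scatterplot with a fitted line, assuming simulated placeholder data (not the Pelican Stores file) with the assignment's column names, could look like this. The correlation test is an addition that puts a number on the visual impression.

```r
# Placeholder data (an assumption for illustration, NOT the Pelican Stores file)
set.seed(5)
shoppers <- data.frame(
  Age = round(runif(100, min = 20, max = 78)),
  Net.Sales = round(rlnorm(100, meanlog = 4.1, sdlog = 0.65), 2)
)

# Scatterplot with a least-squares line; a near-flat line suggests
# no linear association
plot(Net.Sales ~ Age, data = shoppers, main = "Net Sales by Age",
     xlab = "Age", ylab = "Net Sales")
fit <- lm(Net.Sales ~ Age, data = shoppers)
abline(fit, col = "red")

# Pearson correlation with a test of H0: correlation = 0
ct <- cor.test(shoppers$Age, shoppers$Net.Sales)
print(ct$estimate)
print(ct$p.value)
```

A flat line rules out a linear trend only; a curved pattern would still need to be judged from the plot itself.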
12/12
j. We want to test whether the method of payment is related to the type of customer. What is
the most relevant test to use? Justify.
Since both variables are qualitative, we use the chi-squared independence test.
7/7
k. Can you conclude that the method of payment and customer type are related? Conduct the
test at a 5% significance level and specify the testing steps.
tbl = table(Type.of.Customer,Method.of.Payment)
chisq.test(tbl)
Output:
p-value = 0.000355 < 0.05. Hence, at the 5% significance level, we reject the null
hypothesis (which states that the method of payment is independent of the type of customer).
We have enough evidence to say that the method of payment is related to the type of customer.
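The testing steps can be sketched on a small contingency table. The counts and the payment labels below are placeholders (assumptions, not the Pelican Stores table); the sketch also prints the expected counts, which are commonly checked before relying on the chi-squared approximation.

```r
# Placeholder counts and labels (assumptions for illustration,
# NOT the Pelican Stores table)
tbl <- matrix(c(10, 40, 20,
                15,  5, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(
                Type.of.Customer  = c("Promotional", "Regular"),
                Method.of.Payment = c("Cash", "Store card", "Credit card")))

# H0: method of payment and customer type are independent
res <- chisq.test(tbl)

print(res$expected)    # approximation is usually considered adequate if all >= 5
print(res$statistic)
print(res$p.value)     # reject H0 at the 5% level if below 0.05
```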