Pankaj Soni Gamma 199 BA Assignment

The document describes a project to analyze sales data from Walmart stores using R. It provides details on the dataset and lists 16 problem statements to analyze the data, including summarizing the data, calculating total sales by store, and creating visualizations of sales trends.

Business Analytics (3PGDM02)

PGDM-trimester-3
Real World Data Analysis using R Programming Language
on Walmart Dataset (20 Marks)

Objective of the Assignment: This project helps participants apply the knowledge gained
throughout the course and see how it is used in a real-world use case
Units Covered: All Units are covered in this Project

Instructions:
• This project would be evaluated for the CEC Component of 20 Marks in the final marks
for the Business Analytics Subject
• Each student has to develop the project independently
• Submission has to be done in ERP in the form of a PDF document containing the
Problem Statement, Code Script and Output Screenshot of the terminal
• The output screenshot has to be from the R terminal, with your roll number set as the
prompt (e.g., PGDM2022-XXX> )
• When you come for the submission of the project, kindly prepare for the Viva which
would be based on the Project which you have done and how you have applied the
concepts learnt throughout the course (Viva will also play an integral role in getting the
assessment marks)
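
The prompt requirement above can be met with R's built-in options() function. The lines below are a minimal sketch, assuming the placeholder roll number from the instructions; old_prompt is a name introduced here only for illustration.

```r
# Remember the current prompt so it can be restored after the screenshots.
old_prompt <- getOption("prompt")

# Replace XXX with your own roll number (placeholder taken from the instructions).
# The trailing space separates the prompt from the typed command.
options(prompt = "PGDM2022-XXX> ")

# To restore the previous prompt later: options(prompt = old_prompt)
```

The setting lasts only for the current R session, so it has to be run again after restarting R.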

Report Format:
The project report should be distributed under chapters as below-
1. Title Page (as per the given format)
2. Introduction of the project / Dataset / Source File Information
3. Problem Statement followed by R Script followed by the Output of the terminal

Note:
• It must be in PDF format with proper formatting. [Even if printed report is required, it
must be done after converting to pdf format]
• Font type should be Times New Roman, and font size should be 12 for body and 14 for
headings. Paragraph spacing should be 1.5.
• For the R Scripts, try to give font type as Consolas or Courier New and Size should be
12
• The first page should be the title page as attached.
• Make sure your assignment is uniformly formatted with all information as instructed.
Project on
Real World Data Analysis using
R Programming Language
On Walmart Sales Dataset

Implemented as a Part of the Subject: Business Analytics

Subject Code: 3PGDM02

Implemented By: Pankaj Soni

Roll Number: PGDM2022-199

Branch and Section Details: Post Graduate Diploma in Management (Dual
Specialization), Alpha Batch – 2022-24, Narayana Business School

Instructed and Guided By: Mr. Tushar Kakaiya

Designation: Professor of Practice

Department: Information Technology and Analytics

Institute Name: Narayana Business School


Project Description:
The project implements R scripts that analyse the given “Walmart Store Sales” dataset.

The breakdown produced by the data analysts in this project can later help data scientists
build machine learning models and forecast sales.

Dataset Information:

We are provided with historical sales data for 45 Walmart stores located in different regions.
Each store contains a number of departments, and you are tasked with analysing the
department-wide sales for each store.

In addition, Walmart runs several promotional markdown events throughout the year. These
markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labor
Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times
higher in the evaluation than non-holiday weeks.

Source file(s) Information:

We have been given two datasets as CSV (Comma-Separated Values) files, which are
described below.

walmart_data.csv:
This is the historical training data, which covers 2010-02-05 to 2012-11-01. Within this file
you will find the following fields:

• Store - the store number
• Dept - the department number
• Date - the week
• Weekly Sales - sales for the given department in the given store
• Is Holiday - whether the week is a special holiday week
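
As a rough illustration of how a file with these fields looks once it is read into R, the sketch below writes a tiny stand-in CSV to a temporary folder and loads it. The rows are made up, the header spellings Weekly_Sales and IsHoliday are assumptions about how the "Weekly Sales" and "Is Holiday" fields appear in the real file, and the names sample_csv and walmart_sample are hypothetical.

```r
# Write a tiny stand-in file; the values are invented for illustration only.
sample_csv <- file.path(tempdir(), "walmart_data_sample.csv")
writeLines(c(
  "Store,Dept,Date,Weekly_Sales,IsHoliday",
  "1,1,2010-02-05,1000.50,FALSE",
  "1,1,2010-02-12,1500.25,TRUE",
  "1,2,2010-02-05,2000.00,FALSE"
), sample_csv)

# read.csv infers integer, numeric and logical columns from the text.
walmart_sample <- read.csv(sample_csv, stringsAsFactors = FALSE)

# Date arrives as text; convert it (assumes ISO year-month-day dates).
walmart_sample$Date <- as.Date(walmart_sample$Date, format = "%Y-%m-%d")

# Show each column with its type.
str(walmart_sample)
```

If the real file uses a different date layout, the format string passed to as.Date would need to change accordingly.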

walmart_features.csv:

This file contains additional data related to the store, department, and regional activity for the
given dates. It contains the following fields:

• Store - the store number
• Date - the week
• Temperature - average temperature in the region
• Fuel_Price - cost of fuel in the region
• MarkDown1-5 - anonymized data related to promotional markdowns that Walmart is
running. MarkDown data is only available after Nov 2011, and is not available for all
stores all the time. Any missing value is marked with an NA.
• CPI - the consumer price index
• Unemployment - the unemployment rate
• Is Holiday - whether the week is a special holiday week

For convenience, the four holidays fall within the following weeks in the dataset (not all
holidays are in the data):

• Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
• Labor Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
• Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
• Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
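
The holiday weeks listed above can be kept in a small lookup table so that any week in the data can be labelled with its holiday. This is only a sketch: the dates are the ones listed, rewritten in year-month-day form, while holiday_weeks and label_holiday are hypothetical names introduced for illustration.

```r
# Lookup table of the listed holiday weeks (dates rewritten as YYYY-MM-DD).
holiday_weeks <- data.frame(
  holiday = rep(c("Super Bowl", "Labor Day", "Thanksgiving", "Christmas"),
                each = 4),
  week = as.Date(c(
    "2010-02-12", "2011-02-11", "2012-02-10", "2013-02-08",
    "2010-09-10", "2011-09-09", "2012-09-07", "2013-09-06",
    "2010-11-26", "2011-11-25", "2012-11-23", "2013-11-29",
    "2010-12-31", "2011-12-30", "2012-12-28", "2013-12-27"
  )),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the holiday name for each date,
# or NA when the date is not one of the listed holiday weeks.
label_holiday <- function(dates) {
  holiday_weeks$holiday[match(dates, holiday_weeks$week)]
}

label_holiday(as.Date(c("2010-02-12", "2010-02-19", "2011-11-25")))
```

Weeks that are not in the table come back as NA, which makes the non-holiday weeks easy to filter out.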
Problem Statements for the Analysis:

NOTE: Write R scripts to solve the problem statements below

1. Create your working directory and put the given files in a data folder inside it.
Use R to set the working directory, and use relative paths to load the CSV files kept
under the data folder into the walmart_data and walmart_features objects

2. How many records do you have in the weekly sales and features datasets?

3. Summarise the weekly sales dataset (your output should print the columns of the
dataset along with the type of data each one stores, and also summarise the
statistics (Mean, Min, Median, Max) for any numeric columns it has)

4. Get a bird's-eye view of the weekly sales data so that you don’t have to scan the
entire dataset at once (Tip: Display the first 5-10 records only!)

5. How many sales records do we have for each store? Store the result in the
rows_per_store variable (Hint: Use table( ) )

6. Convert rows_per_store into a dataframe with 2 columns: the store number and its
record count

7. Which store has reported the maximum number of weekly sales records?

8. Sum the sales by store on walmart_data and store the result in a sum_by_store dataframe
with the column names store_number and total_sales

9. Plot a bar plot using base R, sorting total sales from the store with the most sales to
the store with the least sales. Give the plot the title "Sales By Store" and color the
bars darkgreen. Note: Ignore the x labels for now

10. Compute the mean of every column in walmart_features

11. Create a new column in walmart_features called standardized_cpi by subtracting the
mean and dividing by the standard deviation (Note: CPI has NAs!)

12. Produce a line plot of the sales of store number 1 for every department. Add labels to
x and y (Hint: Check the function lines!)

13. Use ggplot2 to plot the total sales per week for store 20. (Note: Add points to your plot,
use as.Date on the x value in aes to format the x-axis labels)

14. Plot the sales for the top 5 departments by sales for store 2 with ggplot2, using
different colors. Plot a line per department (Hint: Use group in aes to plot different
time series)

15. In the graph above, convert the Dept to a factor and the date to a date type column

16. Open question (no need to code, just write down your answer with a justification line)
- Which department shows a more seasonal trend?
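
As a hedged illustration of the tabulating, aggregating and standardizing steps in problems 5 to 8 and 11, the sketch below runs them on toy data rather than on the real files. All values are made up, the column name Weekly_Sales is an assumed spelling of the "Weekly Sales" field, and rows_per_store_df, record_count, cpi_mean and cpi_sd are names introduced only for this example. It shows one possible approach in base R, not the expected solution.

```r
# Toy stand-ins for the two objects from problem 1; all values are made up.
walmart_data <- data.frame(
  Store = c(1, 1, 1, 2, 2, 3),
  Dept = c(1, 1, 2, 1, 2, 1),
  Weekly_Sales = c(100, 150, 200, 300, 50, 75)
)
walmart_features <- data.frame(
  Store = c(1, 2, 3, 1),
  CPI = c(210, 212, NA, 214)
)

# Problems 5 and 6: count rows per store, then turn the table into a data frame.
rows_per_store <- table(walmart_data$Store)
rows_per_store_df <- as.data.frame(rows_per_store, stringsAsFactors = FALSE)
names(rows_per_store_df) <- c("store_number", "record_count")

# Problem 7: the store with the most records (first one if there is a tie).
rows_per_store_df$store_number[which.max(rows_per_store_df$record_count)]

# Problem 8: total sales per store.
sum_by_store <- aggregate(Weekly_Sales ~ Store, data = walmart_data, FUN = sum)
names(sum_by_store) <- c("store_number", "total_sales")

# Problem 11: standardize CPI, skipping NAs when computing mean and sd.
cpi_mean <- mean(walmart_features$CPI, na.rm = TRUE)
cpi_sd <- sd(walmart_features$CPI, na.rm = TRUE)
walmart_features$standardized_cpi <- (walmart_features$CPI - cpi_mean) / cpi_sd
```

Passing na.rm = TRUE matters here: without it, a single NA in CPI would make both the mean and the standard deviation NA, and the whole new column would be NA as well.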
