SMDM Project

The document provides an analysis of data from 440 large retailers on their annual spending across 6 product varieties in 3 Portuguese regions and 2 sales channels. Key findings from the analysis include: - The region that spent the most was "Other" and the region that spent the least was "Oporto". The channel that spent the most was "Hotel" and the channel that spent the least was "Retail". - Across regions and channels, spending was highest on "Fresh" products and lowest on "Delicatessen". - "Fresh products" showed the most inconsistent behavior across regions and channels, while "Delicatessen" showed the least inconsistent behavior. - All 6 product varieties showed

Uploaded by

crispin anthony

SMDM PROJECT

Business Report On Wholesale Customers Analysis

CONTEXT:
A wholesale distributor operating in different regions of Portugal has information on the
annual spending of several items in their stores across different regions and channels. The
data consists of 440 large retailers’ annual spending on 6 different varieties of products in 3
different regions (Lisbon, Oporto, Other) and across different sales channels (Hotel, Retail).

1.1.1 Use methods of descriptive statistics to summarize data.

Which Region and which Channel spent the most?
Which Region and which Channel spent the least?

Descriptive statistics is concerned with summarizing data through graphs, charts and tables. The methods of
descriptive statistics include the distribution, which deals with each value's frequency, measures of central
tendency and measures of variability. The most widely used measures of central tendency are the arithmetic
mean, median, and mode.
Mean is defined as the arithmetic average of all observations in the data set.
Median is defined as the middle value in the data set arranged in ascending or descending order.
Mode is defined as the most frequently occurring value in the distribution; it has the largest frequency.

Measures of Dispersion include Range, IQR, Standard Deviation

Range is the simplest of all measures of dispersion. It is calculated as the difference between the maximum
and minimum values in the data set.
Inter-Quartile Range (IQR) is computed on the middle 50% of the observations after eliminating the
highest and lowest 25% of observations in a data set that is arranged in ascending order. IQR is less affected
by outliers.
Standard deviation, in simple words, is the square root of the variance.
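
The measures defined above can be illustrated with a minimal sketch that uses only Python's standard `statistics` module. The `spend` values are hypothetical and are not taken from the wholesale dataset; the "inclusive" quartile method is one of several conventions and is an assumption here.

```python
import statistics

# Hypothetical annual spending values, for illustration only.
spend = [120, 150, 150, 200, 260, 310, 900]

# Measures of central tendency.
mean = statistics.mean(spend)
median = statistics.median(spend)
mode = statistics.mode(spend)

# Measures of dispersion.
value_range = max(spend) - min(spend)
q1, q2, q3 = statistics.quantiles(spend, n=4, method="inclusive")
iqr = q3 - q1
std_dev = statistics.stdev(spend)  # sample standard deviation

print("mean:", round(mean, 2), "median:", median, "mode:", mode)
print("range:", value_range, "IQR:", iqr, "std dev:", round(std_dev, 2))
```

The single large value (900) pulls the mean well above the median, while the IQR is unaffected by it, which is why the IQR is described as less sensitive to outliers.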

The table below shows the description of the Wholesale customer dataset:

In the table below we can see some sample records, which have 2
categorical variables and 7 numerical variables.

The data consists of 440 large retailers’ annual spending on 6 different varieties of
products in 3 different regions (Lisbon, Oporto, Other) and
across different sales channels (Hotel, Retail).
The Region that has spent the most is Other (10677599) and the
region that has spent the least is Oporto (1555088).
The Channel that has spent the most is Hotel (7999569) and the
channel that has spent the least is Retail (6619931).
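
Totals of this kind can be obtained by summing the item columns per row and then grouping by Region or Channel. The sketch below assumes pandas and uses a tiny hypothetical frame with only two item columns; the column names `Channel`, `Region`, `Fresh` and `Milk` are assumptions about how the data is laid out, and the numbers are made up.

```python
import pandas as pd

# Tiny hypothetical sample; the real dataset has 440 rows and 6 item columns.
df = pd.DataFrame({
    "Channel": ["Hotel", "Retail", "Hotel", "Retail"],
    "Region": ["Other", "Other", "Lisbon", "Oporto"],
    "Fresh": [100, 50, 80, 20],
    "Milk": [30, 60, 10, 40],
})

# Total spending per retailer across the item columns.
items = ["Fresh", "Milk"]
df["Total"] = df[items].sum(axis=1)

# Total spending per region and per channel.
by_region = df.groupby("Region")["Total"].sum()
by_channel = df.groupby("Channel")["Total"].sum()

# Labels of the highest and lowest spenders.
top_region, low_region = by_region.idxmax(), by_region.idxmin()
top_channel, low_channel = by_channel.idxmax(), by_channel.idxmin()

print(by_region)
print(by_channel)
```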

1.2. There are 6 different varieties of items that are considered. Describe and
comment/explain all the varieties across Region and Channel? Provide a
detailed justification for your answer.

When we sum up the spending across each channel and region,
we get the total spending across each channel and region in
the following table. Spending on the 6 different varieties of items
(Fresh, Milk, Grocery, Frozen, Detergents Paper and
Delicatessen) can be further summarized in the bar
graph.
From the above graph, we can see that in Lisbon spending is
highest on Fresh products and lowest on
Delicatessen. In Oporto, spending is likewise highest on Fresh
products and lowest on Delicatessen. In the Other
region, spending is again highest on Fresh products and lowest
on Delicatessen.

The above graph clearly shows that in the Retail
channel spending is highest on Grocery products and lowest
on Frozen food products. In the Hotel channel,
spending is highest on Fresh products and lowest
on Detergents Paper.
1.3 On the basis of the descriptive measure of variability, which item shows
the most inconsistent behaviour? Which item shows the least inconsistent
behaviour?

The common descriptive measures of variability are the range,
IQR, variance, and standard deviation. To check the
inconsistent behaviour of an item we can calculate the
coefficient of variation (standard deviation divided by the mean) of each variable. The following
pie chart shows how each item has performed across
the 3 locations (Lisbon, Oporto and Other) for both the
Retail and Hotel channels.

This table shows that the coefficient of variation of Fresh
products is 105.25% while that of Delicatessen is 184.42%.
Since a higher coefficient of variation means more variability
relative to the mean, these figures indicate that Delicatessen, not Fresh,
shows the more inconsistent behaviour of the two.
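
As a minimal sketch of the coefficient of variation, the function below divides the sample standard deviation by the mean and expresses the result as a percentage. The helper name `coefficient_of_variation` and both sample lists are hypothetical; they only illustrate that a larger value means less consistent spending relative to the average.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Two hypothetical items with the same mean (100) but different spread.
steady = [95, 100, 105, 100]
erratic = [10, 300, 40, 50]

cv_steady = coefficient_of_variation(steady)
cv_erratic = coefficient_of_variation(erratic)

print(f"steady:  {cv_steady:.2f}%")
print(f"erratic: {cv_erratic:.2f}%")
```

Because both lists share the same mean, the difference in the coefficient of variation comes entirely from the spread of the values.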
This pair plot helps us to understand the relationships between
the 6 items.
1.4 Are there any outliers in the data? Back up your answer with a suitable
plot/technique with the help of detailed comments.

From the boxplot below, we can clearly see that all six items have outliers.

Outliers are observations in a dataset that don’t fit in some way. Perhaps the most common
or familiar type of outlier is an observation that is far from
the rest of the observations or the centre of mass of
observations. Outliers can skew statistical measures and
data distributions, providing a misleading representation of
the underlying data and relationships. Removing outliers from
data prior to modelling can result in a better fit of the
data and, in turn, more skilful predictions.
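
A box plot typically flags points that lie more than 1.5 times the IQR beyond the quartiles. The sketch below applies that rule with the standard library; the 1.5 multiplier, the "inclusive" quartile method, the helper name `iqr_outliers` and the sample values are all assumptions for illustration, not details taken from the report.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical spending values with one unusually large observation.
sample = [120, 150, 150, 200, 260, 310, 900]
outliers = iqr_outliers(sample)
print(outliers)
```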
1.5 On the basis of your analysis, what are your recommendations for the
business? How can your analysis help the business to solve its problem?
Answer from the business perspective.
From this analysis we can conclude that:
(1) In total, the business spends the most
on Fresh products across different channels and different
regions, so the company needs to ensure that it is driving the
most profit from this item.
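(2) Items with more consistent spending are less risky for the
business to invest in. By the coefficients of variation reported in 1.3
(Fresh 105.25%, Delicatessen 184.42%), Fresh is the more consistent of the two,
so additional investment in Delicatessen should be weighed against its
higher relative variability.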
(3) Fresh products require more spending; to cut this cost, the
wholesale distributor can concentrate more on other
items like Milk, Grocery, Frozen, Detergents Paper and
Delicatessen.
BUSINESS REPORT ON EDUCATION – POST 12TH STANDARD

CONTEXT:
The dataset Education - Post 12th Standard.csv contains information on various colleges.
You are expected to do a Principal Component Analysis for this case study according to the
instructions given.

2.1 Perform Exploratory Data Analysis [Univariate, Bivariate, and

Multivariate analysis to be performed]. What insight do you draw from the
EDA?
We will explore the Data set and perform the exploratory data analysis on the dataset.
The major topics to be covered are below:

Removing duplicates

Missing value treatment

Outlier Treatment

Normalization and Scaling (Numerical Variables)

Encoding Categorical variables (Dummy Variables)

Univariate Analysis

Bivariate Analysis

As a first step, we will import all the necessary libraries that we think we will be requiring to
perform the EDA.

In this step, we will perform the operations below to check what the data set comprises.
We will check the following:

Head of the dataset

Shape of the dataset

Info of the dataset

Summary of the dataset
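
These four checks map onto standard pandas calls. The sketch below runs them on a three-row hypothetical frame so that it is self-contained; in practice the frame would presumably come from `pd.read_csv("Education - Post 12th Standard.csv")`. The values shown are made up.

```python
import pandas as pd

# Hypothetical stand-in for the college data; values are illustrative only.
df = pd.DataFrame({
    "Names": ["College A", "College B", "College C"],
    "Apps": [1000, 2500, 1800],
    "Accept": [800, 1500, 1200],
})

print(df.head())       # head: first rows of the dataset
print(df.shape)        # shape: (number of rows, number of columns)
df.info()              # info: column names, non-null counts and dtypes
summary = df.describe()  # summary: count, mean, std, min, percentiles, max
print(summary)
```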

The table below shows the description of the dataset:

In the table below we can see some sample records, which have 1
categorical variable and 17 numerical variables.

The data consists of 777 college entries on 18 different parameters, such as:

Names: Names of various universities and colleges
Apps: Number of applications received
Accept: Number of applications accepted
Enroll: Number of new students enrolled
Top10perc: Percentage of new students from top 10% of Higher Secondary class
Top25perc: Percentage of new students from top 25% of Higher Secondary class
F.Undergrad: Number of full-time undergraduate students
P.Undergrad: Number of part-time undergraduate students
Outstate: Out-of-state tuition
Room.Board: Cost of Room and board
Books: Estimated book costs for a student
Personal: Estimated personal spending for a student
PhD: Percentage of faculty with Ph.D.’s
Terminal: Percentage of faculty with terminal degree
S.F.Ratio: Student/faculty ratio
perc.alumni: Percentage of alumni who donate
Expend: The Instructional expenditure per student
Grad.Rate: Graduation rate

The data below shows the first five records of the dataset.

The data below shows the last five records of the dataset.

The above information is used to check the data and the datatype of each
attribute.
The above summary shows how the data is spread for
the numerical values. We can clearly see the minimum, mean, different
percentile values and maximum values.
Outliers are values that lie far outside the range of most of the data; in a box plot they appear as points beyond the whiskers.
In the graph below we see the outliers present in the data.
Looking at the box plots above, it seems that 16 variables, i.e. Apps, Accept, Enroll,
Top10perc, F.Undergrad, P.Undergrad, Outstate, Room.Board, Books, Personal, PhD, Terminal,
S.F.Ratio, perc.alumni, Expend and Grad.Rate, have outliers present. These
outliers have been treated accordingly.
There are no duplicate records in the dataset.

There are no missing values in the dataset as you can clearly see below.
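
Checks for duplicates and missing values can be sketched with pandas as follows. The frame below is hypothetical and deliberately contains one duplicated row and one missing value so that both checks return something; the report's own dataset is stated to have neither.

```python
import pandas as pd

# Hypothetical frame with one repeated row and one missing value.
df = pd.DataFrame({
    "Names": ["College A", "College B", "College B", "College C"],
    "Apps": [1000, 2500, 2500, None],
})

# Number of rows that exactly repeat an earlier row.
n_duplicates = int(df.duplicated().sum())

# Number of missing values in each column.
missing_per_column = df.isnull().sum()

print("duplicate rows:", n_duplicates)
print(missing_per_column)
```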
The above graphs give a visual presentation of each variable, so that we can understand the
data much better.
UNIVARIATE ANALYSIS

The above graph shows the distribution of the Apps variable; we can say that the Apps parameter is
skewed to the right.
MULTIVARIATE ANALYSIS

In the above plot scatter diagrams are plotted for all the numerical columns in the dataset.
A scatter plot is a visual representation of the degree of correlation between any two columns.
The pair plot function in seaborn makes it very easy to generate joint scatter plots for all the
columns in the data.

PCA (Principal Component Analysis)

As per the question, a principal component analysis was performed on the given data.
The plot below shows the correlation between the variables in the dataset.
0 to 0.35 is “Weak”
0.35 to 0.8/0.85 is “Moderate”
Greater than 0.8/0.85 is “Strong”
In the above plot we can check and see which variables have Weak, Moderate or Strong
correlation with each other.
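
The banding above can be written as a small helper. This is a sketch under stated assumptions: the function name `correlation_strength` is hypothetical, the strong cutoff is taken as 0.8 (the text gives 0.8/0.85), a value of exactly 0.35 is treated as Moderate, and negative correlations are banded by their absolute value.

```python
def correlation_strength(r, strong_cutoff=0.8):
    """Label a correlation coefficient as Weak, Moderate or Strong."""
    magnitude = abs(r)  # assumption: band negative values by magnitude
    if magnitude > strong_cutoff:
        return "Strong"
    if magnitude >= 0.35:  # assumption: the boundary value counts as Moderate
        return "Moderate"
    return "Weak"

for r in (0.2, 0.5, -0.9):
    print(r, correlation_strength(r))
```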
According to this correlation we can analyze various aspects of the dataset.
We changed the columns to rows and brought the number of columns down to 12.

The above graph is a visual presentation of all the 12 principal components.

We changed the columns to rows and brought the number of columns down to 5.
Check how the original features matter to each principal component. Here we are only
considering the absolute values.
Compare how the original features influence the various principal components.
We need the original scaled features.
Check for the presence of correlations among the principal components.
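
The steps listed above (scale the features, extract components, inspect absolute loadings, check that the components are uncorrelated) can be sketched with NumPy alone. This is not the code used in the report: the data is randomly generated, and the eigen-decomposition route is a swapped-in technique chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 rows, 4 features, the first two strongly correlated.
base = rng.normal(size=(50, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(50, 1)),
    base + 0.1 * rng.normal(size=(50, 1)),
    rng.normal(size=(50, 2)),
])

# Scale each feature to zero mean and unit variance (z-scores).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the covariance matrix of the scaled features.
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of total variance explained by each principal component.
explained_ratio = eigvals / eigvals.sum()

# Absolute loadings: rows are original features, columns are components.
abs_loadings = np.abs(eigvecs)

# Project the scaled data and check the components are uncorrelated.
scores = Z @ eigvecs
pc_corr = np.corrcoef(scores, rowvar=False)

print(np.round(explained_ratio, 3))
print(np.round(abs_loadings, 2))
print(np.round(pc_corr, 2))
```

The correlation matrix of the component scores should be close to the identity matrix, which is the property being checked in the last step of the list.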
