Case Study

This R-based case study analyzes the Online Retail Dataset, focusing on customer behavior, product sales, and revenue. It applies data transformation and visualization with the ggplot2 and dplyr packages and performs revenue calculation, hypothesis testing, and correlation analysis. Key outputs include the top products by revenue and the distribution of revenue across countries, along with statistical tests of differences and relationships in the data.

Uploaded by

rutvik waghmare


Create an R-based case study that demonstrates data analysis, transformation, manipulation, and visualization techniques using a sample dataset.

OR

Create an R-based case study that uses R code to analyze the Online Retail Dataset, a sample dataset that includes fields such as InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, and Country.

R-Code:

Step 1: Objective

The Online Retail Dataset contains information about online transactions, including invoice details,
product information, customer IDs, and country of origin. We'll use R to analyze this data and gain
insights into customer behavior, product sales, and revenue.

Step 2: Dataset

The dataset contains the following fields:

- InvoiceNo: Unique invoice number
- StockCode: Product code
- Description: Product description
- Quantity: Number of units sold
- InvoiceDate: Date of invoice
- UnitPrice: Price per unit
- CustomerID: Unique customer ID
- Country: Country of origin
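Because the analysis below reads a CSV from a local path, it can help to have a tiny in-memory stand-in with the same eight fields for trying the code. The sketch below is a hypothetical toy sample: every value (invoice numbers, product names, prices, customer IDs) is invented for illustration and is not taken from the real dataset, and the name `sample_retail` is an assumption.

```r
# Hypothetical toy sample with the same eight fields as the Online Retail Dataset.
# All values are invented for illustration only.
sample_retail <- data.frame(
  InvoiceNo   = c("INV001", "INV001", "INV002", "INV003", "INV004", "INV005"),
  StockCode   = c("P001", "P002", "P001", "P003", "P002", "P003"),
  Description = c("Product A", "Product B", "Product A",
                  "Product C", "Product B", "Product C"),
  Quantity    = c(6, 2, 10, NA, 4, 3),          # one NA to exercise imputation
  InvoiceDate = c("2010-12-01 08:26:00", "2010-12-01 08:26:00",
                  "2010-12-02 09:15:00", "2010-12-03 10:00:00",
                  "2010-12-03 11:30:00", "2010-12-04 14:45:00"),
  UnitPrice   = c(2.50, 5.00, 2.50, 1.00, 5.00, 1.00),
  CustomerID  = c(101, 101, 102, 103, 104, 105),
  Country     = c("United Kingdom", "United Kingdom", "Australia",
                  "France", "Australia", "United Kingdom"),
  stringsAsFactors = FALSE
)

# Inspect column types, as summary()/str() would for the real file
str(sample_retail)
```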

Step 3: R-Code

# Load necessary libraries
# Install once if needed: install.packages(c("ggplot2", "dplyr", "lubridate", "readr"))
library(ggplot2)
library(dplyr)
library(lubridate)
library(readr)

# Load the dataset
retail_data <- read_csv("C:/Users/DELL/Desktop/online_retail.csv")
View(retail_data)

# Explore the data

summary(retail_data)

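# Convert InvoiceDate to date-time format
# ymd_hms() expects "YYYY-MM-DD HH:MM:SS"; use a different lubridate parser
# (e.g. mdy_hm()) if the file stores dates in another layout
retail_data$InvoiceDate <- ymd_hms(retail_data$InvoiceDate)
head(retail_data$InvoiceDate)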
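The right lubridate parser depends on how the CSV was exported, which the case study does not state. The sketch below, using invented date strings, shows that `ymd_hms()` and `mdy_hm()` yield the same timestamps for the two layouts, and how month and year can then be extracted for time-based summaries.

```r
library(lubridate)

# Invented example strings in two common layouts
iso_dates <- c("2010-12-01 08:26:00", "2011-01-15 14:30:00")  # YYYY-MM-DD HH:MM:SS
us_dates  <- c("12/1/2010 8:26", "1/15/2011 14:30")           # M/D/YYYY H:MM

parsed_iso <- ymd_hms(iso_dates)  # parsed as UTC date-times
parsed_us  <- mdy_hm(us_dates)    # also UTC by default

# Both layouts describe the same instants
print(parsed_iso == parsed_us)

# Components useful for monthly or yearly revenue summaries
print(month(parsed_iso))
print(year(parsed_iso))
```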

# Missing values: replace missing Quantity values with the column mean
Mean_Quantity <- mean(retail_data$Quantity, na.rm = TRUE)
Mean_Quantity
retail_data$Quantity <- ifelse(is.na(retail_data$Quantity), Mean_Quantity, retail_data$Quantity)
sum(is.na(retail_data$Quantity))  # should now be 0
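Before imputing, it is worth checking which columns actually contain missing values. The sketch below uses a small invented data frame (`toy` is a hypothetical name) to count NAs per column with base R and then applies the same mean-imputation rule as above, written as a dplyr `mutate()`.

```r
library(dplyr)

# Hypothetical toy data; the NAs stand in for missing entries
toy <- data.frame(
  Quantity  = c(6, NA, 10, 4),
  UnitPrice = c(2.5, 5.0, 2.5, NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Mean imputation for Quantity only (same rule as the code above)
toy <- toy %>%
  mutate(Quantity = ifelse(is.na(Quantity), mean(Quantity, na.rm = TRUE), Quantity))
print(toy$Quantity)
```

Note that mean imputation can produce non-integer quantities; whether that is acceptable is a judgment call for the analysis.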

# Calculate revenue per line item (Revenue = Quantity × UnitPrice)
retail_data$Revenue <- retail_data$Quantity * retail_data$UnitPrice
head(retail_data$Revenue)

# OR, equivalently, with dplyr:
retail_data <- retail_data %>%
  mutate(Revenue = Quantity * UnitPrice)
View(retail_data)
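Revenue is computed row by row, so a missing UnitPrice (or Quantity) yields a missing Revenue, and a plain `sum()` over the column then returns `NA`. The toy values below are invented to illustrate this, and why the aggregations that follow benefit from `na.rm = TRUE`.

```r
library(dplyr)

# Invented values; the third row has a missing price
toy <- data.frame(Quantity = c(6, 2, 10), UnitPrice = c(2.50, 5.00, NA))

toy <- toy %>% mutate(Revenue = Quantity * UnitPrice)
print(toy$Revenue)  # 15 10 NA

total_with_na <- sum(toy$Revenue)               # NA propagates
total_na_rm   <- sum(toy$Revenue, na.rm = TRUE) # ignores the missing row
print(c(total_with_na, total_na_rm))
```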

# Top 10 products by revenue
top_products <- retail_data %>%
  group_by(Description) %>%
  summarise(TotalRevenue = sum(Revenue, na.rm = TRUE)) %>%
  arrange(desc(TotalRevenue)) %>%
  head(10)
top_products
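The `arrange(desc(...)) %>% head(10)` pattern can also be written with dplyr's `slice_max()`, available in dplyr 1.0.0 and later. The sketch below applies it to invented revenues for three products and keeps the top 2; note that `slice_max()` keeps tied rows by default (`with_ties = TRUE`), so it may return more than `n` rows when values tie.

```r
library(dplyr)

# Invented line-item revenues
toy <- data.frame(
  Description = c("Product A", "Product B", "Product A", "Product C", "Product B"),
  Revenue     = c(15, 10, 25, 3, 20)
)

top_n_products <- toy %>%
  group_by(Description) %>%
  summarise(TotalRevenue = sum(Revenue, na.rm = TRUE)) %>%
  slice_max(TotalRevenue, n = 2)  # largest two totals

print(top_n_products)  # Product A (40) and Product B (30)
```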

# Visualize top products
# reorder(Description, TotalRevenue) sorts the bars by revenue (ascending by default);
# coord_flip() makes the bars horizontal so long product names stay readable
ggplot(top_products, aes(x = reorder(Description, TotalRevenue), y = TotalRevenue)) +
  geom_col() +
  coord_flip() +
  xlab("Product") +
  ylab("Revenue") +
  ggtitle("Top 10 Products by Revenue")

# Revenue by country
Revenue_by_country <- retail_data %>%
  group_by(Country) %>%
  summarise(TotalRevenue = sum(Revenue, na.rm = TRUE)) %>%
  arrange(desc(TotalRevenue))
Revenue_by_country

# Visualize revenue by country
# reorder(Country, TotalRevenue) sorts the country bars by revenue (ascending by default)
ggplot(Revenue_by_country, aes(x = reorder(Country, TotalRevenue), y = TotalRevenue)) +
  geom_col() +
  coord_flip() +
  labs(title = "Revenue by Country", x = "Country", y = "Revenue")
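To describe how revenue is distributed across countries, totals can be turned into percentage shares. This is a sketch on invented country revenues; the column name `SharePct` is a hypothetical choice.

```r
library(dplyr)

# Invented line-item revenues by country
toy <- data.frame(
  Country = c("United Kingdom", "United Kingdom", "Australia", "France"),
  Revenue = c(60, 20, 15, 5)
)

country_share <- toy %>%
  group_by(Country) %>%
  summarise(TotalRevenue = sum(Revenue, na.rm = TRUE)) %>%
  # after summarise() the result is ungrouped, so sum() is over all countries
  mutate(SharePct = 100 * TotalRevenue / sum(TotalRevenue)) %>%
  arrange(desc(TotalRevenue))

print(country_share)  # United Kingdom 80%, Australia 15%, France 5%
```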

# Descriptive Statistics
summary(retail_data$UnitPrice)
summary(retail_data$Revenue)
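`summary()` describes a column as a whole. Since the hypothesis tests that follow compare countries, per-group statistics are a natural companion. The sketch below computes count, mean, median, and standard deviation of Revenue by country on invented values.

```r
library(dplyr)

# Invented revenues for two countries
toy <- data.frame(
  Country = c("United Kingdom", "United Kingdom", "Australia", "Australia"),
  Revenue = c(10, 20, 5, 15)
)

print(summary(toy$Revenue))  # overall Min, quartiles, Mean, Max

by_country <- toy %>%
  group_by(Country) %>%
  summarise(
    n      = n(),
    Mean   = mean(Revenue, na.rm = TRUE),
    Median = median(Revenue, na.rm = TRUE),
    SD     = sd(Revenue, na.rm = TRUE)
  )
print(by_country)  # groups are returned in alphabetical order
```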

# Testing of Hypothesis

# 1) Two-sample t-test

# Null Hypothesis (H0): μ1 = μ2, i.e. there is no difference in mean Revenue between the United Kingdom and Australia.
# Alternative Hypothesis (H1): μ1 ≠ μ2, i.e. there is a significant difference in mean Revenue between the United Kingdom and Australia.

# Filter data for two countries

country1_data <- retail_data %>% filter(Country == "United Kingdom")
country1_data
country2_data <- retail_data %>% filter(Country == "Australia")
country2_data

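# Perform two-sample t-test
# var.equal = TRUE assumes equal variances (Student's t-test);
# the default var.equal = FALSE runs Welch's t-test instead
t_test_result <- t.test(country1_data$Revenue, country2_data$Revenue, var.equal = TRUE)
t_test_result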

# Decision to reject or fail to reject H0 (α is the chosen significance level, commonly 0.05):
# If p-value > α, fail to reject H0
# If p-value < α, reject H0
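The decision rule can be applied in code by reading `p.value` from the object that `t.test()` returns. In this sketch the revenue vectors are invented and α = 0.05 is an assumed significance level; the names `uk_revenue` and `au_revenue` are hypothetical.

```r
# Invented revenues for two countries
uk_revenue <- c(12.0, 15.5, 9.8, 20.1, 17.3, 11.2)
au_revenue <- c(25.4, 30.2, 22.8, 28.9, 27.5, 31.0)

alpha <- 0.05  # assumed significance level

res <- t.test(uk_revenue, au_revenue, var.equal = TRUE)

# Apply the decision rule to the test's p-value
decision <- if (res$p.value < alpha) "Reject H0" else "Fail to reject H0"
cat(sprintf("p-value = %.4g -> %s\n", res$p.value, decision))
```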

# 2) ANOVA
# H0: Mean Revenue does not vary across countries.
# H1: Mean Revenue differs for at least one country.

# Perform ANOVA (Country is converted to a factor for aov() and TukeyHSD())
retail_data$Country <- as.factor(retail_data$Country)
anova_result <- aov(Revenue ~ Country, data = retail_data)
anova_result
summary(anova_result)

# Perform Tukey's HSD test (pairwise comparisons between countries)
tukey_result <- TukeyHSD(anova_result)
tukey_result

# Decision to reject or fail to reject H0:
# If p-value > α, fail to reject H0
# If p-value < α, reject H0
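For ANOVA the p-value sits in the `Pr(>F)` column of the first table inside `summary()` of an `aov` fit, and `TukeyHSD()` returns a matrix per factor with a `p adj` column. The sketch below uses invented revenues for three countries and an assumed α = 0.05.

```r
# Invented revenues: four observations per country
toy <- data.frame(
  Country = factor(rep(c("United Kingdom", "Australia", "France"), each = 4)),
  Revenue = c(10, 12, 11, 13,  20, 22, 21, 23,  10, 11, 12, 13)
)

alpha <- 0.05  # assumed significance level

anova_toy   <- aov(Revenue ~ Country, data = toy)
anova_table <- summary(anova_toy)[[1]]      # rows: Country, Residuals
p_value     <- anova_table[["Pr(>F)"]][1]   # p-value for the Country term

decision <- if (p_value < alpha) "Reject H0" else "Fail to reject H0"
cat(sprintf("ANOVA p-value = %.4g -> %s\n", p_value, decision))

# Pairwise differences with adjusted p-values
tukey_toy <- TukeyHSD(anova_toy)
print(tukey_toy$Country)
```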

# 3) Correlation Analysis

# Null Hypothesis (H0): There is no linear relationship between Revenue and Quantity (ρ = 0).
# Alternative Hypothesis (H1): There is a significant linear relationship between Revenue and Quantity (ρ ≠ 0).

# Perform correlation analysis
correlation_result <- cor.test(retail_data$Revenue, retail_data$Quantity)
correlation_result
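The estimate and p-value can be pulled out of the `cor.test()` result directly. Because Revenue is defined as Quantity × UnitPrice, a positive correlation with Quantity is expected partly by construction, which is worth keeping in mind when interpreting it. The vectors below are invented for illustration.

```r
# Invented quantities and prices; revenue is derived from them
qty     <- c(1, 2, 3, 4, 5, 6, 8, 10)
price   <- c(2.0, 2.5, 1.5, 2.0, 3.0, 2.5, 2.0, 1.8)
revenue <- qty * price

ct <- cor.test(revenue, qty, method = "pearson")
r  <- unname(ct$estimate)  # Pearson correlation coefficient

cat(sprintf("r = %.3f, p-value = %.4g\n", r, ct$p.value))
```

For skewed data such as retail revenue, `method = "spearman"` is a rank-based alternative that `cor.test()` also supports.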

# Visualize the relationship
# se = FALSE hides the shaded confidence interval around the fitted line
ggplot(retail_data, aes(x = Quantity, y = Revenue)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Relationship between Revenue and Quantity", x = "Quantity", y = "Revenue")

# Fit the linear regression model

model <- lm(Revenue ~ Quantity, data = retail_data)
summary(model)
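Beyond printing `summary(model)`, the fitted slope and R² can be extracted programmatically with `coef()` and the `r.squared` element of the summary. The values below are invented, with a small amount of noise added so the fit is not exactly linear.

```r
# Invented data: Revenue roughly proportional to Quantity
toy <- data.frame(
  Quantity = c(1, 2, 3, 4, 5),
  Revenue  = c(2.4, 5.1, 7.6, 9.9, 12.6)
)

toy_model <- lm(Revenue ~ Quantity, data = toy)

slope     <- coef(toy_model)[["Quantity"]]      # change in Revenue per extra unit
intercept <- coef(toy_model)[["(Intercept)"]]
r_squared <- summary(toy_model)$r.squared       # share of variance explained

cat(sprintf("slope = %.3f, intercept = %.3f, R^2 = %.4f\n", slope, intercept, r_squared))
```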
