Case Study: How Does a Bike-Share Navigate Speedy Success?

Amit Kumar Anand
28 December, 2022

 1. Introduction
 2. Scenario
 3. Phase 1: Ask
o 3.0.1 Business objective
o 3.0.2 Business task
o 3.0.3 Stakeholders
 4. Phase 2: Prepare
o 4.0.1 Where is your data located?
o 4.0.2 How is the Data Organized?
o 4.0.3 Credibility of data
o 4.0.4 Licensing, privacy, security, and accessibility
o 4.0.5 Ability of Data to answer Business Questions
o 4.0.6 Challenges with the data
 5. Phase 3: Data Process
o 5.0.1 What tools are you choosing and why?
o 5.0.2 Review of Data
o 5.0.3 Setting up environment
o 5.0.4 Data Validation
 6. Phase 4: Data Cleaning
 7. Phase 5: Data analysis
 8. Phase 6: Data Visualizations and Summary
o 8.0.1 Visualization 1
o 8.0.2 Visualization 2
o 8.0.3 Visualization 3
o 8.0.4 Visualization 4
o 8.0.5 Visualization 5
o 8.0.6 Visualization 6
o 8.0.7 Visualization 7
 9. Phase 7: Act
o 9.0.1 Key Takeaways :
o 9.0.2 Recommendations :
o --------------------------------------------End of Case Study
————————————————
1. Introduction
The bike share company, Cyclistic, is based in Chicago and offers services to its users. In order
to improve its marketing strategy and drive future growth, the company is interested in
understanding the behavior of its users while using the services. As a part of the Google
Data Analytics certification course, I was given the opportunity to conduct a case study on
Cyclistic to analyze the data and provide insights on the users’ behavior. In this project, I will
follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. The
aim of this case study is to provide actionable insights for the company to consider in
their marketing strategy.

2. Scenario
Cyclistic operates a fleet of more than 5,800 bicycles which can be accessed from over 600
docking stations across the city. Bikes can be borrowed from one docking station, ridden,
then returned to any docking station. Over the years, marketing campaigns have been broad
and targeted a cross-section of potential users. Data analysis has shown that riders with an
annual membership are more profitable than casual riders. Lily Moreno, the director of
marketing, wants to implement a new marketing strategy in order to convert casual riders
into annual members. She believes that with the right campaign there is a very good chance of
such conversions between the user types. Cyclistic also offers several bike options, such as
electric bikes, classic bikes, and docked bikes, which makes its services more inclusive.
Lily has tasked the marketing analytics team with analyzing one year of past user data
to find trends and habits of Cyclistic’s users to help create this marketing campaign.
The marketing analyst team would like to know:
 How annual members and casual riders differ
 Why casual riders would buy a membership
 How Cyclistic can use digital media to influence casual riders to become
members.
Here I have to analyze the Cyclistic historical bike trip data to identify trends in the usage of
bikes by casual and member riders.

3. Phase 1: Ask
3.0.1 Business objective
The business objective of the case study is to identify opportunities for targeted marketing
campaigns to convert casual riders into annual members. This will be done
through analysis of bike trip data and understanding user behavior and preferences. The
ultimate goal is to increase profitability and drive future growth for the company.
3.0.2 Business task
As an analyst, my task is to do the following:

 Analyze Cyclistic’s historical bike trip data

 Identify trends and patterns in usage of bikes by casual and member riders
 Understand behavior and preferences of these two user groups
 Identify opportunities for targeted marketing campaigns to convert casual riders into
annual members
 Use insights from analysis to inform development of new marketing strategy for the
company
 Goal: increase profitability and drive future growth.

3.0.3 Stakeholders
The Stakeholders in this case study include:
 Lily Moreno: Director of Marketing at Cyclistic, who is responsible for
implementing the marketing campaigns at Cyclistic.
 Cyclistic’s marketing team: They will be responsible for conducting the analysis
and developing the marketing strategy based on the insights gained.
 Cyclistic’s casual riders: They are the target audience of the marketing campaign,
and their behavior and preferences will be a key focus of the analysis.
 Cyclistic’s annual members: They are the group that the marketing campaign
is trying to convert casual riders into, and their behavior and preferences may be
compared to those of casual riders in the analysis.
 Cyclistic’s investors and shareholders: They have a financial interest in the
company’s success and may be interested in the results of the analysis and any
changes to the marketing strategy.

4. Phase 2: Prepare
4.0.1 Where is your data located?
The data for this analysis was obtained from Motivate International Inc. and can
be accessed through the provided link. It includes 12 months of historical trip data from
Cyclistic, a fictional bike share company based in Chicago. It should be noted that the data is
public and can be used to explore how different customer types are using Cyclistic bikes.

4.0.2 How is the Data Organized?

For this project, the data used consists of monthly CSV files from the past 12 months
(December 2021 - November 2022). The files include 13 columns of information related to
ride details, such as ride ID, rider type, ride start and end times, start and end stations, and
geographic coordinates. The data is organized in a way that allows for analysis of
trends and patterns in the usage of Cyclistic’s bike share services.
4.0.3 Credibility of data
Motivate, Inc. collected the data for this analysis directly through its management of the
Cyclistic Bike Share program for the City of Chicago. The data is comprehensive and
consistent, as it includes information on all rides taken by users and is not just a sample. It is
also current, as it is released on a monthly basis by the City of Chicago. The data is made
available to the public by the City of Chicago.

4.0.4 Licensing, privacy, security, and accessibility

The data used for this analysis has had all identifying information removed in order
to protect the privacy of users. This limitation on the data does restrict the scope of the
possible analysis, as it is not possible to determine whether casual riders are repeat users or
residents of the Chicago area. The data is released under a specific license and is
made available for use in this analysis.

4.0.5 Ability of Data to answer Business Questions

The available dataset is sufficient for the purpose of answering the business
question regarding the differences in usage patterns between annual members and casual
riders. Through detailed observation of the variables in the data, it has been determined
that casual riders typically pay for individual or daily rides, while member riders tend to
purchase annual subscriptions. This information is important in understanding the
behavioral differences between the two groups and can be used to inform targeted
marketing campaigns. Additional analysis of other variables in the data, such as ride duration
and location, may provide further insights into the usage patterns of annual members and
casual riders.

4.0.6 Challenges with the data

The Challenges I faced during my data analysis are:

  Data preparation identified several issues, including duplicate records and missing
fields, which were addressed through data cleaning.
  The large amount of data (1.2 GB) required working with segments rather than
attempting to use diskframe functions.
  Data cleaning, removal of unnecessary variables, and saving to a CSV file on the hard
drive allowed for efficient processing and analysis of the data.
  Specialized tools were necessary for working with this amount of data. Tools like
Excel failed to handle it, so R and Tableau were used.

5. Phase 3: Data Process

5.0.1 What tools are you choosing and why?
 In order to efficiently prepare, process, clean, analyze, and visualize the data for
this project, I selected RStudio Desktop as the primary tool. The large size of the
dataset made it impractical to use tools such as Microsoft Excel or Google Sheets,
and RStudio Cloud was also unable to handle the volume of data. RStudio Desktop
provided the necessary capabilities to effectively work with the data and
generate meaningful insights.
 In addition to RStudio Desktop, I also utilized Tableau to create visualizations for
this project. The powerful data visualization capabilities of Tableau allowed me
to effectively communicate the results of the analysis and highlight key trends
and patterns in the data.
 Overall, the combination of RStudio Desktop and Tableau proved to be a powerful
toolkit for preparing, processing, cleaning, analyzing, and visualizing the data for
this project.

5.0.2 Review of Data

In order to gain an understanding of the data and its potential for analysis, a review was
conducted to assess the content of the variables, the format of the data, and the integrity
of the data. This initial review provided an overview of the data and helped to identify
any potential issues or challenges that would need to be addressed in the preparation and
analysis process.
Data review involved the following:

 Checking column names across all the 12 original files.

 Checking for missing values.
 Checking of white spaces.
 Checking of duplicate records.
 Other data anomalies.

Results of the review found following things:

 Duplicate record of ID numbers.

 Records with missing start or end station name.
 Records with very short or very long ride duration.
 Records for trips starting or ending at an administrative station (repair or testing
station).

All 12 files were combined into one data set after the initial review was completed. The final data
set consisted of 5,733,451 rows with 13 columns of character and numeric data. This matched
the number of records in all 12 monthly data files.

5.0.3 Setting up environment

#----------------------------------------------------------------------------------#
#load packages
library(tidyverse)
library(lubridate)
library(janitor)
library(data.table)
library(readr)
library(psych)
library(hrbrthemes)
library(ggplot2)

#----------------------------------------------------------------------------------#

5.0.4 Data Validation

  To enable more efficient and comprehensive analysis, the individual monthly data files
were first read into separate data frames so that they could then be combined into a
single, large dataset.

#----------------------------------------------------------------------------------#
#Import Data
december_2021 <- read.csv("data/202112-divvy-tripdata.csv")
january_2022 <- read.csv("data/202201-divvy-tripdata.csv")
february_2022 <- read.csv("data/202202-divvy-tripdata.csv")
march_2022 <- read.csv("data/202203-divvy-tripdata.csv")
april_2022 <- read.csv("data/202204-divvy-tripdata.csv")
may_2022 <- read.csv("data/202205-divvy-tripdata.csv")
june_2022 <- read.csv("data/202206-divvy-tripdata.csv")
july_2022 <- read.csv("data/202207-divvy-tripdata.csv")
august_2022 <- read.csv("data/202208-divvy-tripdata.csv")
september_2022 <- read.csv("data/202209-divvy-publictripdata.csv")
october_2022 <- read.csv("data/202210-divvy-tripdata.csv")
november_2022 <- read.csv("data/202211-divvy-tripdata.csv")
#----------------------------------------------------------------------------------#
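
The twelve near-identical read.csv() calls can also be written as a loop. The following is only a sketch of that idea, not the method used in this case study: the helper name read_monthly_files, the file-name pattern, and the two tiny demo files written to a temporary folder are all hypothetical stand-ins for the real downloads.

```r
# Hypothetical helper: read every monthly CSV in a folder into a named list.
read_monthly_files <- function(folder, pattern = "divvy.*\\.csv$") {
  paths <- sort(list.files(folder, pattern = pattern, full.names = TRUE))
  monthly <- lapply(paths, read.csv)
  names(monthly) <- basename(paths)
  monthly
}

# Demo with two tiny stand-in files written to a temporary folder.
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2")),
          file.path(demo_dir, "202112-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1"),
          file.path(demo_dir, "202201-divvy-tripdata.csv"), row.names = FALSE)

monthly <- read_monthly_files(demo_dir)

# Number of rows read from each file, in file-name order.
rows_per_file <- vapply(monthly, nrow, integer(1))
```

Because the file names begin with the year and month, sorting the paths also puts the months in chronological order.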

 To ensure the accuracy and integrity of the combined dataset, it was necessary to
verify that the column names in the individual data files were compatible for
merging. This involved comparing the names and ensuring that they matched
perfectly, regardless of their order. This step was crucial to enable the use of a
command to join the data into a single file.

#----------------------------------------------------------------------------------#
#Data Validation
colnames(december_2021)
colnames(january_2022)
colnames(february_2022)
colnames(march_2022)
colnames(april_2022)
colnames(may_2022)
colnames(june_2022)
colnames(july_2022)
colnames(august_2022)
colnames(september_2022)
colnames(october_2022)
colnames(november_2022)
#----------------------------------------------------------------------------------#
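
Comparing twelve printed name vectors by eye is error-prone, so the same check can be done programmatically. This is a minimal sketch under stated assumptions: the helper columns_match and the toy data frames are hypothetical and only illustrate the comparison.

```r
# Hypothetical helper: TRUE when every data frame has the same set of
# column names as the first one, ignoring column order.
columns_match <- function(frames) {
  reference <- names(frames[[1]])
  all(vapply(frames, function(d) setequal(names(d), reference), logical(1)))
}

# Toy frames standing in for the monthly data.
dec <- data.frame(ride_id = "A1", started_at = "2021-12-01 08:00:00")
jan <- data.frame(started_at = "2022-01-01 09:00:00", ride_id = "B1")  # same names, other order
bad <- data.frame(ride_id = "C1", start_time = "2022-02-01 10:00:00")  # one name differs

ok_result  <- columns_match(list(dec, jan))
bad_result <- columns_match(list(dec, jan, bad))
```

With the real data, the call would be columns_match(list(december_2021, january_2022, ...)) with all twelve data frames listed. The janitor package, which is already loaded, also offers compare_df_cols() for a similar comparison.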

  The total number of records in all 12 monthly data files was calculated to be 5,733,451
rows, each with 13 columns. This information provides an overview of the size and scope of
the data, which can be helpful in planning and executing the analysis process.

#----------------------------------------------------------------------------------#
# Total number of rows
sum(nrow(december_2021), nrow(january_2022), nrow(february_2022),
    nrow(march_2022), nrow(april_2022), nrow(may_2022),
    nrow(june_2022), nrow(july_2022), nrow(august_2022),
    nrow(september_2022), nrow(october_2022), nrow(november_2022))
#----------------------------------------------------------------------------------#

 In the next step, the monthly data frames were aggregated into a single data frame.
This involved combining the data from each of the monthly files into a cohesive
whole, allowing for more efficient and comprehensive analysis of the
data. Aggregating the data in this way also made it easier to identify trends and
patterns across the entire dataset, rather than having to analyze the data for each
month separately.

#----------------------------------------------------------------------------------#
# Combine Data of 12 months into one for smooth workflow
trip_final <- rbind(december_2021, january_2022, february_2022, march_2022,
                    april_2022, may_2022, june_2022, july_2022,
                    august_2022, september_2022, october_2022, november_2022)
#----------------------------------------------------------------------------------#
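
For data frames, rbind() matches columns by name, which is why matching names matter more than matching order. A short sketch with toy data (hypothetical values, not real trip records) shows this and adds a guard that fails loudly if the combined row count does not equal the sum of the parts.

```r
# Toy monthly frames (hypothetical values, not real trip data).
dec <- data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual"))
jan <- data.frame(member_casual = "casual", ride_id = "B1")  # columns in a different order

# Columns are aligned by name, so "B1" still lands in ride_id.
combined <- rbind(dec, jan)

# Stop with an error if rows were lost or gained while combining.
stopifnot(nrow(combined) == nrow(dec) + nrow(jan))
```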

 After aggregating the monthly data frames into a single data frame, the resulting
combined dataset was written to a new file and saved to the hard drive. This
allows for easier access to the data for further analysis and visualization, and ensures
that the data is available for future reference. Saving the data to a file on the hard
drive also ensures that it is backed up and protected against accidental loss or
damage.

#----------------------------------------------------------------------------------#
# Save the combined files
write.csv(trip_final, file = "data/trip_final.csv", row.names = FALSE)
#----------------------------------------------------------------------------------#
 After being saved to the hard drive, the data was once again subjected to
validation in order to ensure its accuracy, completeness, and consistency. This
process involved reviewing the data for errors or inconsistencies, checking for
missing or incomplete records, and verifying that the data met the requirements and
expectations of the analysis.

#----------------------------------------------------------------------------------#
#Final data validation
str(trip_final)
View(head(trip_final))
View(tail(trip_final))
dim(trip_final)
summary(trip_final)
names(trip_final)
#----------------------------------------------------------------------------------#

6. Phase 4: Data Cleaning

In this stage, I performed data cleaning to identify and correct or remove errors or
inconsistencies from the data. This involved a variety of techniques, such as correcting
errors in data entry, removing duplicates or incorrect records, and standardizing
data formats to ensure compatibility with analysis tools. Data cleaning is an important step
in the data analysis process, as it helps to ensure that the data is accurate and reliable, and
that the results of the analysis are meaningful and useful.

  Before beginning the data cleaning process, it is necessary to check how many missing
or “NA” values each column contains. Understanding the extent of missing or
incomplete data helps to inform decisions about how to handle these values, such as
whether to drop them from the dataset or impute them with estimates or
substitute values.

#----------------------------------------------------------------------------------#
#Count NA values in each column
colSums(is.na(trip_final))
#----------------------------------------------------------------------------------#
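
One caveat is worth checking at this point. By default, read.csv() treats only the string "NA" as missing in text columns, so a blank text field (for example, an empty station name) arrives as an empty string and is not counted by is.na(). The sketch below uses a three-row toy CSV with hypothetical values to show the difference and one possible way to handle it.

```r
# Toy data: one blank station name and one blank coordinate (hypothetical values).
csv_text <- "ride_id,start_station_name,end_lat
A1,Clark St,41.9
A2,,41.8
A3,State St,"

trips <- read.csv(text = csv_text)

# is.na() sees the blank numeric field but not the blank text field.
na_counts <- colSums(is.na(trips))

# Count empty strings separately.
blank_counts <- vapply(trips, function(col) sum(!is.na(col) & col == ""), integer(1))

# Alternative: tell read.csv() to treat blanks as NA at import time.
trips_na <- read.csv(text = csv_text, na.strings = c("", "NA"))
na_counts_after <- colSums(is.na(trips_na))
```

Whether blank station names should count as missing is a judgment call for the analysis. The point of the sketch is only that the two kinds of gaps are counted differently.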

  To ensure the integrity and reliability of the data, rows containing missing or “NA”
values were removed, and the remaining rows were saved into a new data frame.

#----------------------------------------------------------------------------------#
#Remove rows with missing values
clean_trip_final <- trip_final[complete.cases(trip_final), ]
#----------------------------------------------------------------------------------#

 Removing duplicates helps to ensure that the data is as complete and accurate as
possible, and that the results of the analysis are not unduly influenced by duplicate
or erroneous data.

#----------------------------------------------------------------------------------#
#Remove duplicates
clean_trip_final <- distinct(clean_trip_final)
#----------------------------------------------------------------------------------#
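
Called without column arguments, distinct() drops only rows that are identical in every column. Since the data review flagged duplicate ID numbers, it may be worth checking the ID column on its own. The following base R sketch uses toy data with hypothetical values.

```r
# Toy data: ride "A1" appears twice with different end times (hypothetical values).
trips <- data.frame(
  ride_id  = c("A1", "A2", "A1"),
  ended_at = c("2022-06-01 08:10:00", "2022-06-01 09:00:00", "2022-06-01 08:12:00")
)

# No row is a full duplicate, so a whole-row check finds nothing ...
full_row_dupes <- sum(duplicated(trips))

# ... but the ID column does repeat.
id_dupes <- sum(duplicated(trips$ride_id))

# Keep the first record for each ride_id.
trips_unique_id <- trips[!duplicated(trips$ride_id), ]
```

The dplyr equivalent of the last step would be distinct(clean_trip_final, ride_id, .keep_all = TRUE).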

  To further refine and clean the data, any remaining empty, “NA”, and missing values
were removed using functions such as drop_na(), remove_empty(), and remove_missing().

#----------------------------------------------------------------------------------#
#Remove na
clean_trip_final <- drop_na(clean_trip_final)
clean_trip_final <- remove_empty(clean_trip_final, which = c("rows", "cols"))
clean_trip_final <- remove_missing(clean_trip_final)
#----------------------------------------------------------------------------------#

  Next, records where the “started_at” value is not earlier than the “ended_at” value
were filtered out, since a ride cannot end before it starts. This helps to ensure that
the data is accurate and meaningful.

#----------------------------------------------------------------------------------#
#Keep only rides that start before they end
clean_trip_final <- clean_trip_final %>%
  filter(started_at < ended_at)
#----------------------------------------------------------------------------------#

  To improve the clarity and understanding of the data, it is necessary to change
a few column names. This involves renaming columns to more accurately reflect
their content, or to use more descriptive or intuitive names.

#----------------------------------------------------------------------------------#
#Renaming column for better context
#(the spelling "costumer_type" is kept because all later code refers to it)
clean_trip_final <- rename(clean_trip_final,
                           costumer_type = member_casual,
                           bike_type = rideable_type)
#----------------------------------------------------------------------------------#

  To facilitate more granular analysis of the data, additional columns were created for
the date, month, year, and day of the week based on the “started_at” column.
This allowed for more detailed analysis of the data by specific dates, days, or
months, and helped to identify trends and patterns that may not have been apparent
when analyzing the data at a more general level.

#----------------------------------------------------------------------------------#
#Separate date into date, week day, month, year for better analysis
clean_trip_final$date <- as.Date(clean_trip_final$started_at)
clean_trip_final$week_day <- format(clean_trip_final$date, "%A")
clean_trip_final$month <- format(clean_trip_final$date, "%b_%y")
clean_trip_final$year <- format(clean_trip_final$date, "%Y")
#----------------------------------------------------------------------------------#
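
The "%A" and "%b" formats return day and month names in the language of the current R session, so on a non-English system the labels would not match the English level names used later when ordering the data. As a hedged alternative, the sketch below builds the same labels from numeric date parts. The label vectors and toy timestamps are assumptions made for illustration.

```r
# English labels fixed in code, so the result does not depend on the locale.
day_labels   <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                  "Thursday", "Friday", "Saturday")
month_labels <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

started_at <- c("2021-12-25 10:15:00", "2022-07-04 18:30:00")  # toy timestamps
dates <- as.Date(started_at)
parts <- as.POSIXlt(dates)

week_day <- day_labels[parts$wday + 1]            # wday: 0 = Sunday
month    <- paste0(month_labels[parts$mon + 1],   # mon: 0 = January
                   "_", format(dates, "%y"))
```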

 Similarly a new column was created just for the time in “%H:%M” format.

#----------------------------------------------------------------------------------#
#Separate column for time
clean_trip_final$time <- as.POSIXct(clean_trip_final$started_at,
                                    format = "%Y-%m-%d %H:%M:%S")
clean_trip_final$time <- format(clean_trip_final$time, format = "%H:%M")
#----------------------------------------------------------------------------------#

  To gain a better understanding of the duration of rides, a column was created to
calculate the duration of rides based on the start and end time of each ride. This
allows for more detailed analysis of ride durations, and can help to identify trends
and patterns in the data.

#----------------------------------------------------------------------------------#
#Add ride length column (in minutes)
clean_trip_final$ride_length <- difftime(clean_trip_final$ended_at,
                                         clean_trip_final$started_at,
                                         units = "mins")
#----------------------------------------------------------------------------------#
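
When difftime() receives text timestamps, it parses them in the session's time zone. A more explicit variant is sketched below with toy timestamps. The choice of tz = "UTC" is an assumption made only to keep the example reproducible, not a statement about how the trip data was recorded.

```r
# Toy timestamps in the same "YYYY-MM-DD HH:MM:SS" layout as the trip files.
started_at <- c("2022-06-01 08:00:00", "2022-06-01 23:50:00")
ended_at   <- c("2022-06-01 08:12:30", "2022-06-02 00:20:00")

# Parse explicitly; tz = "UTC" is an assumption that avoids daylight-saving
# jumps in the local time zone.
start_time <- as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
end_time   <- as.POSIXct(ended_at,   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Ride length in minutes, stored as a plain number rather than a difftime.
ride_length <- as.numeric(difftime(end_time, start_time, units = "mins"))
```

The second toy ride crosses midnight, which the subtraction handles without any special casing.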

 To focus the analysis on the variables of interest, data that will not be used for this
analysis was filtered out. This was done using the “select()” function to select only
the relevant variables.

#----------------------------------------------------------------------------------#
#Select the data we are going to use
clean_trip_final <- clean_trip_final %>%
  select(bike_type, costumer_type, month, year, time, started_at, week_day,
         ride_length)
#----------------------------------------------------------------------------------#

  To ensure the accuracy and reliability of the data, excessively long rides were removed,
as these bikes may be considered stolen by Cyclistic: rides are typically limited to a
duration of one day (24 hours, or 1440 minutes). Rides shorter than 5 minutes were also
removed because they are too short to be meaningful for this analysis.

#----------------------------------------------------------------------------------#
#Remove rides longer than one day (possibly stolen bikes) and shorter than 5 minutes
clean_trip_final <- clean_trip_final[!(clean_trip_final$ride_length > 1440), ]
clean_trip_final <- clean_trip_final[!(clean_trip_final$ride_length < 5), ]
#----------------------------------------------------------------------------------#
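
The two thresholds can also be written as named constants in a single logical condition, which makes it easy to report how much data the filter removes. This is a sketch on toy durations (hypothetical values); the variable names are illustrative only.

```r
min_minutes <- 5        # rides shorter than this are dropped
max_minutes <- 24 * 60  # 1440 minutes, i.e. one day

ride_length <- c(0.5, 5, 42, 1440, 2000)  # toy durations in minutes

# Same boundaries as the two steps above: 5 and 1440 themselves are kept.
keep <- ride_length >= min_minutes & ride_length <= max_minutes
kept_lengths  <- ride_length[keep]
removed_share <- mean(!keep)  # fraction of rides removed by the filter
```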

 Before moving on to the next phase of the data analysis process, it is important to
perform one final check to ensure that all necessary data cleaning and preparation
steps have been completed.

#----------------------------------------------------------------------------------#
#Check Cleaned data
colSums(is.na(clean_trip_final))
#ended_at was dropped by select(), so check for non-positive ride lengths instead
View(filter(clean_trip_final, ride_length <= 0))
View(filter(clean_trip_final, ride_length > 1440 | ride_length < 5))
#----------------------------------------------------------------------------------#

  Once all necessary data cleaning and preparation steps have been completed, the
data can be saved to the hard disk as a CSV file.

#----------------------------------------------------------------------------------#
#Save the cleaned data
write.csv(clean_trip_final, file = "clean_trip_final.csv", row.names = FALSE)
#----------------------------------------------------------------------------------#

7. Phase 5: Data analysis
During the Data analysis phase, I explored the data in order to gain a better
understanding of its characteristics and patterns. I created charts, graphs, and other
types of visualizations to help visualize the data and identify trends. I also used statistical
techniques, such as regression analysis, to identify relationships between different variables
in the data. By analyzing the data in this way, I was able to extract insights and
knowledge that could inform business decisions and support decision making.

 To begin the analysis phase, I imported the cleaned and prepared trip data into my
analysis software. I conducted a thorough validation of the data to ensure that it
was accurate and free of errors.

#----------------------------------------------------------------------------------#
#import the cleaned data
clean_trip_final <- read_csv("clean_trip_final.csv")

str(clean_trip_final)
names(clean_trip_final)
#----------------------------------------------------------------------------------#

  To better facilitate my analysis, I converted the month and week day variables in the
trip data to ordered factors, so that months appear in chronological order and days in
week order. This allowed me to easily compare and analyze trends across different time
periods and days of the week.

#----------------------------------------------------------------------------------#
#order the data
clean_trip_final$month <- ordered(clean_trip_final$month,
  levels = c("Dec_21", "Jan_22", "Feb_22", "Mar_22",
             "Apr_22", "May_22", "Jun_22", "Jul_22",
             "Aug_22", "Sep_22", "Oct_22", "Nov_22"))

clean_trip_final$week_day <- ordered(clean_trip_final$week_day,
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"))
#----------------------------------------------------------------------------------#
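
The reason for this step is that text labels sort alphabetically, which scrambles the calendar. A small sketch with toy values shows the difference between plain sorting and sorting an ordered factor.

```r
month <- c("Jan_22", "Dec_21", "Nov_22", "Apr_22")  # toy values

# Plain character sorting is alphabetical, not chronological.
alphabetical <- sort(month)

# An ordered factor sorts by the position of each value in `levels`.
month_levels <- c("Dec_21", "Jan_22", "Feb_22", "Mar_22", "Apr_22", "May_22",
                  "Jun_22", "Jul_22", "Aug_22", "Sep_22", "Oct_22", "Nov_22")
chronological <- as.character(sort(ordered(month, levels = month_levels)))
```

Any value that does not appear in the levels vector becomes NA, so the labels must match the ones created during cleaning exactly.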

 As a first step in my analysis, I calculated key summary statistics for ride length,
including the minimum, maximum, median, and average values. These values
provided a broad overview of the distribution of ride lengths among Cyclistic’s
customers and allowed me to identify any extreme values or unusual patterns in the
data.

#----------------------------------------------------------------------------------#
#Analysis:- min, max, median, average
View(describe(clean_trip_final$ride_length, fast = TRUE))
#----------------------------------------------------------------------------------#

  As a next step in my analysis, I examined how the rides are distributed by
membership type, that is, how many rides were taken by annual members and how many
by casual riders. Because identifying information has been removed from the data,
these counts are rides, not individual customers.

#----------------------------------------------------------------------------------#
#Total no. of rides by customer type
View(table(clean_trip_final$costumer_type))
#----------------------------------------------------------------------------------#

  Continuing my analysis, I calculated the total duration, in minutes, of the rides
taken by each customer type. This analysis allowed me to understand the overall usage
patterns of Cyclistic’s bike share service among different customer types.

#----------------------------------------------------------------------------------#
#Total ride length for each customer type in minutes
View(setNames(aggregate(ride_length ~ costumer_type, clean_trip_final, sum),
              c("customer_type", "total_ride_len(mins)")))
#----------------------------------------------------------------------------------#

  In my next analysis, I focused specifically on comparing the ride length patterns of
annual members and casual riders. To do this, I calculated key summary statistics,
including the mean, median, maximum, and minimum values, for ride length
among these two customer types.

#----------------------------------------------------------------------------------#
#Differences between members and casual riders in terms of length of ride
View(clean_trip_final %>%
       group_by(costumer_type) %>%
       summarise(min_length_mins = min(ride_length),
                 max_length_mins = max(ride_length),
                 median_length_mins = median(ride_length),
                 mean_length_mins = mean(ride_length)))
#----------------------------------------------------------------------------------#

  In my subsequent analysis, I focused on analyzing the average ride length of
Cyclistic’s users by day of the week, as well as the total number of rides taken on
each day of the week.

#----------------------------------------------------------------------------------#
#Average ride_length for users by day_of_week and
#number of total rides by day_of_week
View(clean_trip_final %>%
       group_by(week_day) %>%
       summarise(Avg_length = mean(ride_length),
                 number_of_ride = n()))
#----------------------------------------------------------------------------------#

  After this, I analyzed the average ride length and the number of rides taken by
Cyclistic’s users by month. This analysis allowed me to understand the seasonal
fluctuations in usage of the bike share service, and to identify any trends or patterns
in usage levels over the course of a year.

#----------------------------------------------------------------------------------#
#Average ride_length and number of rides by month
View(clean_trip_final %>%
       group_by(month) %>%
       summarise(Avg_length = mean(ride_length),
                 number_of_ride = n()))
#----------------------------------------------------------------------------------#

 Now, I compared the average ride length of Cyclistic’s users by week day according
to customer type. This analysis allowed me to understand how the behavior and
usage patterns of annual members and casual riders differed from one another
on different days of the week.

#----------------------------------------------------------------------------------#
#Average ride length comparison by each week day according to each customer type
View(aggregate(ride_length ~ costumer_type + week_day,
               data = clean_trip_final, FUN = mean))
#----------------------------------------------------------------------------------#

  In my next analysis, I compared the average ride length of Cyclistic’s users by
month according to customer type.

#----------------------------------------------------------------------------------#
#Average ride length comparison by each month according to each customer type
View(aggregate(ride_length ~ costumer_type + month,
               data = clean_trip_final, FUN = mean))
#----------------------------------------------------------------------------------#

 Here, I analyzed the ride length data of Cyclistic’s users by customer type and
weekday. This allowed me to understand the behavior and usage patterns of annual
members and casual riders on different days of the week.

#----------------------------------------------------------------------------------#
#Analyze ride length data by customer type and weekday
View(clean_trip_final %>%
       group_by(costumer_type, week_day) %>%
       summarise(number_of_ride = n(),
                 average_duration = mean(ride_length),
                 median_duration = median(ride_length),
                 max_duration = max(ride_length),
                 min_duration = min(ride_length)))
#----------------------------------------------------------------------------------#

 Here, I analyzed the ride length data of Cyclistic’s users by customer type and
month.

#----------------------------------------------------------------------------------#
#Analyze ride length data by customer type and month
View(clean_trip_final %>%
       group_by(costumer_type, month) %>%
       summarise(number_of_ride = n(),
                 average_duration = mean(ride_length),
                 median_duration = median(ride_length),
                 max_duration = max(ride_length),
                 min_duration = min(ride_length)))
#----------------------------------------------------------------------------------#

  The data was then written to a new file for the next phase, data visualization.

#----------------------------------------------------------------------------------#
#Save the data for data visualization
write.csv(clean_trip_final, file = "clean_trip_final_tableau.csv",
          row.names = FALSE)
#----------------------------------------------------------------------------------#

8. Phase 6: Data Visualizations and Summary

8.0.1 Visualization 1

• This visualization shows the total number of rides per day of the week for each
customer type. Casual riders take the most rides on Saturdays and Sundays, which
points to leisure use of the bikes at the weekend. Members ride at a more
consistent rate throughout the week, with slightly higher numbers on Tuesdays and
Wednesdays, which suggests that they mainly use the bikes for regular commuting.
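
The chart itself is not reproduced in this text, and the export file name suggests it was
built in Tableau. As a rough base R equivalent, the counts behind such a chart can be
tabulated and drawn as grouped bars. This is a sketch on invented toy data, assuming only
the costumer_type and week_day columns used earlier; it is not the author's chart.

```r
# Toy data: one row per ride (invented values, not Cyclistic data)
toy_trips <- data.frame(
  costumer_type = c("casual", "casual", "casual",
                    "member", "member", "member", "member"),
  week_day      = c("Saturday", "Saturday", "Sunday",
                    "Tuesday", "Wednesday", "Saturday", "Tuesday")
)

# Keep weekdays in calendar order instead of alphabetical order
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
toy_trips$week_day <- factor(toy_trips$week_day, levels = day_levels)

# Rows = customer type, columns = weekday, cells = number of rides
rides_by_day <- table(toy_trips$costumer_type, toy_trips$week_day)
print(rides_by_day)

# Grouped bar chart: one bar per customer type within each weekday
barplot(rides_by_day, beside = TRUE, legend.text = TRUE,
        xlab = "Day of week", ylab = "Number of rides")
```

Turning week_day into a factor with explicit levels keeps days with no rides in the table
as zero counts, so both customer types share the same x-axis.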
8.0.2 Visualization 2

• This plot shows that annual members use the bikes most during rush hours,
potentially for commuting to and from work. Casual riders, on the other hand,
show a steadier increase in usage throughout the day, with a peak at around 6 pm
and a steady decrease thereafter, which suggests that they may be using the
bikes more for leisure activities. These differences in usage patterns can
inform strategies for promoting the bike share program and for targeting each
customer segment.
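
An hour-of-day breakdown like this needs the start hour of each ride. The sketch below shows
one way to derive it in base R. The column name started_at and its timestamp format are
assumptions (they do not appear in this section), and the rows are invented toy data.

```r
# Toy start times (invented); "started_at" is an assumed column name
toy_trips <- data.frame(
  costumer_type = c("member", "member", "casual", "casual", "member"),
  started_at    = c("2022-06-01 08:05:00", "2022-06-01 17:40:00",
                    "2022-06-04 14:15:00", "2022-06-04 18:02:00",
                    "2022-06-02 08:55:00")
)

# Parse timestamps with a fixed time zone so the result does not depend on the machine
start_time <- as.POSIXct(toy_trips$started_at,
                         format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Extract the hour of day (0-23) as an integer
toy_trips$start_hour <- as.integer(format(start_time, "%H"))

# Count rides per customer type and start hour
rides_by_hour <- as.data.frame(
  table(costumer_type = toy_trips$costumer_type,
        start_hour    = toy_trips$start_hour),
  stringsAsFactors = FALSE
)

# Show only the combinations that actually occur
print(rides_by_hour[rides_by_hour$Freq > 0, ])
```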
8.0.3 Visualization 3

• This plot shows the monthly usage trends of annual members and casual riders.
Demand is higher during the summer months for both customer types, with casual
riders showing slightly higher demand. In the winter months, demand among casual
riders decreases significantly, while annual members continue to use the service
at a relatively consistent rate throughout the year. This further supports the
finding that annual members may rely on the service for their regular commute,
while casual riders may use it more for leisure and recreation.
8.0.4 Visualization 4

• This plot of average ride duration against weekday indicates that casual riders
tend to use the bike share service primarily on weekends for leisure or
recreational purposes, while annual members use the service more consistently
throughout the week, possibly for commuting to work or other daily activities. This
information could help Cyclistic target its marketing efforts and potentially
adjust pricing or membership plans to better meet the needs of these different
customer groups.
8.0.5 Visualization 5

• To summarize, the plot shows that annual members and casual riders use the
bike-sharing service differently. Annual members tend to use the bikes for their
regular commutes, with steadier usage throughout the week and year. Casual
riders, on the other hand, tend to use the bikes more for leisure, with higher
usage on weekends and in the summer months. Additionally, the average ride
length of casual riders was found to be longer than that of annual members. These
findings can inform business decisions at the bike-sharing company.
8.0.6 Visualization 6

• The analysis of bike type usage showed that members prefer classic bikes over
electric and docked bikes, while casual riders have similar usage of electric bikes
and a slightly higher preference for docked bikes. The data sets do not make clear
what exactly is meant by “docked bikes,” but this type of bike is evidently not a
popular choice for annual members, as no member used it over the year. Overall,
both groups showed a preference for classic bikes over the other options.
8.0.7 Visualization 7

• This graph indicates that classic bikes are the most popular choice among both
members and casual riders, followed by electric bikes. Docked bikes are the least
popular choice. The popularity of classic bikes is much higher among annual
members than among casual riders. This could suggest that annual members favor
classic bikes over the other options, possibly because of their reliability and
simplicity. Casual riders, on the other hand, seem to have a more balanced
distribution of bike choices, with electric bikes a close second in popularity.
Overall, this graph provides insight into the preferences and habits of the
service's users, which could be useful to the company for marketing and resource
allocation.
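
Comparing bike preferences between two groups of different size is easier with shares than
with raw counts. The sketch below computes the percentage of rides per bike type within each
customer type. The column name rideable_type and its values are assumptions (they do not
appear in this section), and the rows are invented toy data.

```r
# Toy data (invented); "rideable_type" and its values are assumed names
toy_trips <- data.frame(
  costumer_type = c("member", "member", "member", "member",
                    "casual", "casual", "casual", "casual"),
  rideable_type = c("classic_bike", "classic_bike", "classic_bike", "electric_bike",
                    "classic_bike", "classic_bike", "electric_bike", "docked_bike")
)

# Rows = customer type, columns = bike type, cells = number of rides
type_counts <- table(toy_trips$costumer_type, toy_trips$rideable_type)

# margin = 1 normalises within each row, so every customer type sums to 100 %
type_share <- round(prop.table(type_counts, margin = 1) * 100, 1)
print(type_share)
```

Row-wise shares make a zero cell, such as members never using a bike type, stand out
directly.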
9. Phase 7: Act
9.0.1 Key Takeaways :

• Annual members primarily use the bike-sharing service for commuting purposes, while
casual riders tend to use it for leisure, particularly on weekends and in the summer
months.
• Annual members exhibit more consistent usage of the service throughout the week
and year than casual riders.
• Both annual members and casual riders favor classic bikes over the other two types of
bikes offered. However, annual members primarily use classic bikes and did not use
docked bikes at all, while casual riders are more likely to use all types of bikes.
• Casual riders tend to have longer ride durations, averaging around 50% longer than
annual members.
• Casual riders show lower usage of the service during the winter months compared to
annual members.
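
A figure such as "around 50% longer" is a relative difference between two group means. The
sketch below shows the arithmetic on invented toy values chosen so the result is easy to
check by hand; it does not reproduce the Cyclistic numbers.

```r
# Toy ride lengths in minutes (invented values chosen for easy arithmetic)
toy_trips <- data.frame(
  costumer_type = c("casual", "casual", "member", "member"),
  ride_length   = c(20, 40, 15, 25)
)

# Mean ride length per customer type
mean_len    <- tapply(toy_trips$ride_length, toy_trips$costumer_type, mean)
casual_mean <- as.numeric(mean_len["casual"])
member_mean <- as.numeric(mean_len["member"])

# Relative difference: how much longer the casual mean is,
# expressed as a percentage of the member mean
pct_longer <- (casual_mean - member_mean) / member_mean * 100
print(pct_longer)
```

The member mean is the denominator because the claim is phrased relative to annual members.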

9.0.2 Recommendations :

• Increase marketing efforts targeting leisure riders, especially during the summer
months and on weekends, in order to increase bike usage and revenue.
• Consider offering discounts or incentives for annual members to encourage them to
use the bikes more regularly throughout the week and year.
• Investigate the reasons for the low popularity of docked bikes among both annual
members and casual riders, and consider revising the offering or improving the
service to increase usage.
• Focus on improving the classic bike fleet, as it is the most popular among both annual
members and casual riders.
• Consider offering longer rental periods or multi-day rentals for casual riders, as their
average ride length is longer than that of annual members, in order to increase revenue.
• Increase marketing efforts targeting casual riders during the winter months in order to
increase usage and revenue during traditionally slower periods.
