Have you ever wondered how ride-hailing companies manage large volumes of booking data, analyze customer behaviour, and decide which discounts to offer? In this blog, we analyze Bengaluru ride data in depth using a dataset of 50,000 Ola bookings.
The analysis covers essential aspects such as booking statuses, cancellations, ride distances, payment methods, and ratings. With SQL for data preparation and Power BI for visualization, we uncover trends such as high weekend demand, popular vehicle types, and reasons for cancellations.
These insights explain how data-driven decisions help companies enhance customer satisfaction by optimizing operations and improving service quality.
The Dataset
The Ola dataset contains 50,000 rows of ride-booking data for Bengaluru over one month. It covers ride bookings, customer and driver interactions, cancellations, and more. The columns are:
- Date: The date of booking.
- Time: The time at which the booking was made.
- Booking ID: A unique 10-digit identifier, prefixed by "CNR" (e.g., CNR1234567890).
- Booking Status: The status of the booking (e.g., Success, cancelled by the customer, cancelled by the driver).
- Customer ID: Unique identifier for each customer.
- Vehicle Type: Different types of vehicles used for the ride (e.g., Auto, Prime Plus, etc.).
- Pickup Location: Dummy locations from 50 different areas in Bengaluru.
- Drop Location: Another set of locations, chosen from the list of pickup locations.
- Avg VTAT (Vehicle Time Arrival Time): The average time taken for the vehicle to arrive at the pickup location.
- Avg CTAT (Customer Time Arrival Time): The average time taken for the customer to arrive at the pickup location.
- Cancelled Rides by Customer: If the ride was cancelled by the customer.
- Reason for Cancelling by Customer: Various reasons for cancellation (e.g., AC not working, change of plans).
- Cancelled Rides by Driver: Cancellations by the driver, often due to personal or vehicle-related issues.
- Incomplete Rides: Cases where the ride was incomplete for some reason.
- Incomplete Rides Reason: Reasons for incomplete rides, such as vehicle breakdown or customer demand.
- Booking Value: The fare for the ride.
- Payment Method: The method used to pay (Cash/UPI/Card/Wallet).
- Ride Distance: Distance of the ride in kilometres.
- Driver Ratings: Ratings given to the driver by the customer.
- Customer Rating: Ratings given by the driver to the customer.
You can download the dataset from the GitHub Repository.
Data Constraints and Project Goals
As part of the project, I was tasked with ensuring that the data satisfies specific constraints that reflect real-world conditions:
- Booking Status: 62% of the rides should have a "Success" status with the remaining 38% being cancellations.
- Cancelled Rides by Customer: No more than 7% of the bookings should be cancelled by customers.
- Cancelled Rides by Driver: No more than 18% of the bookings should be cancelled by drivers.
- Incomplete Rides: The percentage of incomplete rides should be under 6%.
- Weekend and Match Day Orders: The number of orders should be higher on weekends and match days. I marked specific dates as match days.
- Order Values: Around 70% of orders should have a fare under ₹500, while 28% should be above ₹500, with the remainder above ₹100.
- Food Category: The dataset reflects 67% of orders being from the food category, a common service in Bengaluru.
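Percentage constraints like these can be checked with a single grouped query. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a hypothetical 10-row table instead of the real 50,000 rows, and assumed status labels.

```python
import sqlite3

# Hypothetical 10-row stand-in for ola_booking_table (the real table has 50,000 rows).
# The status labels below are assumptions, not values confirmed by the dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ola_booking_table (`Booking ID` TEXT, `Booking Status` TEXT)")
sample = (
    ["Success"] * 6
    + ["Cancelled by Driver"] * 2
    + ["Cancelled by Customer"]
    + ["Incomplete"]
)
conn.executemany(
    "INSERT INTO ola_booking_table VALUES (?, ?)",
    [(f"CNR{i:010d}", status) for i, status in enumerate(sample)],
)


def status_share(connection):
    """Return {status: percentage of all bookings}, rounded to one decimal."""
    rows = connection.execute(
        """
        SELECT `Booking Status`,
               ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM ola_booking_table), 1)
        FROM ola_booking_table
        GROUP BY `Booking Status`
        """
    ).fetchall()
    return dict(rows)


shares = status_share(conn)
print(shares)
```

Comparing each returned percentage with its target (for example, 62% for successful rides) shows at a glance whether the data meets a constraint.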
Data Preparation with SQL
The first step in the project is to prepare the dataset using SQL, which helps us handle large volumes of data efficiently. We will create a database called Oladb in MySQL Workbench.
We will then import the Bengaluru_Ola_Booking_Data.csv file into this database so that we can perform Exploratory Data Analysis (EDA).
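For readers who prefer to script the import, here is a rough sketch of the same idea in Python. It is not the Workbench procedure: it loads into an in-memory SQLite table as a stand-in for MySQL, and the three columns and their values are invented for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for the contents of Bengaluru_Ola_Booking_Data.csv. Only three columns
# are shown and every value is invented for illustration.
csv_text = """Booking ID,Booking Status,Booking Value
CNR0000000001,Success,320
CNR0000000002,Success,540
CNR0000000003,Cancelled by Driver,
"""


def load_bookings(csv_file, connection):
    """Create ola_booking_table, load every CSV row into it, and return the row count."""
    connection.execute(
        "CREATE TABLE ola_booking_table "
        "(`Booking ID` TEXT, `Booking Status` TEXT, `Booking Value` REAL)"
    )
    rows = []
    for record in csv.DictReader(csv_file):
        raw_value = record["Booking Value"]
        # Store blank fares as NULL rather than as an empty string or 0.
        value = float(raw_value) if raw_value else None
        rows.append((record["Booking ID"], record["Booking Status"], value))
    connection.executemany("INSERT INTO ola_booking_table VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_bookings(io.StringIO(csv_text), conn)
print(loaded)
```

With the real file, `io.StringIO(csv_text)` would be replaced by `open("Bengaluru_Ola_Booking_Data.csv", newline="")`, and the table definition would list all of the dataset's columns.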
If you have not imported a CSV file into MySQL Workbench before, its Table Data Import Wizard walks you through the process. The imported table is named ola_booking_table.
Exploratory Data Analysis (EDA)
Let's perform Exploratory Data Analysis (EDA) to find insights that can support decision-making:
Query 1. Retrieve all successful bookings:
SELECT * FROM ola_booking_table WHERE `Booking Status` = 'Success';
Explanation: This query retrieves all the booking details where the status is marked as "Success," providing insights into completed rides.
Query 2. Find the average ride distance for each vehicle type:
SELECT `Vehicle Type`, AVG(`Ride Distance`) AS avg_distance FROM ola_booking_table GROUP BY `Vehicle Type`;
Explanation: This query calculates the average ride distance for each type of vehicle, helping analyze performance and customer preferences by vehicle category.
Query 3. Get the total number of cancelled rides by customers:
SELECT COUNT(*) FROM ola_booking_table WHERE `Booking Status` = 'cancelled by Customer';
Explanation: This query counts the total number of rides cancelled by customers, useful for understanding customer behaviour and cancellation trends.
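Because several column names in this table contain spaces, quoting matters. In MySQL, single quotes delimit string literals, while backticks delimit identifiers such as column names. The sketch below illustrates the difference using Python's built-in sqlite3 module, which also accepts backtick-quoted identifiers. The two-row table and the status labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ola_booking_table (`Booking ID` TEXT, `Booking Status` TEXT)")
conn.executemany(
    "INSERT INTO ola_booking_table VALUES (?, ?)",
    [("CNR0000000001", "Success"), ("CNR0000000002", "Cancelled by Customer")],
)

# Single quotes: the text 'Booking Status' is compared with the text
# 'Cancelled by Customer'. They differ, so no row ever matches.
literal_count = conn.execute(
    "SELECT COUNT(*) FROM ola_booking_table "
    "WHERE 'Booking Status' = 'Cancelled by Customer'"
).fetchone()[0]

# Backticks: `Booking Status` is resolved as the column, so the filter works.
column_count = conn.execute(
    "SELECT COUNT(*) FROM ola_booking_table "
    "WHERE `Booking Status` = 'Cancelled by Customer'"
).fetchone()[0]

print(literal_count, column_count)
```

The first query runs without an error, which is what makes this mistake easy to miss: it simply reports zero matching rows.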
Query 4. List the top 5 customers who booked the highest number of rides:
SELECT `Customer ID`, COUNT(`Booking ID`) AS total_rides FROM ola_booking_table
GROUP BY `Customer ID`
ORDER BY total_rides DESC
LIMIT 5;
Explanation: This query identifies the top 5 customers based on the number of rides booked, highlighting the most active users.
Query 5. Get the number of rides cancelled by drivers due to personal and car-related issues:
SELECT COUNT(*) FROM ola_booking_table
WHERE `Cancelled Rides by Driver` = 'Personal & Car related issue';
Explanation: This query calculates the number of rides cancelled by drivers due to personal or car-related reasons, helping identify operational issues.
Query 6. Find the maximum and minimum driver ratings for Prime Sedan bookings:
SELECT MAX(`Driver Ratings`) AS max_rating, MIN(`Driver Ratings`) AS min_rating FROM ola_booking_table
WHERE `Vehicle Type` = 'Prime Sedan';
Explanation: This query finds the highest and lowest driver ratings for Prime Sedan bookings, helping assess service quality for this vehicle type.
Query 7. Retrieve all rides where payment was made using UPI:
SELECT * FROM ola_booking_table WHERE `Payment Method` = 'UPI';
Explanation: This query retrieves all ride details where the payment was made using UPI, providing insight into the popularity of digital payment modes.
Query 8. Find the average customer rating per vehicle type:
SELECT `Vehicle Type`, AVG(`Customer Rating`) AS avg_customer_rating FROM ola_booking_table
GROUP BY `Vehicle Type`;
Explanation: This query calculates the average customer rating for each vehicle type, helping evaluate customer satisfaction across different categories.
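One caveat when averaging ratings: rides that were cancelled or left incomplete presumably have no rating. If those blanks are stored as NULL, AVG() skips them. If they were imported as 0, the averages would be pulled down. The sketch below assumes the NULL case and uses Python's sqlite3 module with invented rows as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ola_booking_table (`Vehicle Type` TEXT, `Customer Rating` REAL)")
conn.executemany(
    "INSERT INTO ola_booking_table VALUES (?, ?)",
    # None becomes NULL: an unrated (for example, cancelled) ride. Values are invented.
    [("Auto", 4.0), ("Auto", None), ("Auto", 5.0), ("Bike", 3.0), ("Bike", None)],
)

rating_summary = conn.execute(
    """
    SELECT `Vehicle Type`,
           AVG(`Customer Rating`)   AS avg_customer_rating,  -- NULLs are ignored
           COUNT(`Customer Rating`) AS rated_rides,          -- counts non-NULL ratings
           COUNT(*)                 AS all_rides             -- counts every row
    FROM ola_booking_table
    GROUP BY `Vehicle Type`
    ORDER BY `Vehicle Type`
    """
).fetchall()

for row in rating_summary:
    print(row)
```

Reporting the number of rated rides next to the average makes it clear how many bookings each figure is based on.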
Query 9. Calculate the total booking value of rides completed successfully:
SELECT SUM(`Booking Value`) AS total_successful_value FROM ola_booking_table WHERE
`Booking Status` = 'Success';
Explanation: This query computes the total revenue generated from successfully completed rides, providing key financial metrics.
Query 10. List all incomplete rides along with the reason:
SELECT `Booking ID`, `Incomplete Rides Reason` FROM ola_booking_table WHERE `Incomplete Rides` = 'Yes';
Explanation: This query lists all incomplete rides along with the reasons, helping identify and address the root causes of incomplete rides.
Data Analysis and Visualization with Power BI
After the exploratory data analysis, we load the data into Microsoft Power BI for detailed analysis and visualization. Power BI helps transform raw data into interactive dashboards and reports that give deep insight into ride-booking patterns.
The dashboards built from the dataset cover the following:
1. Ride Volume Over Time and Booking Status Breakdown
A time-series chart showing the number of rides per day/week.
A pie or doughnut chart displaying the proportion of different booking statuses (success, cancelled by the customer, cancelled by the driver, etc.)
[Figure: Overall analysis]
2. Top Vehicle Types by Ride Distance
This visualization highlights which vehicle types are most utilized in terms of ride distance. For instance, longer bars for "Prime SUV" or "Bike" may indicate their popularity for longer commutes or quick, short-distance trips. Businesses can leverage this information to allocate resources effectively and optimise vehicle availability based on demand trends.
[Figure: Vehicle Type Analysis]
3. Revenue by Payment Method
1. Revenue by Payment Method (Stacked Bar Chart)
A stacked bar chart categorizes total revenue by payment methods such as Cash, UPI and Credit Card. Each bar segment represents the contribution of a specific payment method to the overall revenue.
This visualization highlights the most popular payment methods and their share of revenue, which helps assess customer preferences and streamline payment processes.
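The numbers behind such a chart come from a simple aggregation. The sketch below shows one way to compute them, using Python's sqlite3 module as a stand-in for MySQL or Power BI. The rows are invented, and counting only successful rides as revenue is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ola_booking_table "
    "(`Booking Status` TEXT, `Payment Method` TEXT, `Booking Value` REAL)"
)
conn.executemany(
    "INSERT INTO ola_booking_table VALUES (?, ?, ?)",
    [
        ("Success", "UPI", 300.0),
        ("Success", "UPI", 200.0),
        ("Success", "Cash", 250.0),
        ("Success", "Card", 250.0),
        ("Cancelled by Driver", None, None),  # no payment on a cancelled ride
    ],
)

# Revenue and share of total revenue per payment method, successful rides only.
revenue_by_method = conn.execute(
    """
    SELECT `Payment Method`,
           SUM(`Booking Value`) AS revenue,
           ROUND(100.0 * SUM(`Booking Value`)
                 / (SELECT SUM(`Booking Value`) FROM ola_booking_table
                    WHERE `Booking Status` = 'Success'), 1) AS pct_of_revenue
    FROM ola_booking_table
    WHERE `Booking Status` = 'Success'
    GROUP BY `Payment Method`
    ORDER BY revenue DESC, `Payment Method`
    """
).fetchall()

for row in revenue_by_method:
    print(row)
```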
2. Customer Spending Leaderboard
A leaderboard visual ranks customers by their total spending on bookings, listing the top contributors to revenue.
This helps identify high-value customers who contribute significantly to revenue, enabling targeted promotions or loyalty rewards to retain them.
3. Ride Distance Distribution (Histogram or Scatter Plot)
A histogram shows the frequency distribution of ride distances over different dates, while a scatter plot can depict individual ride distances against dates.
This visualization provides insights into ride patterns, including peak travel days and the variability of ride distances, which helps with demand forecasting and route optimization.
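To make the histogram idea concrete, here is a minimal Python sketch of the binning step. The 5 km bin width and the sample distances are assumptions chosen for illustration. Power BI performs the equivalent grouping internally.

```python
from collections import Counter


def distance_histogram(distances_km, bin_width=5):
    """Count rides per distance bin; each key is the lower edge of a bin in km."""
    counts = Counter(int(d // bin_width) * bin_width for d in distances_km)
    return dict(sorted(counts.items()))


# Invented ride distances in kilometres, for illustration only.
sample_distances = [1.2, 3.8, 4.9, 5.0, 7.5, 12.0, 14.9, 22.3]
histogram = distance_histogram(sample_distances, bin_width=5)

# Text rendering: one '#' per ride in each bin.
for lower_edge, rides in histogram.items():
    print(f"{lower_edge:>3} km and up | {'#' * rides}")
```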
[Figure: Revenue Analysis]
4. Reason for Cancelling Ride Either by Customer or Driver
A pie chart is used to visually represent the most prominent reasons why customers and drivers cancel rides.
The chart segments the data into proportions by highlighting which reason has the highest occurrence.
For customers, the leading reason for cancellations is often "Driver is not moving towards pickup location" reflecting potential delays or miscommunication.
For drivers, "Personal & Car related issues" is typically the top reason, showcasing operational challenges.
[Figure: Cancellation Analysis]
5. Customer Rating and Driver Rating
Driver Ratings: These reflect customer satisfaction with the driver, including professionalism, driving skills, and overall service quality. High ratings indicate positive experiences, while low ratings highlight areas for improvement.
Customer Ratings: These are provided by drivers to evaluate the behavior and cooperation of customers during the ride. They help maintain service standards and identify problematic behaviors.
[Figure: Rating Analysis]
Key Insights and Learnings
The Power BI dashboard provides a quick overview of key performance metrics, including KPIs such as total bookings, total revenue, and total distance travelled, all of which are dynamically updated through a date slicer.
Users can easily filter data by date to make informed decisions based on the selected time period. The dashboard features card visualizations for each metric and a line chart to visualize trends over time, helping users track performance and spot patterns efficiently.
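The date slicer behaves like a parameterized date filter applied before the KPIs are aggregated. The sketch below imitates that with Python's sqlite3 module. The rows are invented, dates are assumed to be stored as ISO `YYYY-MM-DD` text, and counting revenue and distance only for successful rides is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ola_booking_table "
    "(`Date` TEXT, `Booking Status` TEXT, `Booking Value` REAL, `Ride Distance` REAL)"
)
conn.executemany(
    "INSERT INTO ola_booking_table VALUES (?, ?, ?, ?)",
    [
        ("2024-07-01", "Success", 300.0, 10.0),
        ("2024-07-02", "Success", 450.0, 15.0),
        ("2024-07-02", "Cancelled by Driver", None, None),
        ("2024-07-05", "Success", 200.0, 5.0),
    ],
)


def kpis(connection, start_date, end_date):
    """Return (total bookings, total revenue, total distance) for an inclusive date range."""
    return connection.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(CASE WHEN `Booking Status` = 'Success' THEN `Booking Value` END), 0),
               COALESCE(SUM(CASE WHEN `Booking Status` = 'Success' THEN `Ride Distance` END), 0)
        FROM ola_booking_table
        WHERE `Date` BETWEEN ? AND ?
        """,
        (start_date, end_date),
    ).fetchone()


print(kpis(conn, "2024-07-01", "2024-07-02"))
```

Passing the dates as query parameters, rather than formatting them into the SQL string, keeps the filter safe and reusable for any selected period.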
[Figure: Dashboard]
Through this project, we derived several key insights about Bengaluru’s ride-hailing data:
- Booking Status Trends: Success rates were consistent but cancellations by customers peaked during certain times of the day, particularly during bad weather.
- High Demand on Weekends and Match Days: The data confirmed higher booking volumes during weekends and on match days, which could influence ride-hailing strategies for these days.
- Popular Vehicle Types: Autos and Prime Sedans saw the highest demand while eBikes were less frequent but growing in popularity.
- Cancellation Insights: The most common reasons for cancellations were "Driver is not moving towards pickup location" and "Change of plans."
- Incomplete Rides: Vehicle breakdowns were the leading cause of incomplete rides, followed by customer demands.
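The weekend effect can be checked by comparing average bookings per day on weekends and weekdays. The sketch below does this with Python's sqlite3 module as a stand-in for MySQL. It assumes dates are stored as ISO `YYYY-MM-DD` text and uses a handful of invented bookings. Match days would need an explicit list of dates, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ola_booking_table (`Booking ID` TEXT, `Date` TEXT)")

# Invented bookings: 2024-07-06 is a Saturday, 2024-07-07 a Sunday,
# 2024-07-08 a Monday, and 2024-07-09 a Tuesday.
dates = ["2024-07-06"] * 3 + ["2024-07-07"] * 2 + ["2024-07-08", "2024-07-09"]
conn.executemany(
    "INSERT INTO ola_booking_table VALUES (?, ?)",
    [(f"CNR{i:010d}", d) for i, d in enumerate(dates)],
)

# SQLite's strftime('%w', ...) returns '0' for Sunday through '6' for Saturday.
avg_per_day = dict(
    conn.execute(
        """
        SELECT CASE WHEN strftime('%w', `Date`) IN ('0', '6')
                    THEN 'weekend' ELSE 'weekday' END AS day_type,
               1.0 * COUNT(*) / COUNT(DISTINCT `Date`) AS bookings_per_day
        FROM ola_booking_table
        GROUP BY day_type
        """
    ).fetchall()
)

print(avg_per_day)
```

Dividing by the number of distinct dates matters because a month has far more weekdays than weekend days, so raw totals alone would understate weekend demand.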
Conclusion
Overall, this data analysis project provided a comprehensive look into Bengaluru’s ride-hailing patterns, using SQL, Power BI, and Excel to handle large datasets and derive actionable insights. By adhering to the given constraints and using the appropriate tools, I was able to analyze booking trends, cancellations, ratings, and other key metrics.
These insights can help ride-hailing companies like Ola better understand customer behaviour, optimize driver assignments, and improve customer satisfaction. The project was also an excellent opportunity to strengthen my skills in SQL data management, Power BI visualization, and Excel data manipulation.
For those interested in seeing the detailed analysis, I’ve uploaded the entire project, including the dataset and visualizations, to my GitHub repository. Feel free to check it out for more in-depth exploration and learning.