Group 03 BI Assignment

This document summarizes a group project analyzing used vehicle sales data from Craigslist. The group extracted data from the Austin Craigslist Cars and Trucks dataset, which contains information on vehicles listed in that area. They preprocessed the data by dropping unnecessary columns, identifying and removing columns with many null values, separating data into categorical and numerical types, and filtering the data. Visualizations were then created in Tableau to develop a dashboard presenting descriptive analytics insights from the cleaned data.

Uploaded by

Jay Rajapaksha


Bachelor of Business Science Degree Program

Year IV – Semester VIII

DA 4120 – Business Intelligence

Group Assignment
Reselling of used vehicles analysis

Submitted by:
D. D. R. R. Amaranath 196003T
J. D. S. K. Rajapaksha 196070T
K. D. R. Rajapakshe 196072C
K. S. Wishmitha 196095A

Supervisory Lecturer
Mr. Maninda Edirisooriya

Introduction

In today's digital age, the used vehicle trading industry has experienced significant growth,
providing consumers with numerous options to buy and sell used vehicles. Among the various
platforms facilitating these transactions, Craigslist has emerged as a prominent player. This
report aims to provide an analysis of Craigslist as a competitor in the used vehicle trading
market.

By applying big data analysis techniques, we examined various aspects of Craigslist's
operations to gain insights into its market presence, user behavior, and potential impact
on the used vehicle trading industry. Our analysis considers factors such as the volume and
variety of vehicle listings, pricing trends, and vehicle condition.

By examining these factors, we aim to provide a comprehensive understanding of Craigslist's
positioning and competitive advantage in the used vehicle trading market. This analysis will
assist our company in identifying potential strategies and areas of improvement to enhance our
own operations and compete effectively in the market.

Dataset

The Austin Reese Craigslist Cars and Trucks dataset, which has 26 columns, gives a broad
overview of the used vehicle market. Its columns describe the cars listed in the area, and
together they help us understand different aspects of the car market in this region.

The columns in this dataset cover a lot of information about the vehicles, such as their make,
model, year, condition, price, location, and other attributes.

Exploring some columns of the dataset

Make and Model: These columns give the brand and specific model of each vehicle.

Year: This column shows the manufacturing year of the vehicles.

Condition: Provides insights into whether the vehicles are new, used, or fall into specific
categories like excellent, good, fair, or salvage.

Price: Displays the listed prices for each vehicle.

Location: This column tells us where the vehicles are listed in Austin. This gives us geographic
insights, showing us how the listings are spread across different neighborhoods and indicating
which types of vehicles are more common in specific areas.

Mileage: The mileage column shows the distance traveled by each vehicle. This is important
information for buyers, as it helps them gauge the wear and tear on a particular vehicle.

Fuel Type: This column reveals the type of fuel used by each vehicle, such as gasoline, diesel,
hybrid, or electric. It allows us to examine the popularity of different fuel types and understand
the preferences of potential buyers.

Transmission: The transmission column tells us whether the vehicles have automatic or manual
transmission. By analyzing this data, we can see how the distribution of transmission types
relates to other variables in the dataset.

Title Status: The title status column provides information about the legal status of each vehicle's
title, including designations like clean title, salvage title, or rebuilt title. This helps us understand
the history and condition of the vehicles in the dataset.

Link to download the dataset

Creating the environment

We used Google Cloud Platform (GCP) for our big data analysis because it handles large
amounts of data efficiently. The following paragraphs describe how we set up our environment
and carried out our analysis.

First, we created a bucket in GCP to store our dataset and Python scripts. After uploading the
necessary files, we created the cluster that would run the scripts against them, using the
specifications below.
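The bucket step can be sketched with the gcloud CLI. This is a minimal sketch, not the group's actual commands. The bucket name `gs://example-bi-bucket` and the file names `vehicles.csv` and `preprocess.py` are hypothetical placeholders. The function only builds the command lines. Each one can then be run with `subprocess.run(argv, check=True)` or pasted into Cloud Shell.

```python
import shlex

# Hypothetical placeholders: the report does not name its bucket or files.
BUCKET = "gs://example-bi-bucket"
LOCAL_FILES = ["vehicles.csv", "preprocess.py"]


def bucket_setup_commands(bucket: str, files: list[str]) -> list[list[str]]:
    """Build gcloud argv lists that create a bucket and copy local files into it."""
    commands = [["gcloud", "storage", "buckets", "create", bucket]]
    for path in files:
        # Upload each local file to the top level of the bucket.
        commands.append(["gcloud", "storage", "cp", path, f"{bucket}/"])
    return commands


if __name__ == "__main__":
    for argv in bucket_setup_commands(BUCKET, LOCAL_FILES):
        print(shlex.join(argv))  # print a shell-ready command for review
```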

We created the cluster with a master node of 4 CPUs, 16 GB of memory, and a 50 GB SSD, and
worker nodes with 2 CPUs, 8 GB of memory, and a 50 GB SSD each.
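The report does not name the managed service behind the cluster. Assuming it was Dataproc, a cluster with the node sizes described above could be requested as follows. The first node described is taken to be the master (manager) node. Several values are assumptions chosen to match the stated figures:

- the machine types `n2-standard-4` (4 vCPUs, 16 GB) and `n2-standard-2` (2 vCPUs, 8 GB);
- the cluster name and region;
- the worker count of 2.

```python
import shlex


def cluster_create_command(name: str, region: str, num_workers: int = 2) -> list[str]:
    """Build a `gcloud dataproc clusters create` command with the report's node sizes."""
    return [
        "gcloud", "dataproc", "clusters", "create", name,
        f"--region={region}",
        # Master (manager) node: 4 vCPUs, 16 GB memory, 50 GB SSD boot disk.
        "--master-machine-type=n2-standard-4",
        "--master-boot-disk-type=pd-ssd",
        "--master-boot-disk-size=50GB",
        # Worker nodes: 2 vCPUs, 8 GB memory, 50 GB SSD boot disk each.
        f"--num-workers={num_workers}",
        "--worker-machine-type=n2-standard-2",
        "--worker-boot-disk-type=pd-ssd",
        "--worker-boot-disk-size=50GB",
    ]


if __name__ == "__main__":
    # Hypothetical cluster name and region.
    print(shlex.join(cluster_create_command("bi-cluster", "us-central1")))
```

A standard Dataproc cluster needs at least two primary workers, which is why the sketch defaults to 2.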

In the meantime, we generated a sample dataset by randomly sampling rows from the original
dataset, and then wrote the Python script containing the preprocessing tasks to apply to the dataset.
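Random sampling of this kind can be sketched in plain Python by keeping each row independently with a fixed probability. This is the same Bernoulli-style sampling that PySpark's `DataFrame.sample(withReplacement=False, fraction=..., seed=...)` performs. The report does not state the sampling fraction, so the 10% in the example is hypothetical, as is the toy CSV.

```python
import csv
import io
import random


def sample_rows(csv_text: str, fraction: float, seed: int = 42) -> list[dict[str, str]]:
    """Keep each data row independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the same input yields the same sample
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if rng.random() < fraction]


if __name__ == "__main__":
    # Toy stand-in for the full dataset (hypothetical values).
    data = "id,price\n" + "\n".join(f"{i},{1000 + i}" for i in range(100))
    sample = sample_rows(data, fraction=0.10)
    print(len(sample), "of 100 rows kept")
```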

Once the cluster was created, we pointed our Python script at the full dataset and submitted
it as a job to apply the preprocessing tasks. When the job finished, the resulting cleaned CSV
dataset was stored in BigQuery. We used BigQuery because it connects directly to Tableau, our
data visualization environment.
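Assuming the same Dataproc setup, the job submission and the BigQuery load could look like this. The script URI, the output path `gs://example-bi-bucket/cleaned/*.csv`, and the table name `craigslist.vehicles_clean` are hypothetical. The wildcard is there because Spark writes CSV output as one or more part files inside a folder rather than as a single named file.

```python
import shlex


def submit_job_command(script_uri: str, cluster: str, region: str) -> list[str]:
    """Build the command that runs a PySpark script on an existing Dataproc cluster."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script_uri,
        f"--cluster={cluster}",
        f"--region={region}",
    ]


def bq_load_command(table: str, csv_uri: str) -> list[str]:
    """Build a `bq load` command for CSV files that start with a header row."""
    return [
        "bq", "load",
        "--source_format=CSV",
        "--autodetect",           # let BigQuery infer the column types
        "--skip_leading_rows=1",  # with autodetect, row 1 supplies the column names
        table,
        csv_uri,
    ]


if __name__ == "__main__":
    bucket = "gs://example-bi-bucket"  # hypothetical
    print(shlex.join(submit_job_command(f"{bucket}/preprocess.py", "bi-cluster", "us-central1")))
    print(shlex.join(bq_load_command("craigslist.vehicles_clean", f"{bucket}/cleaned/*.csv")))
```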

Finally, the data visualizations were built in Tableau by connecting it to BigQuery.

The following sections explain the preprocessing tasks and the visualizations in more detail.

Preprocessing

The data was preprocessed using PySpark. After importing the relevant PySpark libraries and
functions, we created a SparkSession, the entry point for communicating with Spark, so that the
preprocessing could run without interruption. The SparkSession also let us use Spark's
distributed computing capabilities for both data processing and analysis.

Dropping of columns

The first preprocessing step was to drop 10 columns containing unnecessary details, such as the
region URL, image URL, state, and VIN. Because these columns would not add any insight to the
analysis, they were removed from the dataset.
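In PySpark this step is a single `df.drop(*columns)` call. To keep the example runnable without a Spark installation, the sketch below applies the same operation to rows held as plain Python dictionaries. The report names only four of the ten dropped columns. The spellings `region_url`, `image_url`, `state`, and `VIN` are assumed, and the remaining six are left out rather than guessed.

```python
# Assumed spellings for the four columns the report names (the other six are unknown).
DROP_COLUMNS = ["region_url", "image_url", "state", "VIN"]


def drop_columns(rows: list[dict], columns: list[str]) -> list[dict]:
    """Return copies of `rows` without the given columns, ignoring absent ones."""
    to_drop = set(columns)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


if __name__ == "__main__":
    rows = [{"price": "9000", "VIN": "X1", "image_url": "http://img", "year": "2015"}]
    print(drop_columns(rows, DROP_COLUMNS))  # only price and year remain
```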

After dropping these columns, the next step was to identify columns with more than 50% null
values, as well as columns containing only zeros. Two columns, size and county, exceeded the
null-value limit, so they were also dropped from the dataset.
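The null-ratio rule can be sketched as follows. Both conventions here are assumptions, since the report does not define them:

- an empty string or `None` counts as null;
- "all zero" means a column whose non-null values are all numeric zeros.

In PySpark the per-column null counts are usually gathered in one aggregation, for example `F.sum(F.col(c).isNull().cast("int"))` for each column `c`.

```python
def _is_null(value) -> bool:
    """Assumed null convention: None or an empty string."""
    return value is None or value == ""


def _is_zero(value) -> bool:
    """True if the value parses as the number zero."""
    try:
        return float(value) == 0.0
    except (TypeError, ValueError):
        return False


def columns_to_drop(rows: list[dict], max_null_ratio: float = 0.5) -> list[str]:
    """Return columns with more than `max_null_ratio` nulls or only zero values."""
    if not rows:
        return []
    flagged = []
    for col in rows[0]:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if not _is_null(v)]
        null_ratio = (len(values) - len(non_null)) / len(values)
        all_zero = bool(non_null) and all(_is_zero(v) for v in non_null)
        if null_ratio > max_null_ratio or all_zero:
            flagged.append(col)
    return flagged
```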

Splitting data as categorical and numerical

The third preprocessing step was to separate the columns into categorical and numerical
columns.
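One simple rule for the split is to class a column as numerical when every non-null value parses as a number, and as categorical otherwise. This plain-Python sketch assumes that rule. In PySpark the split would more typically come from the schema, for example by checking each `(name, type)` pair in `df.dtypes`.

```python
def _is_null(value) -> bool:
    """Assumed null convention: None or an empty string."""
    return value is None or value == ""


def _is_number(value) -> bool:
    """True if the value parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def split_columns(rows: list[dict]) -> tuple[list[str], list[str]]:
    """Split column names into (numerical, categorical) lists."""
    if not rows:
        return [], []
    numerical, categorical = [], []
    for col in rows[0]:
        non_null = [row.get(col) for row in rows if not _is_null(row.get(col))]
        if non_null and all(_is_number(v) for v in non_null):
            numerical.append(col)
        else:
            categorical.append(col)
    return numerical, categorical
```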

Each numerical column was then checked for null values. Columns with more than 10% null values
were dropped, and in columns with less than 10% null values, the missing values were replaced
with the median value of that column.
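The numerical rule can be sketched in plain Python. The 10% threshold comes from the report. Treating a ratio of exactly 10% as "less than" (so the column is kept and imputed) is an assumption, because the text leaves that boundary open. In PySpark, `pyspark.ml.feature.Imputer` with `strategy="median"` is one built-in way to fill the gaps; it computes an approximate median.

```python
import statistics


def _is_null(value) -> bool:
    """Assumed null convention: None or an empty string."""
    return value is None or value == ""


def clean_numerical(rows: list[dict], numerical: list[str], max_null_ratio: float = 0.10) -> list[dict]:
    """Drop numerical columns with >10% nulls; fill the rest with the column median."""
    if not rows:
        return []
    rows = [dict(row) for row in rows]  # work on copies, leave the input untouched
    for col in numerical:
        present = [float(row[col]) for row in rows if not _is_null(row.get(col))]
        null_ratio = (len(rows) - len(present)) / len(rows)
        if null_ratio > max_null_ratio:
            for row in rows:
                row.pop(col, None)  # too many gaps: drop the whole column
            continue
        median = statistics.median(present)
        for row in rows:
            # Store existing values as floats and fill nulls with the median.
            row[col] = median if _is_null(row.get(col)) else float(row[col])
    return rows
```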

Similarly, in categorical columns with less than 10% null values, the missing values were
replaced with the mode of the column.
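The categorical counterpart replaces missing values with the most frequent category. The report does not say what happened to categorical columns above the 10% limit. This sketch therefore leaves them unchanged rather than assume they were dropped.

```python
import statistics


def _is_null(value) -> bool:
    """Assumed null convention: None or an empty string."""
    return value is None or value == ""


def clean_categorical(rows: list[dict], categorical: list[str], max_null_ratio: float = 0.10) -> list[dict]:
    """Fill nulls with the column mode in categorical columns that are at most 10% null."""
    if not rows:
        return []
    rows = [dict(row) for row in rows]  # work on copies
    for col in categorical:
        present = [row[col] for row in rows if not _is_null(row.get(col))]
        null_ratio = (len(rows) - len(present)) / len(rows)
        if present and null_ratio <= max_null_ratio:
            mode = statistics.mode(present)  # on ties, Python 3.8+ returns the first seen
            for row in rows:
                if _is_null(row.get(col)):
                    row[col] = mode
    return rows
```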

Filtering

In this step of preprocessing, rows whose region value was "low miles" were filtered out, and
the year, price, and odometer columns were cast to float.
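In PySpark the filter and casts would be roughly `df.filter(F.col("region") != "low miles")`, followed by `withColumn(c, F.col(c).cast("float"))` for each column. That Spark comparison also drops rows where region is null, which the plain-Python sketch below keeps. The sketch runs without Spark.

```python
def filter_and_cast(rows: list[dict], float_cols=("year", "price", "odometer")) -> list[dict]:
    """Drop rows whose region is 'low miles' and cast selected columns to float."""
    result = []
    for row in rows:
        if row.get("region") == "low miles":
            continue  # filter condition from the report: region != "low miles"
        row = dict(row)  # copy so the input rows are not modified
        for col in float_cols:
            value = row.get(col)
            if value is not None and value != "":
                row[col] = float(value)
        result.append(row)
    return result
```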

Once preprocessing was complete, the final dataset was exported as a CSV file.
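In PySpark the export is a one-liner, for example `df.coalesce(1).write.csv(path, header=True, mode="overwrite")`. Here `coalesce(1)` yields a single part file inside the output folder. The stdlib sketch below writes the same header-plus-rows layout to a string. The example rows are hypothetical.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows that share the same keys as CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    cleaned = [{"region": "austin", "price": 9000.0}]  # hypothetical cleaned row
    print(rows_to_csv(cleaned), end="")
```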

Visualizations

Effective data visualization plays a significant role when presenting analytical findings to a
broad audience. It makes the content understandable to people from different domains and helps
users gain insights easily.

Tableau was the visual analytics platform used to visualize the findings about the used cars.
The CSV file obtained from preprocessing was made available to Tableau through BigQuery.
A dashboard was then developed from suitable visualizations to provide descriptive analytics
about the dataset.
