How To Create Data Analytics Slides

The document presents an exploratory data analysis (EDA) highlighting sales trends across various product categories, revealing that technology products, particularly phones, have the highest sales. It also discusses seasonal sales patterns, indicating peaks during the last quarter and specific months like November and December. Additionally, the document covers data preprocessing steps, including balancing datasets through oversampling and undersampling techniques to improve model performance.

Uploaded by Unicorn Spider
Copyright © All Rights Reserved

EXPLORATORY DATA ANALYSIS

• We have also broken down sales by category and by product in fig(a).
• From the EDA, the weekly sales graph in fig(b) shows the seasonality factors in sales.

CATEGORY BREAK-UP OF PRODUCTS fig(a)

• Chairs, Storage and Phones have the highest sales in the Furniture, Office Supplies and Technology categories, respectively.
o Chairs : $15,01,682
o Storage : $11,26,813
o Phones : $17,06,824
• The Technology segment has the biggest share of sales, at $47,44,557.

SEASONALITY FACTORS IN SALES fig(b)

• There is a general trend of high sales in the last quarter of the year.
• During the festive season, sales peak in November and December.
• Because of the summer discount season, there is a peak in June every year.
• In a given year, sales are at their lowest in July.
• Q3 and Q4 have higher sales than Q1 and Q2.
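Both the category break-up and the seasonality findings come from group-and-sum aggregations over order records. A minimal plain-Python sketch of that step follows, assuming each order is a record with hypothetical fields `category`, `sub_category`, `order_date` and `sales`; the helpers `total_by`, `quarter` and `iso_week` are illustrative names, and the sample orders are made up. On a real dataset a pandas `groupby` would typically do the same job.

```python
from collections import defaultdict
from datetime import date

def total_by(records, key):
    """Sum the 'sales' field of each record, grouped by key(record)."""
    totals = defaultdict(float)
    for rec in records:
        totals[key(rec)] += rec["sales"]
    return dict(totals)

def quarter(d):
    """Calendar quarter (1-4) of a date."""
    return (d.month - 1) // 3 + 1

def iso_week(d):
    """(ISO year, ISO week number) of a date, for a weekly sales series."""
    year, week, _ = d.isocalendar()
    return (year, week)

# Hypothetical sample orders; field names are assumptions for illustration.
orders = [
    {"category": "Technology", "sub_category": "Phones", "order_date": date(2017, 11, 20), "sales": 900.0},
    {"category": "Furniture", "sub_category": "Chairs", "order_date": date(2017, 12, 5), "sales": 450.0},
    {"category": "Office Supplies", "sub_category": "Storage", "order_date": date(2017, 6, 14), "sales": 300.0},
    {"category": "Technology", "sub_category": "Phones", "order_date": date(2017, 7, 2), "sales": 120.0},
]

by_category = total_by(orders, lambda r: r["category"])
by_sub_category = total_by(orders, lambda r: (r["category"], r["sub_category"]))
by_month = total_by(orders, lambda r: r["order_date"].month)
by_quarter = total_by(orders, lambda r: quarter(r["order_date"]))
by_week = total_by(orders, lambda r: iso_week(r["order_date"]))

# Best-selling sub-category within each category, stored as (name, total).
top_in_category = {}
for (cat, sub), total in by_sub_category.items():
    best = top_in_category.get(cat)
    if best is None or total > best[1]:
        top_in_category[cat] = (sub, total)
```

Sorting `by_month` or `by_quarter` by value is then enough to surface peaks such as a strong last quarter, and plotting `by_week` in date order gives a weekly sales graph.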

VIDEO DISPLAYING DATA POINTS IN 3D SPACE

The video shows the groups plotted in 3D space, where they are clustered into 5 regions:
• X-axis : Profit
• Y-axis : Sales
• Z-axis : Region

The points are clustered on the basis of RFM groups, which are discussed later.
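Placing a categorical field such as Region on a numeric axis requires mapping each region to a number first. The sketch below assumes hypothetical row fields `profit`, `sales`, `region` and `rfm_group`; the RFM grouping itself is not defined in this section, so group labels are simply taken as given. Regions are coded in alphabetical order, which is an arbitrary choice, and `to_3d_points` is an illustrative name.

```python
def to_3d_points(rows):
    """Return (xs, ys, zs, groups, region_codes) for a 3D scatter.

    X = profit, Y = sales, Z = region. The categorical region is mapped to an
    integer code (alphabetical order) so it can sit on a numeric axis.
    """
    regions = sorted({r["region"] for r in rows})
    region_codes = {name: i for i, name in enumerate(regions)}
    xs = [r["profit"] for r in rows]
    ys = [r["sales"] for r in rows]
    zs = [region_codes[r["region"]] for r in rows]
    groups = [r["rfm_group"] for r in rows]  # used only for colouring points
    return xs, ys, zs, groups, region_codes

# Hypothetical rows for illustration.
rows = [
    {"profit": 40.0, "sales": 250.0, "region": "West", "rfm_group": 1},
    {"profit": -12.5, "sales": 80.0, "region": "East", "rfm_group": 3},
    {"profit": 95.0, "sales": 610.0, "region": "Central", "rfm_group": 0},
]
xs, ys, zs, groups, region_codes = to_3d_points(rows)
```

If matplotlib is available, these lists can be drawn on an axes created with `plt.figure().add_subplot(projection="3d")` using `ax.scatter(xs, ys, zs, c=groups)`; rotating that view is one way to produce a walkthrough like the video.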

EXPLORATORY DATA ANALYSIS
VISUALIZING THE DATA AND HIGHLIGHTING THE STRIKING INSIGHTS

1 Ideal Duration
Chart displaying the duration of completed courses
MOST COMPLETED COURSES WERE BELOW 15 HOURS IN AVERAGE DURATION

• Exploratory data analysis revealed that the highest percentage of completed courses were in the 10-15 hour range.
• The highest percentage of purchased courses had an average duration of less than 5 hours.

2 Price of Courses
Chord chart revealing the price ranges of courses
MOST FREQUENT PURCHASES FOR COURSES WORTH RS. 1,000 – RS. 5,000

• To increase purchases, the company must release courses in the Rs. 1,000 to Rs. 5,000 range.
• Most customers are willing to spend up to Rs. 8,000 on courses but not more, as many did not purchase courses priced above Rs. 8,000.

3 Preferred Programs
Force-directed chart revealing preferred course subjects
HIGHEST PURCHASES MADE FOR MARKETING, FINANCE, IT AND HR COURSES

• The force-directed chart shows that Marketing and Technical courses are the most preferred.
• Customer survey analysis revealed that few purchasers were interested in courses outside their field of interest or the scope of their job.
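Findings such as "the highest percentage of completed courses were in the 10-15 hour range" amount to bucketing values and comparing each bucket's share. A small plain-Python sketch follows; the bucket edges (half-open [low, high) ranges), the helper names `bucket` and `share_by_bucket`, and the sample durations and prices are all assumptions for illustration, not the data behind the charts.

```python
from collections import Counter

# Bucket edges are assumptions chosen to mirror the ranges quoted on the slide.
DURATION_BUCKETS = [(0, 5, "under 5 h"), (5, 10, "5-10 h"), (10, 15, "10-15 h")]
PRICE_BANDS = [(0, 1000, "under Rs. 1,000"), (1000, 5000, "Rs. 1,000-5,000"),
               (5000, 8000, "Rs. 5,000-8,000")]

def bucket(value, buckets, overflow_label):
    """Label of the half-open [low, high) bucket holding value, else overflow_label."""
    for low, high, label in buckets:
        if low <= value < high:
            return label
    return overflow_label

def share_by_bucket(values, buckets, overflow_label):
    """Fraction of values that fall into each bucket label."""
    counts = Counter(bucket(v, buckets, overflow_label) for v in values)
    return {label: n / len(values) for label, n in counts.items()}

# Hypothetical sample data.
completed_hours = [12.0, 14.5, 11.0, 3.0, 8.0]
purchase_prices = [1500, 4200, 999, 2500, 9000, 3000]

duration_share = share_by_bucket(completed_hours, DURATION_BUCKETS, "15 h and above")
price_share = share_by_bucket(purchase_prices, PRICE_BANDS, "Rs. 8,000 and above")
most_common_band = max(price_share, key=price_share.get)
above_cap = price_share.get("Rs. 8,000 and above", 0.0)
```

The same shares, computed on the real purchase records, are what a chart of duration or price ranges visualizes.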

DATA PREPROCESSING
COMBINED DATA FROM VARIOUS FILES TO CREATE A SINGLE DATASET

1 MAJOR STEPS INVOLVED

• Dropped irrelevant/repeated columns, e.g. “Over 18” and “Std Hours”.

2 BALANCING THE DATASET

Undersampling & oversampling to balance the dataset:

Data                          0's     1's     Total
Imputed data (unbalanced)     3,605   695     4,300
Oversampled data              3,605   3,605   7,210
Undersampled data             695     695     1,390
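The counts in the balancing table follow from the two strategies: oversampling grows the minority class (1's) from 695 to 3,605 rows, giving 7,210 in total, while undersampling trims the majority class (0's) from 3,605 to 695, giving 1,390. To make the mechanics visible, the sketch below re-implements both ideas in plain Python instead of calling imblearn; it is an illustration, not imblearn's code. `random_oversample` duplicates randomly chosen minority rows, and `nearmiss2_undersample` follows the NearMiss-2 rule as imblearn documents it (keep the majority rows whose mean distance to their `n_neighbors` farthest minority rows is smallest). The function names, the `seed` parameter and the synthetic two-feature data are assumptions made for the example.

```python
import heapq
import math
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

def nearmiss2_undersample(X, y, n_neighbors=3):
    """NearMiss-2 style undersampling for a binary label vector.

    Keeps the majority rows whose mean distance to their n_neighbors
    *farthest* minority rows is smallest, so both classes end up equal.
    """
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    if counts[minority] == counts[majority]:
        return list(X), list(y)  # already balanced
    X_min = [x for x, lab in zip(X, y) if lab == minority]
    X_maj = [x for x, lab in zip(X, y) if lab == majority]
    k = min(n_neighbors, len(X_min))

    def mean_farthest(row):
        farthest = heapq.nlargest(k, (math.dist(row, m) for m in X_min))
        return sum(farthest) / k

    kept = sorted(X_maj, key=mean_farthest)[: len(X_min)]
    return X_min + kept, [minority] * len(X_min) + [majority] * len(kept)

# Synthetic stand-in for the attrition data: 3,605 zeros and 695 ones.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3605)] + \
    [(rng.gauss(1, 1), rng.gauss(1, 1)) for _ in range(695)]
y = [0] * 3605 + [1] * 695

X_over, y_over = random_oversample(X, y)
X_under, y_under = nearmiss2_undersample(X, y)
```

With imblearn installed, the library equivalents are `RandomOverSampler().fit_resample(X, y)` and `NearMiss(version=2).fit_resample(X, y)`; they also support multi-class targets, which this sketch does not.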

• We balanced the attrition dataset with imblearn, using random oversampling and NearMiss-2 undersampling.
• Models performed well on the undersampled data.
• Other undersampling methods were also tried (Condensed Nearest Neighbour and NearMiss versions 1, 2 and 3); NearMiss-2 performed best.
• Scaling the numerical columns decreases their spread and variance and increases computational efficiency for the later models.

1 MAJOR STEPS INVOLVED (CONTINUED)

• Label encoding was performed to convert ordinal data into interval (numeric) data.
• A multicollinearity check using the variance inflation factor (VIF) identified variables with a high degree of correlation among them; these were dropped.
• OLS/logit regression was performed to find statistically significant variables (p-value < 5%); variables that were not significant were dropped.
• Data complexity was reduced by cutting the number of independent variables from 25 to 13.
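The VIF check flags a predictor that is well explained by the other predictors: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. An equivalent shortcut is that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix, which the plain-Python sketch below uses. The helper names, the iterative drop-the-worst loop and the default threshold of 10 are assumptions (5 and 10 are common rules of thumb); the slides do not state which cut-off was used. Constant (zero-variance) columns must be removed first, since their correlation is undefined.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant numeric columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]

def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    size = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def vif(columns):
    """VIF of each column = diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]

def drop_high_vif(named_columns, threshold=10.0):
    """Repeatedly drop the column with the largest VIF until all are <= threshold."""
    names = list(named_columns)
    while len(names) > 1:
        scores = vif([named_columns[n] for n in names])
        worst = max(range(len(names)), key=scores.__getitem__)
        if scores[worst] <= threshold:
            break
        names.pop(worst)
    return names

# Hypothetical predictors: "b" is almost exactly 2 * "a", so one of them is dropped.
example = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10.5], "c": [5, 3, 4, 1, 2]}
kept = drop_high_vif(example)
```

For real projects, statsmodels provides `variance_inflation_factor(exog, exog_idx)` in `statsmodels.stats.outliers_influence`, which computes each VIF through the auxiliary regression directly.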
