How To Create Data Analytics Slides
• We have also performed the category- and product-wise split-up of sales in fig(a):
  o Chairs : $15,01,682
  o Storage : $11,26,813
  o Phones : $17,06,824
• The Technology segment has the biggest chunk of sales at $47,44,557

SEASONALITY FACTORS IN SALES, fig(b)
• The weekly sales graph from the EDA, shown in fig(b), depicts the seasonality factors in sales:
  o Due to the summer discount season, there is a peak in June every year
  o Sales are at their lowest in July of each year
  o Q3 & Q4 have more sales than Q1 & Q2
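As a rough illustration of how fig(a) and fig(b) could be produced, here is a minimal pandas/matplotlib sketch; the file name and column names ("Order Date", "Sales", "Category", "Sub-Category") are assumptions, not taken from the deck.

```python
# Illustrative sketch of fig(a) and fig(b); file and column names are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv", parse_dates=["Order Date"])  # hypothetical input file

# fig(a): category- and product-wise split of total sales
split = df.groupby(["Category", "Sub-Category"])["Sales"].sum().sort_values(ascending=False)
print(split)

# fig(b): weekly sales, which exposes the June peak / July trough seasonality
weekly = df.set_index("Order Date")["Sales"].resample("W").sum()
weekly.plot(figsize=(10, 4), title="Seasonality factors in sales (weekly)")
plt.tight_layout()
plt.show()
```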
VIDEO DISPLAYING DATA POINTS IN 3D SPACE
The video shows the data points plotted in 3D space, clustered into 5 regions:
• X-axis : Profit
• Y-axis : Sales
• Z-axis : Region
They are clustered on the basis of the RFM groups, which we will discuss later.
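A minimal sketch of how such a 3D scatter could be generated with matplotlib; the input file and the "Cluster" column holding the 5 RFM-based group labels are assumptions.

```python
# Sketch of the 3D scatter: Profit vs. Sales vs. Region, coloured by cluster.
# The "Cluster" column (5 RFM-based groups) and file name are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("rfm_clusters.csv")                  # hypothetical input file
region_z = df["Region"].astype("category").cat.codes  # encode Region for the z-axis

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
points = ax.scatter(df["Profit"], df["Sales"], region_z, c=df["Cluster"], cmap="tab10", s=10)
ax.set_xlabel("Profit")
ax.set_ylabel("Sales")
ax.set_zlabel("Region (encoded)")
fig.colorbar(points, label="Cluster")
plt.show()
```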
EXPLORATORY DATA ANALYSIS
VISUALIZING THE DATA AND HIGHLIGHTING THE STRIKING INSIGHTS
Chart displaying duration of completed courses
MOST COMPLETED COURSES WERE BELOW 15 HOURS IN AVERAGE DURATION
• Exploratory data analysis revealed that the highest percentage of completed courses fell in the 10-15 hour range
• The highest percentage of purchased courses had an average duration of less than 5 hours

Chord chart revealing price ranges of courses
MOST FREQUENT PURCHASES FOR COURSES WORTH RS. 1,000 – RS. 5,000
• To increase purchases, the company should release courses in the Rs. 1,000 to Rs. 5,000 range
• Most customers are willing to spend at most Rs. 8,000 on courses; very few purchases were made above Rs. 8,000

Force-directed chart revealing preferred course subjects
HIGHEST PURCHASES MADE FOR MARKETING, FINANCE, IT AND HR COURSES
• The force-directed chart shows that Marketing & Technical courses are the most preferred
• Customer survey analysis revealed a few purchasers interested in courses outside their field of interest or scope of job
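The duration and price findings above could be reproduced with a short pandas sketch like the one below; the course file and column names (duration_hours, price, completed) are assumptions.

```python
# Sketch of the duration/price analysis behind these charts; the course file and
# column names (duration_hours, price, completed) are assumptions.
import pandas as pd

courses = pd.read_csv("courses.csv")  # hypothetical input file

# Completion rate per duration bucket (the 10-15 hour band dominated completions)
duration_bins = pd.cut(courses["duration_hours"], bins=[0, 5, 10, 15, 20, 50])
print(courses.groupby(duration_bins, observed=True)["completed"].mean())

# Purchase counts per price band (Rs. 1,000 - Rs. 5,000 was the most frequent range)
price_bins = pd.cut(courses["price"], bins=[0, 1000, 5000, 8000, 20000])
print(courses.groupby(price_bins, observed=True).size())
```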
DATA-PREPROCESSING
COMBINED DATA FROM VARIOUS FILES TO CREATE A SINGLE DATASET
Dataset                      Majority class   Minority class   Total
Imputed data (unbalanced)    3,605            695              4,300
Oversampled data             3,605            3,605            7,210
Undersampled data            695              695              1,390
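A hedged sketch of how the three datasets above could be produced with imbalanced-learn; a synthetic stand-in mimics the 3,605 vs. 695 attrition split, whereas in practice the imputed feature matrix and target would be used.

```python
# Sketch of the over-/under-sampling that yields the three datasets above.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import NearMiss

# Synthetic stand-in for the imputed attrition data (3,605 vs. 695 records)
X, y = make_classification(n_samples=4300, weights=[3605 / 4300], random_state=42)
print(Counter(y))                                   # roughly {0: 3605, 1: 695}

X_over, y_over = RandomOverSampler(random_state=42).fit_resample(X, y)
print(Counter(y_over))                              # both classes ~3,605 -> ~7,210 rows

X_under, y_under = NearMiss(version=2).fit_resample(X, y)
print(Counter(y_under))                             # both classes ~695 -> ~1,390 rows
```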
• Performed label encoding to convert ordinal data into interval data
• Multicollinearity check & VIF used to identify variables with a high degree of correlation among them; these variables are dropped (a sketch follows this list)
• OLS/logit regression performed to identify statistically significant variables (p-value < 5%); variables that are not significant are dropped
• Data complexity is reduced by cutting the number of independent variables from 25 to 13
• We balanced the dataset (attrition) using imblearn's oversampling (RandomOverSampler) and undersampling (NearMiss v2)
• Models performed well on the undersampled data
• Other undersampling methods (Condensed Nearest Neighbour, NearMiss v1, v2 and v3) were tried; NearMiss v2 performed best
• Scaling the numerical columns decreases the spread & variance and increases computational efficiency for later models
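A minimal sketch of the VIF-based multicollinearity check and the significance filter described above, using pandas and statsmodels; the file names, column names and the VIF > 5 cut-off are assumptions.

```python
# Sketch of the VIF multicollinearity check and logit significance filter.
# File names, column names and the VIF > 5 cut-off are assumptions.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = pd.read_csv("encoded_features.csv")             # hypothetical encoded features
y = pd.read_csv("target.csv")["attrition"]          # hypothetical attrition target

# Drop features with a high VIF (strong multicollinearity)
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
X = X.drop(columns=vif[vif > 5].index)

# Fit a logit model and keep only statistically significant variables (p < 0.05)
model = sm.Logit(y, sm.add_constant(X)).fit()
keep = model.pvalues[model.pvalues < 0.05].index.drop("const", errors="ignore")
X_reduced = X[keep]
print(f"Reduced from {X.shape[1]} to {X_reduced.shape[1]} independent variables")
```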