Sections Revision Part 2
− Data Visualization
DATA MINING
Data Visualization
• Data visualization is the presentation of data in graphical format.
• It helps people understand the significance of data by summarizing and
presenting large amounts of data in a simple, easy-to-understand format, and it
helps communicate information clearly and effectively.
Plots
− Histogram
− Box Plot
− Bar Plot
− Column Chart
− Pie Chart
− Scatter Plot
− Line Chart
− Violin Plot
− Density Plot
− WordCloud
− Heat Map
Histogram
• A histogram is an accurate graphical
representation of the distribution of a numeric
variable. It takes as input numeric variables
only.
• The variable is cut into several bins, and the
number of observations per bin is represented by
the height of the bar.
MADE BY π P a g e |2
Show plot
− plt.show()
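The binning described above can be sketched as follows. This is a minimal illustration, not the notes' own code: the `ages` values, the choice of 4 bins, and the 20 to 60 range are made up. `plt.hist` returns the count per bin, which is the height of each bar.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data (not from the notes): ages of 12 people.
ages = [21, 22, 25, 27, 31, 33, 34, 38, 41, 45, 52, 58]

# Cut the variable into 4 equal-width bins between 20 and 60.
# plt.hist returns the count per bin, the bin edges, and the drawn bars.
counts, edges, _ = plt.hist(ages, bins=4, range=(20, 60), edgecolor="black")

plt.xlabel("Age")
plt.ylabel("Number of observations")
plt.title("Histogram of age")
```

Here the bins are 20-30, 30-40, 40-50 and 50-60, and `counts` holds the bar heights. Finish with `plt.show()`, as in the notes, to display the figure.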
Boxplot
• A boxplot is one of the most common types of graphic. It gives a compact
summary of one or several numeric variables. The line that
divides the box into two parts represents
the median of the data.
• The ends of the box show the upper and
lower quartiles.
• The extreme lines (whiskers) show the highest and lowest
values excluding outliers.
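The pieces of a boxplot can be computed by hand to see what each line means. The sketch below uses made-up `values` and assumes matplotlib's default outlier rule (points more than 1.5 times the interquartile range away from the box); other tools may use a different rule.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data (not from the notes); 30 is a deliberate outlier.
values = [2, 4, 5, 6, 7, 8, 9, 10, 12, 30]

# The box: lower quartile, median, upper quartile.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumed rule (matplotlib's default): points beyond 1.5 * IQR from the box
# are treated as outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [v for v in values if low_fence <= v <= high_fence]
outliers = [v for v in values if v < low_fence or v > high_fence]

# The extreme lines end at the lowest and highest values that are not outliers.
whisker_low, whisker_high = min(inside), max(inside)

plt.boxplot(values)
plt.ylabel("Value")
```

Call `plt.show()` afterwards to display the plot; the value 30 is drawn as a separate point above the upper whisker.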
Barplot
• A barplot (or bar chart) is one of the most common types of graphic.
• It shows the relationship between a numeric and a categorical variable.
• Each entity of the categorical variable is represented as a bar.
• The size of the bar represents its numeric value.
Create bars
− plt.bar(bars, frequency)
Show graphic
− plt.show()
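A self-contained version of the two steps above might look like this. The category names and values are invented for illustration; `bars` holds the categories and `frequency` holds the numeric value of each.

```python
import matplotlib.pyplot as plt

# Hypothetical data: one bar per category, with a numeric value for each.
bars = ["A", "B", "C", "D"]
frequency = [12, 7, 15, 4]

# Create bars: the height of each bar is its numeric value.
container = plt.bar(bars, frequency)
heights = [rect.get_height() for rect in container]

plt.xlabel("Category")
plt.ylabel("Frequency")
```

`heights` simply reads the bar sizes back from the plot, confirming that each bar's size equals its value. Follow with `plt.show()` to display the graphic.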
Column Chart
• A column chart is used to show a
comparison among different
attributes, or it can show a
comparison of items over time.
− df.plot.bar()
− plt.show()
Pie Chart
• A pie chart shows a static number and how categories represent parts of a
whole, that is, the composition of something.
• A pie chart represents numbers in
percentages, and the total sum of all
segments needs to equal 100%.
− plt.pie(df['Income'], labels=df['EMPID'], autopct='% 1.2f %%')
− plt.show()
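The percentage arithmetic behind the segments can be checked directly. In this sketch the employee IDs and incomes are invented stand-ins for `df['EMPID']` and `df['Income']`, and the format string `'%1.2f%%'` is a common variant of the one in the notes, without the extra spaces.

```python
import matplotlib.pyplot as plt

# Hypothetical incomes per employee ID (stand-ins for the notes' DataFrame columns).
emp_ids = ["E1", "E2", "E3", "E4"]
incomes = [4000, 3000, 2000, 1000]

# Each segment's share of the whole, in percent; the shares sum to 100.
total = sum(incomes)
shares = [100 * income / total for income in incomes]

# The same format string that autopct applies to each share.
pct_labels = ["%1.2f%%" % share for share in shares]

plt.pie(incomes, labels=emp_ids, autopct="%1.2f%%")
```

`plt.pie` computes the shares itself, so the raw incomes are passed in; `shares` and `pct_labels` only show what ends up printed on each segment. Call `plt.show()` to display the chart.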
Scatter Chart
• A scatter chart shows the relationship between
two different variables and it can reveal the
distribution trends.
• It should be used when there are many different
data points, and you want to highlight similarities
in the data set.
• This is useful when looking for outliers and for understanding the distribution
of your data.
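A minimal scatter chart, assuming two made-up numeric variables (`age` and `income`), could be drawn as follows. The last point is placed far from the general trend on purpose to show how an outlier stands out.

```python
import matplotlib.pyplot as plt

# Hypothetical data: the last point (60, 20) is a deliberate outlier.
age = [22, 25, 30, 35, 40, 45, 50, 60]
income = [28, 32, 38, 45, 50, 57, 62, 20]

# One marker per observation: x is the first variable, y is the second.
points = plt.scatter(age, income)

plt.xlabel("Age")
plt.ylabel("Income (thousands)")
```

After `plt.show()`, the first seven points form a rising trend while the eighth sits well below it.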
Violin Plot
• A violin plot can be used to display the
distribution of the data and its probability
density.
• Furthermore, we get a visualization of the
median of the data (the white dot in the center of the
box plot, in the image below).
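As a rough sketch, a violin plot can be drawn with matplotlib's own `plt.violinplot`. Note the assumptions: the data are invented, and matplotlib marks the median with a horizontal line rather than the white dot and inner box plot that seaborn-style violin plots use. The central marker in a violin plot is conventionally the median, which can differ from the mean when the data are skewed.

```python
import statistics
import matplotlib.pyplot as plt

# Hypothetical skewed sample (not from the notes); 9 pulls the mean upwards.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

median = statistics.median(values)
mean = statistics.mean(values)

# The violin body is a smoothed density estimate of the values;
# showmedians=True adds a line at the median.
parts = plt.violinplot(values, showmedians=True)

plt.ylabel("Value")
```

For this sample the median and the mean are different numbers, and the line drawn inside the violin sits at the median. Call `plt.show()` to display it.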
Density Plot
• A density plot shows the distribution of a
numerical variable. It takes only a set of numeric
values as input. It is closely related to a histogram, but draws a smoothed
curve instead of bars.
− sns.kdeplot(df['Sales'])
− plt.show()
Heat Map
− plt.figure(figsize=(12, 8))
− sns.heatmap(df[['Age','Income', 'Sales']])
− plt.show()