Sections Revision Part 2

Uploaded by

georgeashraf503
Copyright
© All Rights Reserved
DATA MINING

Sections Revision Part 2

− Data Visualization
Data Visualization
• Data visualization is the presentation of data in a graphical format.
• It helps people understand the significance of data by summarizing and presenting large amounts of data in a simple, easy-to-understand format, and it helps communicate information clearly and effectively.

Plots
− Histogram
− Box Plot
− Bar Plot
− Column Chart
− Pie Chart
− Scatter Plot
− Line Chart
− Violin Plot
− Density Plot
− WordCloud
− Heat Map

HISTOGRAM
• A histogram is an accurate graphical representation of the distribution of a numeric variable. It takes only numeric variables as input.
• The variable is cut into several bins, and the number of observations per bin is represented by the height of the bar.
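
The binning step described above can be sketched in plain Python. This is a minimal illustration, not how matplotlib implements it; the function name `histogram_counts` and the sales figures are hypothetical, and the sketch assumes equal-width bins and at least two distinct values.

```python
def histogram_counts(values, bins):
    """Split the value range into equal-width bins and count observations per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes hi > lo, otherwise the width would be zero
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin (its right edge is inclusive).
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Hypothetical sales figures, used only for illustration.
sales = [10, 12, 15, 21, 22, 25, 28, 31, 38, 40]
counts, edges = histogram_counts(sales, bins=3)
print(counts)  # [3, 4, 3] -> these would be the bar heights
print(edges)   # [10.0, 20.0, 30.0, 40.0] -> the bin boundaries
```

Each count is the height of one bar, and changing `bins` changes how coarse or fine the picture of the distribution is.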

MADE BY π

Implementation using matplotlib:

− import pandas as pd
− import matplotlib.pyplot as plt

# Create a histogram for each numeric column (df is a pandas DataFrame loaded beforehand)
− df.hist()

# Show the plot
− plt.show()

Implementation using seaborn:

• Seaborn is a graphics library built on top of Matplotlib.
• It makes charts look better and simplifies some common data visualization needs.

− pip install seaborn

− import seaborn as sns
# histplot replaces distplot, which is deprecated in recent seaborn releases
− sns.histplot(df["Sales"], bins=20)
− plt.show()

Boxplot
• A boxplot is one of the most common types of graphic. It gives a compact summary of one or several numeric variables.
• The line that divides the box into two parts represents the median of the data.
• The ends of the box show the upper and lower quartiles.
• The extreme lines (whiskers) show the highest and lowest values, excluding outliers.
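
The quantities a boxplot draws can be computed by hand. The sketch below is an illustration under stated assumptions: the helper `box_summary` and the income values are hypothetical, quartiles use linear interpolation (`method="inclusive"`), and outliers are defined by the common 1.5 × IQR rule, which is matplotlib's default (`whis=1.5`) but not the only convention.

```python
import statistics


def box_summary(values):
    """Return the median, quartiles, whisker ends and outliers of a numeric list."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box count as outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest value that is not an outlier
        "whisker_high": max(inside),   # highest value that is not an outlier
        "outliers": outliers,
    }


# Hypothetical income values with one extreme observation.
income = [20, 22, 23, 25, 27, 28, 30, 31, 95]
summary = box_summary(income)
print(summary)
```

Here the box spans 23 to 30 with the median line at 27, the whiskers end at 20 and 31, and 95 is drawn as a separate outlier point.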

# Box plot for each numeric attribute of the DataFrame
− df.plot.box()
− plt.show()

# Box plot for an individual attribute
− plt.boxplot(df['Income'])
− plt.show()

Barplot
• A barplot (or bar chart) is one of the most common types of graphic.
• It shows the relationship between a numeric variable and a categorical variable.
• Each entity of the categorical variable is represented as a bar.
• The size of the bar represents its numeric value.
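
When the numeric value is a frequency, the bar heights come from counting how often each category occurs. A minimal sketch with hypothetical raw observations (the list `genders` is made up for illustration):

```python
from collections import Counter

# Hypothetical raw observations of a categorical variable.
genders = ["Male"] * 3 + ["Female"] * 12

# Count how many times each category occurs.
counts = Counter(genders)

# One bar per category; the bar size is the category's count.
bars = tuple(counts.keys())
frequency = [counts[name] for name in bars]
print(bars)       # ('Male', 'Female')
print(frequency)  # [3, 12]
```

The resulting `bars` and `frequency` have the shape that `plt.bar` expects: category labels first, numeric values second.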

# Make a fake dataset
− frequency = [3, 12]
− bars = ('Male', 'Female')

# Create bars
− plt.bar(bars, frequency)

# Create names on the x-axis
− plt.xticks(bars)

# Show graphic
− plt.show()

# Create horizontal bars
− plt.barh(bars, frequency)

# Create names on the y-axis
− plt.yticks(bars)

# Show graphic
− plt.show()

Column Chart
• A column chart is used to show a comparison among different attributes, or it can show a comparison of items over time.

− df.plot.bar()
− plt.show()

Pie Chart
• A pie chart shows a static number and how categories represent parts of a whole, that is, the composition of something.
• A pie chart represents numbers as percentages, and the sum of all segments must equal 100%.
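
The percentage shown on each slice is simply that value's share of the total. A minimal sketch, assuming non-negative values with a positive sum; the helper `pie_percentages` and the employee data are hypothetical:

```python
def pie_percentages(values):
    """Convert raw values to percentage shares of their total."""
    total = sum(values)  # assumes non-negative values and total > 0
    return [100 * v / total for v in values]


# Hypothetical income per employee ID.
income = {"E01": 3000, "E02": 4500, "E03": 2500}
shares = pie_percentages(list(income.values()))

for emp_id, share in zip(income, shares):
    # Same style of format string as the autopct argument of plt.pie.
    print("%s: %1.2f%%" % (emp_id, share))

print(sum(shares))  # 100.0
```

Because every share is computed against the same total, the segments always add up to 100%.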

− plt.pie(df['Income'], labels=df['EMPID'], autopct='%1.2f%%')
− plt.show()

Scatter Chart
• A scatter chart shows the relationship between two different variables and can reveal distribution trends.
• It should be used when there are many data points and you want to highlight similarities in the data set.

• This is useful when looking for outliers and for understanding the distribution
of your data.

# Scatter plot between age and sales
− plt.scatter(df['Age'], df['Sales'])
− plt.show()

Line Chart or Line Graph

• A line chart or line graph is a type of chart that displays information as a series of data points called ‘markers’ connected by straight line segments.
• A line chart is often used to visualize a trend in data over intervals of time.

− plt.plot('Age', 'Sales', data=df[['Age','Sales']], color='skyblue', alpha=0.3, linestyle='--', linewidth=5)
− plt.show()

Violin Plot
• A violin plot displays the distribution of the data and its probability density.
• It also marks the median of the data (the white dot in the center of the embedded box plot).

− sns.violinplot(x="vs", y='wt', data=df0)
− plt.show()

Density Plot
• A density plot shows the distribution of a numerical variable. It takes only a set of numeric values as input. It is very similar to a histogram, but draws a smoothed curve instead of bars.
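
The smoothed curve of a density plot is typically a kernel density estimate (KDE): a small bell curve is placed on every observation and the curves are averaged. The sketch below is a simplified illustration, not seaborn's implementation; the function `gaussian_kde`, the sales values and the bandwidth of 5.0 are assumptions (seaborn chooses a bandwidth automatically).

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(values)

    def density(x):
        total = 0.0
        for v in values:
            z = (x - v) / bandwidth
            # Standard normal kernel centred on the observation v.
            total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        return total / (n * bandwidth)

    return density


# Hypothetical sales figures and a hand-picked bandwidth.
sales = [10, 12, 15, 21, 22, 25, 28, 31, 38, 40]
density = gaussian_kde(sales, bandwidth=5.0)

# Approximate the area under the curve on [-20, 60] with a Riemann sum.
step = 0.1
steps = 800
area = sum(density(-20 + i * step) * step for i in range(steps + 1))
print(round(area, 2))
print(density(25) > density(60))  # the curve is higher where the data is concentrated
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows the individual observations more closely, much like using more bins in a histogram.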

− sns.kdeplot(df['Sales'])
− plt.show()

Wordcloud (or Tag Cloud)

• A wordcloud is a visual representation of text data.
• It displays a list of words, with the importance of each shown by font size or color.
• This format is useful for quickly perceiving the most prominent terms.
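
The "importance" of a word is usually its frequency in the text. The sketch below shows that idea in plain Python; it is a simplification and not the algorithm of the wordcloud package. The helpers `word_frequencies` and `font_sizes`, the sample sentence and the size range of 10 to 40 are all hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lower-case the text, split it into words and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def font_sizes(freq, min_size=10, max_size=40):
    """Scale each word's count linearly into a font size (assumed range)."""
    top = max(freq.values())
    return {w: min_size + (max_size - min_size) * c / top for w, c in freq.items()}


# Hypothetical input text.
text = "data mining finds patterns in data and data visualization shows patterns"
freq = word_frequencies(text)
sizes = font_sizes(freq)

print(freq.most_common(2))  # [('data', 3), ('patterns', 2)]
print(sizes["data"], sizes["patterns"], sizes["mining"])  # 40.0 30.0 20.0
```

The most frequent word gets the largest font, so it stands out first when the cloud is drawn.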

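− pip install wordcloud

− from wordcloud import WordCloud

# Create the wordcloud object (text is a string holding the source text)
− wordcloud = WordCloud(width=480, height=480, margin=0).generate(text)

# Display the generated image
− plt.imshow(wordcloud, interpolation='bilinear')
− plt.axis('off')
− plt.show()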

Heat Map (or Heatmap)

• A heat map (or heatmap) is a graphical representation of data in which the individual values contained in a matrix are represented as colors.
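
Turning matrix values into colors usually means rescaling each value to the range 0 to 1 and looking it up in a color scale. The sketch below illustrates this with a simple white-to-black gray scale; the helpers `normalize_matrix` and `to_gray` and the sample matrix are hypothetical, and real libraries offer many other color maps.

```python
def normalize_matrix(matrix):
    """Rescale all values to the range 0..1 (min-max normalization)."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)  # assumes the matrix is not constant
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def to_gray(intensity):
    """Map 0 to white and 1 to black, returned as a hex color string."""
    level = round(255 * (1 - intensity))
    return "#%02x%02x%02x" % (level, level, level)


# Hypothetical 3 x 3 matrix of values.
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

norm = normalize_matrix(matrix)
colors = [[to_gray(v) for v in row] for row in norm]

print(norm[1][1])    # 0.5 -> the middle value sits halfway along the scale
print(colors[0][0])  # #ffffff -> smallest value, lightest cell
print(colors[2][2])  # #000000 -> largest value, darkest cell
```

Each cell of the heat map is then filled with the color computed for its value, so high and low regions of the matrix are visible at a glance.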

− plt.figure(figsize=(12, 8))
− sns.heatmap(df[['Age','Income', 'Sales']])
− plt.show()
