Data Visualization - New

This document discusses various data visualization techniques in Python such as line plots, area plots, histograms, and bar charts. It provides code examples for reading data from Excel files and manipulating DataFrames. Methods like .plot(), .hist(), and .annotate() are used to generate the visualizations. Both the scripting layer and artist layer approaches in Matplotlib are covered.

Uploaded by

WHITE YT

Data Visualization – Python

Read Data from Excel:


- from __future__ import print_function # adds compatibility with Python 2; must come before the other imports
- import numpy as np
- import pandas as pd
- !pip install xlrd # notebook shell command
- print('xlrd installed!')
- df_can = pd.read_excel('https:.....', sheet_name='Canada', skiprows=range(20), skipfooter=2) # older pandas versions spell these sheetname and skip_footer

- .tolist() – converts a Series (or a similar object) into a list
- .shape – shows the dimensions of the DataFrame (an attribute, so no parentheses)
- .isnull().sum() – counts the null values in each column
- df.loc[label] – filter by the labels of the index/column
- df.iloc[index] – filter by the positions of the index
- df_can.set_index('Country', inplace=True) – changes the index
- df_can.reset_index() – resets the index
- print(df_can.loc['Japan', [1980, 1981, 1982, 1983, 1984, 1985]]) – loc selects by label
- print(df_can.iloc[87, [3, 4, 5, 6, 7, 8]]) – iloc selects by position
- df_can.columns = list(map(str, df_can.columns)) – convert the column names into strings
- years = list(map(str, range(1980, 2014))) – list of the years 1980 to 2013 as strings (range excludes 2014)
- df_can[(df_can['Continent']=='Asia') & (df_can['Region']=='Southern Asia')] – filter rows on two conditions
- # note: when combining conditions, pandas requires '&' and '|' instead of 'and' and 'or', with each condition in parentheses
- haiti.index = haiti.index.map(int) # let's change the index values of Haiti to type integer for plotting
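
The selection and filtering calls listed above can be tried on a tiny table. This is only a sketch: df_demo and all of its values are made up for illustration and are not the course data set.

```python
import pandas as pd

# Hypothetical miniature of the immigration table; every value is made up.
df_demo = pd.DataFrame(
    {
        "Country": ["Japan", "India", "Haiti"],
        "Continent": ["Asia", "Asia", "Latin America and the Caribbean"],
        "Region": ["Eastern Asia", "Southern Asia", "Caribbean"],
        1980: [701, 8880, 1666],
        1981: [756, 8670, 3692],
    }
)

df_demo.set_index("Country", inplace=True)  # country names become the row labels

dimensions = df_demo.shape                  # attribute, no parentheses
null_total = df_demo.isnull().sum().sum()   # nulls per column, then summed

# .loc selects by label: row 'Japan', column 1980 (an integer label here).
by_label = df_demo.loc["Japan", 1980]

# .iloc selects by position: row 0, column 2 is the same cell.
by_position = df_demo.iloc[0, 2]

# Combine conditions with & and parentheses, not with the keyword `and`.
southern_asia = df_demo[
    (df_demo["Continent"] == "Asia") & (df_demo["Region"] == "Southern Asia")
]
matching_countries = southern_asia.index.tolist()

print(dimensions, null_total, by_label, by_position, matching_countries)
```

Writing `and` in place of `&` raises a ValueError, because Python would then ask for a single truth value of a whole Series.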

What is a line plot and why use it?


A line chart or line plot displays information as a series of data points, called 'markers',
connected by straight line segments. It is a basic type of chart common in many fields. Use a
line plot when you have a continuous data set. Line plots are best suited to trend-based
visualizations of data over a period of time.
- haiti.plot(kind='line')
- plt.title('Immigration from Haiti')
- plt.ylabel('Number of immigrants')
- plt.xlabel('Years')
- plt.show()
- # annotate the 2010 earthquake
- # syntax: plt.text(x, y, label)
- plt.text(2000, 6000, '2010 Earthquake') # see note below
- Since the x-axis (years) is type 'integer', we specify x as a year. The y-axis (number of immigrants) is also type 'integer', so we can specify the value y = 6000 directly.
- plt.text(2000, 6000, '2010 Earthquake') # years stored as type int
- If the years were stored as type 'string', we would need to specify x as the index position of the year. E.g. index 20 is year 2000, since 2000 is 20 years after the base year 1980.
- plt.text(20, 6000, '2010 Earthquake') # years stored as type str
- df_CI = df_CI.transpose() – swaps the rows and the columns
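
The difference between the two plt.text calls above can be checked with a short sketch. The counts are made up, the names s_int and s_str are hypothetical, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

years = list(range(1980, 2014))
values = [1000 + 100 * i for i in range(len(years))]  # made-up counts

s_int = pd.Series(values, index=years)                  # years stored as int
s_str = pd.Series(values, index=list(map(str, years)))  # years stored as str

# Integer index: the x coordinate of the label is the year itself.
ax_int = s_int.plot(kind="line")
label_int = ax_int.text(2000, 6000, "label at year 2000")
plt.close("all")

# String index: the x coordinate is the position of '2000' in the index.
ax_str = s_str.plot(kind="line")
x_pos = s_str.index.get_loc("2000")
label_str = ax_str.text(x_pos, 6000, "label at year 2000")
plt.close("all")

print(x_pos)
```

Looking the position up with get_loc avoids counting by hand how far a year is from the base year.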

AREA PLOT
- Also known as an area chart or area graph.
- Commonly used to represent cumulative totals, as numbers or percentages, over time.
- Commonly used when comparing two or more quantities.

- import matplotlib as mpl
- import matplotlib.pyplot as plt
- df_top5.plot(kind='area')
- plt.title('Immigration trend of top 5 countries')
- plt.ylabel('Number of immigrants')
- plt.xlabel('Years')
- plt.show()

- df_top5.index = df_top5.index.map(int) # let's change the index values of df_top5 to type integer for plotting
- df_top5.plot(kind='area', stacked=False, figsize=(20, 10)) # pass a tuple (x, y) size
- plt.title('Immigration Trend of Top 5 Countries')
- plt.ylabel('Number of Immigrants')
- plt.xlabel('Years')
- plt.show()
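
The two area plots above differ only in the stacked argument. A minimal sketch of what that changes, using a made-up two-column table (df_demo is hypothetical) and the Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers for two hypothetical countries
df_demo = pd.DataFrame(
    {"Country A": [10, 20, 30], "Country B": [40, 30, 20]},
    index=[1980, 1981, 1982],
)

# Default (stacked=True): each area is drawn on top of the previous one,
# so the top edge of the plot follows the row totals.
ax_stacked = df_demo.plot(kind="area")
top_stacked = ax_stacked.get_ylim()[1]

# stacked=False: every area starts from zero and the areas overlap,
# so the top edge follows the largest single value.
ax_unstacked = df_demo.plot(kind="area", stacked=False)
top_unstacked = ax_unstacked.get_ylim()[1]

largest_total = df_demo.sum(axis=1).max()  # tallest stacked column
largest_single = df_demo.max().max()       # tallest single area
plt.close("all")

print(largest_total, largest_single)
```

Stacking suits cumulative totals, while the unstacked form makes it easier to compare the individual series against each other.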

HISTOGRAMS
- import matplotlib as mpl
- import matplotlib.pyplot as plt
- df_canada['2013'].plot(kind='hist', figsize=(10, 6))
- plt.title('Histogram of immigration from 195 countries in 2013')
- plt.ylabel('Number of countries')
- plt.xlabel('Number of immigrants')
- plt.show()
HISTOGRAM BINS

- import matplotlib as mpl
- import matplotlib.pyplot as plt
- import numpy as np
- count, bin_edges = np.histogram(df_canada['2013']) # 10 equal-width bins by default
- df_canada['2013'].plot(kind='hist', xticks=bin_edges)
- plt.title('Histogram of immigration from 195 countries in 2013')

- # np.histogram returns 2 values


- count, bin_edges = np.histogram(df_can['2013'])
- print(count) # frequency count
- print(bin_edges) # bin ranges, default = 10 bins
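
As a quick check of what np.histogram hands back, here is a sketch on made-up data (the integers 0 to 99, not the course data). It assumes the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

values = pd.Series(np.arange(100))  # made-up data: 0, 1, ..., 99

# With no bins argument, np.histogram splits the range into 10 equal-width bins.
count, bin_edges = np.histogram(values)
print(count)      # how many values fall into each bin
print(bin_edges)  # 10 bins need 11 edges

# Passing the edges as xticks puts a tick on every bin boundary.
ax = values.plot(kind="hist", xticks=bin_edges)
n_bars = len(ax.patches)
plt.close("all")
```

Ten bins always come with eleven edges, which is why bin_edges has one more entry than count.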

- # transpose dataframe
- df_t = df_can.loc[['Denmark', 'Norway', 'Sweden'], years].transpose()
- df_t.head()

- increase the number of bins to 15 by passing in the bins parameter
- set the transparency to 60% by passing in the alpha parameter
- label the x-axis by passing in the xlabel parameter
- change the colors of the plots by passing in the color parameter

- # let's get the x-tick values
- count, bin_edges = np.histogram(df_t, 15)
- # un-stacked histogram
- df_t.plot(kind='hist', figsize=(10, 6), bins=15, alpha=0.6, xticks=bin_edges, color=['coral', 'darkslateblue', 'mediumseagreen'])
- plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')
- plt.ylabel('Number of Years')
- plt.xlabel('Number of Immigrants')
- plt.show()
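
The same steps can be sketched on random stand-in data. The table df_demo, its column names, and its values are hypothetical, and the Agg backend is assumed so the code runs without a display. Note that np.histogram flattens the whole table, so the 15 bins span all three columns together.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Random stand-in for the transposed table: 34 years by 3 countries
df_demo = pd.DataFrame(
    rng.integers(0, 300, size=(34, 3)),
    columns=["Country A", "Country B", "Country C"],
    index=range(1980, 2014),
)

# 15 bins computed over all values in the table at once
count, bin_edges = np.histogram(df_demo, 15)

ax = df_demo.plot(
    kind="hist",
    figsize=(10, 6),
    bins=15,
    alpha=0.6,                 # 60% opacity so overlapping bars stay visible
    xticks=bin_edges,          # one tick per bin boundary
    color=["coral", "darkslateblue", "mediumseagreen"],
)
ax.set_xlabel("Number of Immigrants")
ticks = ax.get_xticks()
plt.close("all")

print(len(bin_edges), int(count.sum()))
```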
BAR CHART

- To create a bar plot, we can pass one of two arguments via the kind parameter in plot():
- kind='bar' creates a vertical bar plot
- kind='barh' creates a horizontal bar plot

Let's annotate this on the plot using the annotate method of the scripting layer (the pyplot
interface). We will pass in the following parameters:

- s: str, the text of the annotation (recent Matplotlib versions name this parameter text).
- xy: Tuple specifying the (x,y) point to annotate (in this case, the end point of the arrow).
- xytext: Tuple specifying the (x,y) point to place the text (in this case, the start point of the arrow).
- xycoords: The coordinate system that xy is given in - 'data' uses the coordinate system of the object being annotated (default).
- arrowprops: Takes a dictionary of properties to draw the arrow:
- arrowstyle: Specifies the arrow style, '->' is a standard arrow.
- connectionstyle: Specifies the connection type. arc3 is a straight line.
- color: Specifies the color of the arrow.
- lw: Specifies the line width.

- df_iceland.plot(kind='bar', figsize=(10, 6), rot=90) # rotate the xticks (labelled points on x-axis) by 90 degrees

# Annotate arrow
plt.annotate("",# s: str. Will leave it blank for no text
xy=(32, 70), # place head of the arrow at point (year 2012 , pop 70)
xytext=(28, 20), # place base of the arrow at point (year 2008 , pop 20)
xycoords='data', # will use the coordinate system of the object being annotated
arrowprops=dict(arrowstyle='->', connectionstyle='arc3', color='blue', lw=2)
)
# annotate value labels to each country

- for index, value in enumerate(df_top15):
-     label = format(int(value), ',') # format int with commas
-     # place text at the end of bar (subtracting 47000 from x, and 0.1 from y to make it fit within the bar)
-     plt.annotate(label, xy=(value - 47000, index - 0.10), color='white')
- plt.show()
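
Both annotation patterns above can be tried on small made-up Series. The names s_years and s_totals and all of the values are hypothetical stand-ins for df_iceland and df_top15, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly counts (stand-in for df_iceland)
s_years = pd.Series([20, 25, 40, 70], index=["2009", "2010", "2011", "2012"])
s_years.plot(kind="bar", rot=90)

# Bars sit at positions 0, 1, 2, ..., so x is a bar position, not a year.
arrow = plt.annotate(
    "",                # no text, arrow only
    xy=(3, 70),        # arrow head: bar 3 (year 2012), height 70
    xytext=(0, 20),    # arrow base: bar 0 (year 2009), height 20
    xycoords="data",   # both points are given in data coordinates
    arrowprops=dict(arrowstyle="->", connectionstyle="arc3", color="blue", lw=2),
)
plt.close("all")

# Made-up totals (stand-in for df_top15)
s_totals = pd.Series([505000, 690000], index=["Country A", "Country B"])
ax_barh = s_totals.plot(kind="barh")

for index, value in enumerate(s_totals):
    label = format(int(value), ",")  # thousands separator, e.g. 505,000
    # Shift the label left and slightly down so it sits inside the bar.
    plt.annotate(label, xy=(value - 47000, index - 0.10), color="white")

labels = [t.get_text() for t in ax_barh.texts]
plt.close("all")

print(labels)
```

The offset of 47000 is tuned to the scale of the data in the notes; with other data it would need adjusting so the text still lands inside each bar.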

Unlike a histogram, a bar chart is commonly used to compare the values of a variable at a given point in
time.
# let's examine the types of the column labels

- all(isinstance(column, str) for column in df_can.columns)

If this returns False, some of the column labels (the years) are not strings, so let's change them all to string type.

- df_can.columns = list(map(str, df_can.columns))


# finally, let's create a list of years from 1980 - 2013
# this will come in handy when we start plotting the data

- years = list(map(str, range(1980, 2014)))
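
A small sketch of the check, the conversion, and the years list working together. The table df_demo and its values are made up, and only two year columns are used to keep it short.

```python
import pandas as pd

# Made-up table whose year columns are integers, as after reading from Excel
df_demo = pd.DataFrame([["Japan", 701, 756]], columns=["Country", 1980, 1981])

# False: 1980 and 1981 are int labels, only 'Country' is a str
all_str_before = all(isinstance(column, str) for column in df_demo.columns)

df_demo.columns = list(map(str, df_demo.columns))

# True: every label is now a str
all_str_after = all(isinstance(column, str) for column in df_demo.columns)

# A list of year labels selects all year columns in one step.
years_demo = list(map(str, range(1980, 1982)))  # ['1980', '1981']
year_values = df_demo[years_demo]

print(all_str_before, all_str_after, years_demo)
```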


Option 2: Artist layer (object-oriented method) - using an Axes instance from Matplotlib (preferred)
You can store the Axes instance of your current plot in a variable (e.g. ax) and add more elements by
calling its methods, whose names differ slightly from the pyplot functions (they add "set_"). For
example, use ax.set_title() instead of plt.title() to add a title, or ax.set_xlabel() instead of
plt.xlabel() to add a label to the x-axis.
This option is sometimes more transparent and flexible for advanced plots (in particular when
having multiple plots, as you will see later).
In this course, we will stick to the scripting layer, except for some advanced visualizations where we will
need to use the artist layer to manipulate advanced aspects of the plots.
# option 2: preferred option with more flexibility

- ax = df_top5.plot(kind='area', alpha=0.35, figsize=(20, 10))


- ax.set_title('Immigration Trend of Top 5 Countries')
- ax.set_ylabel('Number of Immigrants')
- ax.set_xlabel('Years')
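
A runnable sketch of the artist-layer pattern, on a made-up table (df_demo is hypothetical) and with the Agg backend assumed so it runs without a display. It also shows that the Axes returned by plot() is the one the pyplot functions would act on.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers for two hypothetical countries
df_demo = pd.DataFrame(
    {"Country A": [10, 20, 30], "Country B": [40, 30, 20]},
    index=[1980, 1981, 1982],
)

# plot() returns the Axes it drew on; keep it in a variable.
ax = df_demo.plot(kind="area", alpha=0.35, figsize=(20, 10))

# Artist layer: call set_* methods on that Axes object.
ax.set_title("Immigration Trend (made-up data)")
ax.set_ylabel("Number of Immigrants")
ax.set_xlabel("Years")

# The scripting layer targets the current Axes, which here is the same object.
same_axes = plt.gca() is ax

title, xlabel, ylabel = ax.get_title(), ax.get_xlabel(), ax.get_ylabel()
plt.close("all")

print(title, xlabel, ylabel, same_axes)
```

Holding on to ax matters most once a figure has several Axes, since pyplot functions only ever act on the current one.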
