DATA VISUALIZATION
Generating Data
Python is used for data-intensive work in
genetics, climate research, sports, political and
economic analysis.
Matplotlib, a mathematical plotting library, is a popular
tool used to make simple plots such as line graphs
and scatter plots.
The Plotly package creates visualizations that work
well on digital devices.
Matplotlib is installed using the command
$ python -m pip install --user matplotlib
MATPLOTLIB
What is Matplotlib?
Matplotlib is a comprehensive library for creating
static, animated, and interactive visualizations in
Python.
Matplotlib is a low-level graph plotting library in
Python that serves as a visualization utility.
Matplotlib was created by John D. Hunter.
Matplotlib is open source and we can use it
freely.
Matplotlib is mostly written in Python; a few
segments are written in C, Objective-C, and
JavaScript for platform compatibility.
PLOTTING A SIMPLE LINE GRAPH
import matplotlib.pyplot as plt
squares =[1,4,9,16,25]
fig, ax = plt.subplots()
ax.plot(squares)
plt.show()
PLOTTING A SIMPLE LINE GRAPH
Pyplot is a collection of functions that make
Matplotlib work like MATLAB.
The alias plt is used so that we don’t have to type
matplotlib.pyplot repeatedly.
The matplotlib.pyplot.subplots() function provides a
way to draw multiple plots on a single figure.
fig -> represents the entire figure, or collection of plots
ax -> represents a single plot
The plot() method plots the data it is given in a
meaningful way.
The function plt.show() opens Matplotlib’s viewer
and displays the plot.
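The remark that subplots() can place several plots on one figure can be sketched as follows. This is a minimal illustration, not part of the original program: the cubes list and the 1 x 2 grid are assumptions chosen for the example.

```python
import matplotlib.pyplot as plt

squares = [1, 4, 9, 16, 25]
cubes = [1, 8, 27, 64, 125]  # assumed second data set, for illustration only

# Ask for a grid of 1 row and 2 columns: fig is the whole figure,
# axes is an array holding one Axes object per plot.
fig, axes = plt.subplots(1, 2)
axes[0].plot(squares)
axes[0].set_title("Squares")
axes[1].plot(cubes)
axes[1].set_title("Cubes")

# Calling plt.show() at this point would display both plots side by side.
```

With no arguments, subplots() returns a single Axes object instead of an array, which is why the slides write fig, ax.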
CORRECTING THE LINE PLOT (W.R.T X- AXIS)
import matplotlib.pyplot as plt
squares = [1,4,9,16,25]
input_values = [1,2,3,4,5]
fig,ax = plt.subplots()
ax.plot(input_values, squares, linewidth = 3)
plt.show()
CHANGING THE LABEL TYPE AND LINE
THICKNESS
import matplotlib.pyplot as plt
squares =[1,4,9,16,25]
input_values = [1,2,3,4,5]
fig,ax = plt.subplots()
ax.plot(input_values, squares, linewidth=3)
# Set chart title and label axes
ax.set_title("Square Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize = 14)
ax.set_ylabel("Square of Value", fontsize = 14)
#set size of tick labels
ax.tick_params(axis = 'both', labelsize = 14)
plt.show()
EXPLANATION OF PROGRAM
The linewidth parameter controls the thickness
of the line that generates the plot.
What is the meaning of tick_params()?
tick_params() is used to change the
appearance of ticks, tick labels, and
gridlines.
Here it sets the label size of the ticks on both
the x-axis and the y-axis to 14.
SCATTER PLOT
A scatter plot is a diagram where each value in the
data set is represented by a dot.
The Matplotlib module has a method for drawing
scatter plots. It needs two arrays of the same length,
one for the values of the x-axis, and one for the values
of the y-axis.
# Basic scatter plot
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
fig, ax = plt.subplots()
ax.scatter(x,y)
plt.show()
SCATTER PLOT WITH BUILT-IN SEABORN STYLE AND PLOTTING A
SERIES OF POINTS WITH SCATTER
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y= [1, 4, 9, 16, 25]
plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
ax.scatter(x,y,s=100)
plt.show()
CALCULATING THE DATA AUTOMATICALLY
import matplotlib.pyplot as plt
x_values = range(1,1001)
y_values = [x**2 for x in x_values]
plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
#ax.scatter(x_values,y_values,s=10)
ax.scatter(x_values,y_values,c=y_values, cmap=plt.cm.Reds, s=10)
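ax.axis([0,1100,0,1100000])
# bbox_inches='tight' removes the extra white space around the plot.
# Save before plt.show(); after the viewer is closed, savefig may write a blank image.
plt.savefig('squares_plot.png', bbox_inches='tight')
plt.show()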
RANDOM WALKS
Using the random module, Python generates a series of
random decisions, each of which is left entirely to chance.
You can imagine a random walk as the path a confused ant
would take if it took every step in a random direction.
Random walks have practical applications in nature,
physics, biology, chemistry, and economics.
Creating the RandomWalk() Class:
To create a random walk, we’ll create a RandomWalk
class, which will make random decisions about which
direction the walk should take.
The class needs three attributes: one variable to store the
number of points in the walk and two lists to store the x-
and y-coordinate values of each point in the walk.
We’ll only need two methods for the RandomWalk class:
the __init__() method and fill_walk(), which will calculate
the points in the walk.
RandomWalk CLASS and fill_walk method:
random_walk.py
from random import choice

class RandomWalk:
    """A class to generate random walks."""

    def __init__(self, num_points=7):
        self.num_points = num_points
        self.x_values = [0]  # All walks start at (0,0)
        self.y_values = [0]

    def fill_walk(self):
        # Keep taking steps until the walk reaches the desired length
        while len(self.x_values) < self.num_points:
            # Decide which direction to go and how far to go in that direction
            x_direction = choice([1, -1])
            x_distance = choice([0, 1, 2, 3, 4])
            x_step = x_direction * x_distance

            y_direction = choice([1, -1])
            y_distance = choice([0, 1, 2, 3, 4])
            y_step = y_direction * y_distance

            # Reject moves that go nowhere
            if x_step == 0 and y_step == 0:
                continue

            x = self.x_values[-1] + x_step
            y = self.y_values[-1] + y_step
            self.x_values.append(x)
            self.y_values.append(y)

rw_visual.py
import matplotlib.pyplot as plt
from random_walk import RandomWalk

rw = RandomWalk()
rw.fill_walk()
plt.style.use('classic')
fig, ax = plt.subplots()
ax.scatter(rw.x_values, rw.y_values, s=15)
plt.show()
We start each walk at the point (0, 0).
The main part of the fill_walk() method tells Python how to
simulate four random decisions: Will the walk go right or
left? How far will it go in that direction? Will it go up or
down? How far will it go in that direction?
We use choice([1, -1]) to choose a value for x_direction,
which returns either 1 for right movement or −1 for left.
Next, choice([0, 1, 2, 3, 4]) tells Python how far to move in
that direction (x_distance) by randomly selecting an integer
between 0 and 4.
A positive result for x_step means move right, a negative
result means move left, and 0 means move vertically.
A positive result for y_step means move up, negative
means move down, and 0 means move horizontally.
If the values of both x_step and y_step are 0, the walk
doesn’t go anywhere, so we continue the loop to ignore this
move.
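The four decisions described above can be sketched with a small helper. This is an illustrative refactoring, not the class from the slides: get_step() and walk() are hypothetical names, and the distance range 0 to 4 is taken from the explanation.

```python
from random import choice

def get_step(max_distance=4):
    """Return one signed step: a direction (1 or -1) times a distance (0..max_distance)."""
    direction = choice([1, -1])
    distance = choice(range(max_distance + 1))
    return direction * distance

def walk(num_points=10):
    """Build the x and y coordinate lists for a walk of num_points points."""
    x_values, y_values = [0], [0]  # every walk starts at (0, 0)
    while len(x_values) < num_points:
        x_step, y_step = get_step(), get_step()
        if x_step == 0 and y_step == 0:
            continue  # the walk would not move, so draw again
        x_values.append(x_values[-1] + x_step)
        y_values.append(y_values[-1] + y_step)
    return x_values, y_values

xs, ys = walk(10)
print(list(zip(xs, ys)))
```

Because a step of (0, 0) is rejected, every point in the result differs from the one before it.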
Multiple random walks:
One way to use the preceding code to make multiple walks without
having to run the program several times is to wrap it in a while loop,
like this:
import matplotlib.pyplot as plt
from random_walk import RandomWalk

# Keep making new walks as long as the user wants another one
while True:
    # Make a random walk and plot its points
    rw = RandomWalk()
    rw.fill_walk()
    plt.style.use('classic')
    fig, ax = plt.subplots()
    ax.scatter(rw.x_values, rw.y_values, s=15)
    plt.show()

    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break
ADDING COLOR TO THE PLOT
import matplotlib.pyplot as plt
from random_walk import RandomWalk
rw =RandomWalk()
rw.fill_walk()
plt.style.use('classic')
fig, ax = plt.subplots()
ax.scatter(rw.x_values, rw.y_values, c=range(rw.num_points),
           cmap=plt.cm.Blues, edgecolors='none', s=15)
plt.show()
We use range() to generate a sequence of numbers, one for each
point in the walk, and pass it as the c argument.
We use the Blues colormap, and then pass
edgecolors='none' to get rid of the black outline around each point.
The result is a plot of the walk that varies from light to dark blue
along a gradient.
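How the point numbers turn into shades of blue can be sketched by querying the colormap directly. This is a rough illustration under assumptions: num_points is a made-up small value, and the rescaling to the interval 0 to 1 is done by hand here to mimic what scatter() does with the c values.

```python
import matplotlib.pyplot as plt

num_points = 5                     # hypothetical small walk
point_numbers = range(num_points)  # 0, 1, 2, 3, 4: the order in which points were added

# A colormap maps a number between 0 and 1 to an RGBA color.
for n in point_numbers:
    fraction = n / (num_points - 1)
    red, green, blue, alpha = plt.cm.Blues(fraction)
    print(n, round(red, 2), round(green, 2), round(blue, 2))

first = plt.cm.Blues(0.0)  # color used for the first point of the walk
last = plt.cm.Blues(1.0)   # color used for the last point of the walk
```

The first color is close to white and the last is a dark blue, which is what produces the gradient along the walk.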
MODIFICATIONS IN THE RANDOMWALK PROGRAM
# Emphasize first and last points
ax.scatter(0,0,c='green',s=1500)
ax.scatter(rw.x_values[-1], rw.y_values[-1], c="red", s=1500)
# To remove the axis lines
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# Altering the size to fill the screen
fig, ax = plt.subplots(figsize=(20,15), dpi=128)
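figsize gives the size of the figure in inches. By default, Matplotlib
assumes a screen resolution of 100 pixels per inch; the dpi argument
passes your system’s actual resolution.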
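The relation between figsize, dpi, and the size of the window in pixels is simple arithmetic, sketched below. figure_pixels() is a hypothetical helper written for this illustration; it is not a Matplotlib function.

```python
def figure_pixels(figsize, dpi=100):
    """Return (width, height) in pixels for a figure size given in inches."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)

print(figure_pixels((20, 15), dpi=128))  # (2560, 1920)
print(figure_pixels((20, 15)))           # (2000, 1500)
```

So figsize=(20,15) with dpi=128 asks for a figure of 2560 by 1920 pixels.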
MODIFICATIONS IN THE RANDOMWALK PROGRAM
# Increase the number of points in the walk to 50,000
rw = RandomWalk(50_000)
# Plotting the Starting and Ending Points:
ax.scatter(rw.x_values,rw.y_values,c='lightpink',
edgecolor='none',s=10)
ax.scatter(0,0,c='green',edgecolor='none',s=1500)
ax.scatter(rw.x_values[-1], rw.y_values[-1],
c="red",edgecolor='none',s = 1500)
ROLLING DICE WITH PLOTLY:
The Python package Plotly is used to produce
interactive visualizations.
When a user hovers over certain elements on the
screen, information about that element is
highlighted.
The study of rolling dice is used in real-world
applications in casinos and other gambling
scenarios, as well as in games such as Monopoly and
many role-playing games.
ROLLING DICE WITH PLOTLY:
Installing Plotly
Install Plotly using pip, just as you did for Matplotlib:
$ python -m pip install --user plotly
The __init__() method takes one optional argument.
With the Die class, when an instance of our die is created,
the number of sides will always be six if no argument is
included.
If an argument is included, that value will set the number
of sides on the die.
The roll() method uses the randint() function to return a
random number between 1 and the number of sides.
ROLLING DICE WITH PLOTLY:
Creating the Die class:
from random import randint

class Die:
    """A class representing a single die (D6 by default)."""

    def __init__(self, num_sides=6):
        self.num_sides = num_sides

    def roll(self):
        """Return a random value between 1 and the number of sides."""
        return randint(1, self.num_sides)

Rolling the die:
from die import Die

die = Die()
results = []
for roll_num in range(10):
    result = die.roll()
    results.append(result)
print(results)

OUTPUT:
[4, 3, 1, 4, 4, 1, 5, 6, 6, 4]
ROLLING THE DICE:
import random

print("Rolling the dice...")
print("The values are....")
while True:
    value = random.randint(1, 6)
    print(f"The number is: {value}")
    roll_again = input("Roll the dice again? (y/n)")
    if roll_again == 'n':
        break

Output:
Rolling the dice...
The values are....
Roll the dice again? (y/n)y
The number is: 5
Roll the dice again? (y/n)y
The number is: 1
Roll the dice again? (y/n)y
The number is: 5
Roll the dice again? (y/n)y
The number is: 2
Roll the dice again? (y/n)n
ROLLING DICE WITH PLOTLY:
Analyzing the Results:
We’ll analyze the results of rolling one D6 by counting how many
times we roll each number.
for roll_num in range(100):
    dice_num = die.roll()
    results.append(dice_num)

# Analyze the results
frequencies = []
for value in range(1, die.num_sides + 1):
    repetition = results.count(value)
    frequencies.append(repetition)

for value in range(1, die.num_sides + 1):
    print(f"The number {value} is repeated: {frequencies[value-1]}")

Output:
The number 1 is repeated: 13
The number 2 is repeated: 23
The number 3 is repeated: 17
The number 4 is repeated: 16
The number 5 is repeated: 17
The number 6 is repeated: 14
ANALYZING THE RESULTS:
We create an instance of Die with the default six sides. We then roll
the die 100 times and store the results of each roll in the list
results.
To analyze the rolls, we create the empty list frequencies to store
the number of times each value is rolled.
We then loop through the possible values, count how many times each
number appears in results, and append this value to the frequencies list.
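The counting step can also be sketched with collections.Counter from the standard library. This is an alternative to the results.count() loop, not the approach used in the slides; the results list here is a stand-in generated with randint() so the sketch runs on its own.

```python
from collections import Counter
from random import randint

# Stand-in for 100 rolls of a six-sided die.
num_sides = 6
results = [randint(1, num_sides) for _ in range(100)]

# Counter tallies every value in a single pass over results.
counts = Counter(results)

# Build the frequencies list in face order; a face that was never
# rolled is reported as 0 because Counter returns 0 for missing keys.
frequencies = [counts[value] for value in range(1, num_sides + 1)]
print(frequencies)
```

The frequencies always add up to the number of rolls, which is a quick check that no result was missed.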
Making a Histogram:
A histogram is a bar chart showing how often certain results
occur. Here’s the code to create the histogram.
To make a histogram, we need a bar for each of the possible
results.
We store these in a list called x_values, which starts at 1 and
ends at the number of sides on the die.
The Layout() class returns an object that
specifies the layout and configuration of the
graph as a whole.
Here we set the title of the graph and pass the x-
and y-axis configuration dictionaries as well.
To generate the plot, we call the offline.plot()
function.
This function needs a dictionary containing the
data and layout objects, and it also accepts a
name for the file where the graph will be saved.
We store the output in a file called d6.html.
MAKING A HISTOGRAM:
from plotly.graph_objs import Bar, Layout
from plotly import offline
# bar graph using plotly
x_values = list(range(1,die.num_sides + 1))
data = [Bar(x=x_values, y=frequencies)]
x_axis_title={'title':'Result'}
y_axis_title={'title':'Frequency of Result'}
my_layout = Layout(title='Histogram of Dice rolling 100 times',
        xaxis=x_axis_title, yaxis=y_axis_title)
offline.plot({'data': data, 'layout': my_layout}, filename='d6.html')
ROLLING TWO DICE
from random import randint
from plotly.graph_objs import Bar, Layout
from plotly import offline
class Die:
    def __init__(self, num_sides=6):
        self.num_sides = num_sides

    def roll(self):
        return randint(1, self.num_sides)

def main():
    # Create two D6 dice
    die1 = Die()
    die2 = Die()

    # Roll both dice 1000 times and store the sum of each roll
    results = []
    for roll_num in range(1000):
        result = die1.roll() + die2.roll()
        results.append(result)

    # Count how often each possible sum occurred
    frequencies = []
    max_result = die1.num_sides + die2.num_sides
    for value in range(2, max_result + 1):
        frequency = results.count(value)
        frequencies.append(frequency)

    # bar graph using plotly
    x_values = list(range(2, max_result + 1))
    data = [Bar(x=x_values, y=frequencies)]
    x_axis_title = {'title': 'Face of Dice', 'dtick': 1}
    y_axis_title = {'title': 'Frequency of Dice face occurrence'}
    my_layout = Layout(title='Histogram of Dice rolling 1000 times',
            xaxis=x_axis_title, yaxis=y_axis_title)
    offline.plot({'data': data, 'layout': my_layout})

if __name__ == "__main__":
    main()
ROLLING DICE OF DIFFERENT SIZES
die2 = Die(10)
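What to expect from the histogram when the second die has ten sides can be sketched by counting face combinations instead of simulating rolls. This is an illustrative aside, not part of the original program; sum_counts() is a hypothetical helper.

```python
from collections import Counter
from itertools import product

def sum_counts(sides1, sides2):
    """Count how many face combinations produce each possible sum."""
    faces1 = range(1, sides1 + 1)
    faces2 = range(1, sides2 + 1)
    return Counter(a + b for a, b in product(faces1, faces2))

d6_d6 = sum_counts(6, 6)
d6_d10 = sum_counts(6, 10)

print(min(d6_d6), max(d6_d6))    # smallest and largest sum for D6 + D6
print(min(d6_d10), max(d6_d10))  # smallest and largest sum for D6 + D10
print(d6_d10.most_common(5))     # the five most likely sums for D6 + D10
```

With a D6 and a D10 the sums run from 2 to 16, and the sums 7 through 11 each have six combinations, so a simulated histogram should tend toward a flat top rather than the single peak at 7 seen with two D6 dice.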
SUMMARY
Visualization of Data – Simple Line Plots using
matplotlib.
Scatter Plots to explore random walks.
Histogram using Plotly
Histogram to explore the results of rolling dice of
different sizes.