Unit 2
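import matplotlib.pyplot as plt
mylabels = ['Week1', 'Week2', 'Week3', 'Week4']  # hypothetical sample weeks
y = [120, 150, 90, 170]  # hypothetical sales figures
plt.title('Sample Chart', color='red')
plt.plot(mylabels, y, marker='*', markersize=10, markeredgecolor='yellow')
plt.show()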
A line plot cannot efficiently depict comparisons between the weeks for which the sales data is plotted.
To show comparisons, bar charts are preferred. Bar charts are also a natural fit when the x-axis holds categories (strings) rather than numbers.
A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths (or heights) are proportional to the values they represent.
Bar plots can be drawn horizontally or vertically. A bar chart describes comparisons between discrete categories.
One axis of the plot represents the specific categories being compared, while the other axis represents the measured values corresponding to those categories.
A bar graph uses bars to compare data among different categories. It is also well suited to showing changes over a period of time, such as sales per week.
The important thing to keep in mind is that the longer the bar, the greater the value.
Types of bar chart: Simple Bar Chart, Stacked Bar Chart, Grouped Bar Chart, Horizontal Bar Chart.
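The stacked, grouped and horizontal variants can be drawn with standard matplotlib calls. This is a minimal sketch, assuming two hypothetical product series (product_a, product_b) over four weeks: stacking uses the bottom parameter, grouping shifts each series half a bar width to either side of the tick, and horizontal bars use barh().

```python
import matplotlib.pyplot as plt
import numpy as np

weeks = ['Wk1', 'Wk2', 'Wk3', 'Wk4']
product_a = np.array([20, 35, 30, 35])  # hypothetical sales
product_b = np.array([25, 32, 34, 20])  # hypothetical sales

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))

# Stacked: the second series starts where the first one ends (bottom=...)
ax1.bar(weeks, product_a, label='Product A')
ax1.bar(weeks, product_b, bottom=product_a, label='Product B')
ax1.set_title('Stacked')
ax1.legend()

# Grouped: shift each series half a bar width left/right of the tick
x = np.arange(len(weeks))
w = 0.4
ax2.bar(x - w / 2, product_a, width=w, label='Product A')
ax2.bar(x + w / 2, product_b, width=w, label='Product B')
ax2.set_xticks(x)
ax2.set_xticklabels(weeks)
ax2.set_title('Grouped')
ax2.legend()

# Horizontal: barh() puts the categories on the y-axis
ax3.barh(weeks, product_a)
ax3.set_title('Horizontal')

fig.tight_layout()
plt.show()
```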
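Changing Color
The color parameter sets the fill color of the bars. It accepts a single color or a list of colors, which is cycled across the bars.
Adding Edgecolor
The edgecolor parameter draws a border around each bar in a different color.
LineWidth
The linewidth parameter sets the thickness of that border.
Line Style
The linestyle parameter changes the style of the border, for example solid ('-'), dashed ('--') or dotted (':').
Width
The width parameter changes the width of each bar (the default is 0.8).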
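import matplotlib.pyplot as plt
mylabels = ['Week1', 'Week2', 'Week3', 'Week4']  # hypothetical sample weeks
y = [120, 150, 90, 170]  # hypothetical sales figures
plt.title('Sample Chart', color='red')
# a list of colors is cycled across the bars: red, green, red, green
plt.bar(mylabels, y, color=['red', 'green'], width=0.8)
plt.show()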
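The border properties described above (edgecolor, linewidth, linestyle) and width can be combined in one bar() call. A minimal sketch, reusing the same hypothetical weekly data:

```python
import matplotlib.pyplot as plt

mylabels = ['Week1', 'Week2', 'Week3', 'Week4']  # hypothetical sample weeks
y = [120, 150, 90, 170]                          # hypothetical sales figures

bars = plt.bar(mylabels, y,
               color='skyblue',    # fill color of every bar
               edgecolor='black',  # color of the border
               linewidth=2,        # thickness of the border
               linestyle='--',     # dashed border
               width=0.5)          # narrower than the 0.8 default
plt.title('Sample Chart', color='red')
plt.show()
```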
A pie chart is a circular graph divided into segments, i.e. slices of pie. It is used to show percentage or proportional data, where each slice of the pie represents a category.
A pie chart is a circular statistical plot that can display only one series of data.
The whole circle represents the total (100%) of the given data, and the area of each slice is proportional to that category's share. Pie charts are commonly used in business presentations on sales, operations, survey results, resources, etc. because they provide a quick summary.
Syntax (selected parameters; x holds the slice sizes):
matplotlib.pyplot.pie(x, explode=None, labels=None, colors=None, autopct=None,
shadow=False, startangle=0, counterclock=True, wedgeprops=None)
Labels :
A sequence of strings giving the label of each slice.
Explode :
A sequence giving, for each slice, how far it is pulled away from the center as a fraction of the pie radius; 0 leaves the slice in place.
Shadow :
Setting shadow=True draws a shadow beneath the pie chart.
Colors :
A sequence of colors, one per slice, to replace the default colors.
Autopct :
Shows the percentage on each slice. It takes a format string such as '%1.1f%%' or a function.
Wedgeprops :
A dictionary of properties applied to every wedge, for example {'edgecolor': 'white', 'linewidth': 2} to give the slices a border color and border width.
Startangle :
By default, pie() starts drawing the first slice at 0 degrees, i.e. along the positive x-axis (the 3 o'clock position). The startangle parameter rotates this starting position counterclockwise by the given number of degrees.
Counterclock :
By default, pie() draws the slices counterclockwise. Setting counterclock=False draws them clockwise instead.
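import matplotlib.pyplot as plt
# the slices are ordered and plotted counter-clockwise:
product = 'Product A', 'Product B', 'Product C', 'Product D'
stock = [15, 30, 35, 20]
explode = (0.1, 0, 0.1, 0)  # pull out the 1st and 3rd slices
plt.pie(stock, explode=explode, labels=product, autopct='%1.1f%%', shadow=True)
plt.show()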
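The remaining parameters (colors, wedgeprops, startangle, counterclock) can be seen together in a single call. A minimal sketch with the same stock figures; the color names are arbitrary choices:

```python
import matplotlib.pyplot as plt

product = ['Product A', 'Product B', 'Product C', 'Product D']
stock = [15, 30, 35, 20]

wedges, texts, autotexts = plt.pie(
    stock,
    labels=product,
    colors=['gold', 'lightcoral', 'lightskyblue', 'yellowgreen'],
    autopct='%1.1f%%',                                  # percentage inside each slice
    wedgeprops={'edgecolor': 'white', 'linewidth': 2},  # border around each wedge
    startangle=90,       # first slice starts at 12 o'clock instead of 3 o'clock
    counterclock=False,  # draw the slices clockwise
)
plt.title('Stock share by product')
plt.show()
```

Because the slices are drawn clockwise from 90 degrees, the first wedge (15% of the circle) spans from 90 down to about 36 degrees.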
Histograms are column-charts, where each column represents a range of values, and the height
of a column corresponds to how many values are in that range.
To make a histogram, the data is sorted into "bins" and the number of data points in each bin is
counted.
The height of each column in the histogram is then proportional to the number of data points
its bin contains.
Histograms are used to show a distribution whereas a bar chart is used to compare different
entities.
Histograms are useful for summarizing arrays or very long lists of values.
Consider an example where we plot the ages of a population grouped into bins. A bin is one of the intervals into which the range of values is divided, and bins are usually of equal width.
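The binning and counting steps can be checked on a small example. This is a minimal sketch with hypothetical ages: plt.hist() returns the count for each bin along with the bin edges and the drawn bars.

```python
import matplotlib.pyplot as plt

# Hypothetical ages of 14 people
ages = [5, 12, 18, 23, 25, 31, 34, 38, 42, 47, 55, 61, 68, 72]
# Four equal-width bins: [0, 20), [20, 40), [40, 60) and [60, 80]
bins = [0, 20, 40, 60, 80]

counts, edges, patches = plt.hist(ages, bins=bins, edgecolor='black')
print(counts)  # → [3. 5. 3. 3.]
plt.xlabel('Age')
plt.ylabel('Number of people')
plt.show()
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.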
Edgecolor
Used to set the edge color.
Linewidth
Used to set the width of the border line.
Linestyle
Used to set the style of the border line.
Fill:
True (the default) fills each bar with color; False leaves the bars empty, so only their outlines are drawn.
Hatch :
Fills each bar with a pattern, such as '-', '+', 'x', '\\', '*', 'o', 'O' or '.'.
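import matplotlib.pyplot as plt
import numpy as np
# 250 random values from a normal distribution with mean 170 and standard deviation 10
x = np.random.normal(170, 10, 250)
# dotted green borders, unfilled bars, circle hatch pattern
plt.hist(x, edgecolor='green', linewidth=2, linestyle=':', fill=False, hatch='o')
plt.show()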
A scatter chart is a two-dimensional data visualization method that uses dots to represent the values obtained for two different variables, one plotted along the x-axis and the other along the y-axis.
Scatter plots are used when you want to show the relationship between two variables.
Scatter plots are sometimes called correlation plots because they show how two variables are
correlated.
Additionally, the size, shape or color of the dots can represent a third (or even a fourth) variable.
Scatter plots are typically used to compare variables, for example to see how much one variable is affected by another and whether there is a relationship between them. The data is displayed as a collection of points: the value of one variable determines each point's position on the horizontal axis, and the value of the other variable determines its position on the vertical axis.
s:
Sets the marker size (in points squared).
Color:
Sets the marker color.
Linewidth:
Sets the width of the marker edge (border).
Marker:
Sets the marker symbol, for example 'o', 's' or '^'.
Edgecolor:
Sets the color of the marker edge.
import numpy as np
import matplotlib.pyplot as plt
discount= np.array([10,20,30,40,50])
saleInRs=np.array([40000,45000,48000,50000,100000])
size=discount*10
plt.scatter(x=discount,y=saleInRs,s=size,color='red',edgecolor='pink')
plt.title('Sales Vs Discount')
plt.xlabel('Discount offered')
plt.ylabel('Sales in Rs')
plt.show()
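To let color encode a third variable, as mentioned above, pass numbers through the c parameter together with a colormap. This is a minimal sketch reusing the discount and sales data; the profit array is a hypothetical third variable, and marker and linewidths show the remaining properties:

```python
import numpy as np
import matplotlib.pyplot as plt

discount = np.array([10, 20, 30, 40, 50])
saleInRs = np.array([40000, 45000, 48000, 50000, 100000])
profit = np.array([5, 8, 6, 9, 12])  # hypothetical third variable

pts = plt.scatter(discount, saleInRs,
                  c=profit, cmap='viridis',  # color encodes the third variable
                  s=discount * 10,           # size grows with the discount
                  marker='D',                # diamond markers
                  linewidths=1.5,            # width of the marker edge
                  edgecolors='black')        # color of the marker edge
plt.colorbar(pts, label='Profit (hypothetical)')
plt.title('Sales Vs Discount')
plt.xlabel('Discount offered')
plt.ylabel('Sales in Rs')
plt.show()
```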
Adding title
The chart title is one of the most important elements for communicating what a chart is about at first glance; readers usually begin interpreting the chart only after reading it.
plt.title("Progress Grid", color="blue", size=14, loc="left")
Axis Label
Axis labels are simple to create with matplotlib: call the xlabel() and ylabel() functions to add an x-axis label and a y-axis label to your chart. For example:
plt.xlabel("Day of Month")
plt.ylabel("Task Progress")
For the y-axis it is often useful to adjust the rotation of the label as well. This is done with the rotation parameter, which takes an angle in degrees. The y-axis label is vertical (90 degrees) by default, so rotation=0 makes it horizontal:
plt.ylabel("Task Progress", rotation=0)
Axis Limits
Axis limits are adjusted automatically to fit the data, but you can also assign fixed limits, in which case the axes of the chart will not change with the data.
You can set the lower and upper limits of each axis as follows:
plt.xlim(0,7)
plt.ylim(0,5)
Adding grid lines
Grid lines make a plot easier to read. They are added with the plt.grid() function, whose first argument is a boolean value saying whether the grid should be shown.
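A minimal sketch of plt.grid() with a few optional keyword arguments, using hypothetical data: axis='y' restricts the grid to horizontal lines, and line properties such as linestyle and alpha are passed through to the grid lines.

```python
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4, 5], [2, 3, 1, 4, 3])  # hypothetical data
# Show only horizontal grid lines, dashed and semi-transparent
plt.grid(True, axis='y', linestyle='--', alpha=0.5)
ax = plt.gca()
plt.show()
```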
Adding Axes ticks
Ticks are another fundamental component of a chart. You can adjust the labels, frequency, colors and even rotation of the axis ticks to suit your chart.
Axis ticks are adjusted with the xticks() and yticks() functions. The first argument is the sequence of tick positions, for which NumPy's arange() function is usually convenient; the optional second argument gives the labels displayed at those positions. Here np.arange(0, 30, 7) yields the five positions 0, 7, 14, 21 and 28, one for each label:
plt.xticks(np.arange(0, 30, 7), ["Wk1", "Wk2", "Wk3", "Wk4", "Wk5"], rotation=35, color="red")
Adding legend
A legend is another chart element that can enhance a visualization. The legend lists the label= value given to each plotted series, so give your series labels first and then activate the legend with:
plt.legend()
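The title, axis label, limit, tick and legend calls above can be combined on one chart. This is a minimal sketch with hypothetical progress data for two tasks (task_a, task_b); note that each plot() call gets a label=, which is what plt.legend() displays.

```python
import matplotlib.pyplot as plt
import numpy as np

days = np.arange(0, 29)                  # days 0 to 28 of the month
task_a = np.linspace(0, 5, len(days))    # hypothetical progress of two tasks
task_b = np.linspace(0, 3.5, len(days))

plt.plot(days, task_a, label="Task A")   # label= is what legend() shows
plt.plot(days, task_b, label="Task B")
plt.title("Progress Grid", color="blue", size=14, loc="left")
plt.xlabel("Day of Month")
plt.ylabel("Task Progress")
plt.xlim(0, 28)
plt.ylim(0, 5)
plt.xticks(np.arange(0, 30, 7), ["Wk1", "Wk2", "Wk3", "Wk4", "Wk5"], rotation=35, color="red")
plt.legend()
ax = plt.gca()
plt.show()
```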
To import an Excel file into Python using pandas (reading .xlsx files also requires the openpyxl package):
import pandas as pd
df = pd.read_excel(r"Path where the Excel file is stored\File name.xlsx")
print(df)
If you would like to import a specific sheet from the Excel file, pass its name with sheet_name:
import pandas as pd
df = pd.read_excel(r"Path of Excel file\File name.xlsx", sheet_name="your Excel sheet name")
print(df)
Example:
import pandas as pd
df = pd.read_excel(r"C:\Users\Ron\Desktop\my_products.xlsx")
print(df)
If you only need specific columns, you can select them by name:
import pandas as pd
data = pd.read_excel(r"C:\Users\Ron\Desktop\my_products.xlsx")
# keep only the product_name and price columns
df = data[["product_name", "price"]]
print(df)
A CSV (comma-separated values) file stores tabular data as plain text, with the values in each row separated by commas; such a file can also be opened in Excel.
pandas is one of the most important Python libraries for data science. Data analysis often involves huge datasets, which are commonly stored in CSV format.
To access data in a CSV file, use the pandas function read_csv(), which returns the data as a DataFrame.
Example
import pandas as pd
df = pd.read_csv(r"C:\Users\Ron\Desktop\my_products.csv")
# to keep only some columns: df = df[["product", "price"]]
print(df)
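read_csv() also accepts any file-like object, which makes it easy to try without a file on disk. A minimal sketch, assuming hypothetical product data held in a string and wrapped in io.StringIO; the usecols parameter loads only the listed columns:

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (hypothetical product data)
csv_text = """product,price,stock
Pen,10,150
Notebook,45,80
Stapler,120,25
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)

# usecols= loads only the listed columns
prices = pd.read_csv(io.StringIO(csv_text), usecols=["product", "price"])
print(prices)
```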
Matplotlib is a widely used Python library for plotting graphs and charts. The show() function displays a figure as output but does not save it to any file.
As the name implies, the savefig() function saves the figure produced by the plotting code. With it, the generated figure can be saved as a file on the local computer, for example as a PNG or PDF image; the format is taken from the file extension.
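A minimal sketch of savefig(), assuming hypothetical sales data and saving into the system's temporary directory (the file name weekly_sales.png is arbitrary). savefig() is called before show(), because once show() returns the figure is usually closed.

```python
import os
import tempfile
import matplotlib.pyplot as plt

plt.bar(['Wk1', 'Wk2', 'Wk3'], [120, 150, 90])  # hypothetical sales data
plt.title('Weekly Sales')

# Save first; dpi sets the resolution, bbox_inches='tight' trims extra whitespace
out_path = os.path.join(tempfile.gettempdir(), 'weekly_sales.png')
plt.savefig(out_path, dpi=150, bbox_inches='tight')
plt.show()
```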