42 Histograms2

A histogram is a graphical representation of data distribution using bars to show frequency across defined ranges (bins) on the X-axis. The document explains how to create histograms using Python's Matplotlib library, detailing various parameters for customization such as bins, density, weights, and edge color. Several examples illustrate the implementation of histograms with different configurations and visual styles.
Copyright © All Rights Reserved
Histograms

(Visual representation of data distribution)

A histogram represents data that has been grouped into ranges.
It is an accurate method for graphically representing the
distribution of numerical data. It is a type of bar plot in which
the X-axis shows the bin ranges and the Y-axis shows the
frequency of values in each bin.

Histograms are column charts in which each column represents
a range of values, and the height of a column corresponds to
how many values fall in that range. To make a histogram, the
data is sorted into "bins" and the number of data points in each
bin is counted. The height of each column in the histogram
is then proportional to the number of data points its bin
contains. The df.plot(kind='hist') function selects the size of
the bins automatically from the spread of values in the data:
by default it divides the range of the data into 10 equal-width
bins.

import pandas as pd
import matplotlib.pyplot as plt

data = {'Name': ['Arnav', 'Sheela', 'Azhar', 'Bincy', 'Yash', 'Nazar'],
        'Height': [60, 61, 63, 65, 61, 60],
        'Weight': [47, 89, 52, 58, 50, 47]}

df = pd.DataFrame(data)

# Only the numeric columns (Height and Weight) are plotted
df.plot(kind='hist')

plt.show()
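
The binning described above can be sketched in plain Python, without any plotting library. This is only an illustration of the idea, not how pandas or Matplotlib implement it; the helper name count_into_bins and the choice of 5 bins are assumptions made for the example, and the data must contain at least two distinct values.

```python
def count_into_bins(values, num_bins):
    """Split the data range into equal-width bins and count values per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The last bin includes its right edge, so the maximum lands in it.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


heights = [60, 61, 63, 65, 61, 60]  # the Height column from the data above
edges, counts = count_into_bins(heights, num_bins=5)
print(edges)   # bin boundaries from the minimum to the maximum height
print(counts)  # number of heights in each bin
```

Each count becomes the height of one column of the histogram.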
Creating a Histogram
To create a histogram, first divide the whole range of values into a series
of intervals (bins), then count the values that fall into each interval.
Bins are consecutive, non-overlapping intervals of a variable. In Matplotlib,
the matplotlib.pyplot.hist() function computes and draws the histogram of x,
where x is an array of numbers passed to the function as an argument.
The following table shows the parameters accepted by the
matplotlib.pyplot.hist() function:

Parameter   Description
x           array or sequence of arrays
bins        optional parameter; an integer, a sequence, or a string
density     optional parameter; a boolean value
range       optional parameter; the lower and upper range of the bins
histtype    optional parameter; the type of histogram to draw [bar, barstacked, step, stepfilled], default is "bar"
align       optional parameter; controls how the bars are aligned [left, right, mid]
weights     optional parameter; an array of weights with the same shape as x
bottom      location of the baseline of each bin
rwidth      optional parameter; relative width of the bars with respect to the bin width
color       optional parameter; a color or sequence of color specs
label       optional parameter; a string or sequence of strings to match multiple datasets
log         optional parameter; sets the histogram axis to a log scale

Example1:
x: array or sequence of arrays

from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])

# Creating histogram with the default number of bins
plt.hist(a)

# Show plot
plt.xlabel('age')
plt.ylabel('count')
plt.title('Histogram Example')
plt.show()
Example2:

bins: optional parameter; an integer, a sequence, or a string

color: optional parameter; a color or sequence of color specs

from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])

# Creating histogram
# bins can be an int, a sequence of bin edges, or a string.
# Use one of the three calls below; ec sets the edge colour of the bars.
plt.hist(a, bins=5, ec='red')                        # bins=int
# plt.hist(a, bins=[0, 25, 50, 75, 100], ec='red')   # bins=sequence of bin edges
# plt.hist(a, bins='auto', ec='red')                 # bins=string naming a binning strategy

# Show plot
plt.xlabel('age')
plt.ylabel('count')
plt.title('Histogram Example')
plt.show()
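
When bins is a sequence, its values are the bin edges. The counts for the edges [0, 25, 50, 75, 100] can be checked by hand with a short standard-library sketch. It assumes the usual convention that every bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
from bisect import bisect_right

ages = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]
edges = [0, 25, 50, 75, 100]

counts = [0] * (len(edges) - 1)
for age in ages:
    if edges[0] <= age <= edges[-1]:
        # bisect_right locates the bin whose left edge is <= age; the clamp
        # puts a value equal to the final edge into the last bin.
        index = min(bisect_right(edges, age) - 1, len(counts) - 1)
        counts[index] += 1

for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:>3} to {right:<3}: {n}")
```

The four printed counts are the bar heights that the sequence form of bins should produce for this dataset.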

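Example3:

range: optional parameter giving the lower and upper range of the bins as a tuple (lower, upper)

If range is not given, it runs from the minimum to the maximum value of the data, and the width of each bin is:

bin width = (max value - min value) / number of bins

bin width = (87 - 5) / 5 = 16.4

from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])

# Creating histogram
# bins is an int here, so the default range (5, 87) is split into 5 equal bins
plt.hist(a, bins=5, ec='red')

# Show plot
plt.xlabel('age')
plt.ylabel('count')
plt.title('Histogram Example')
plt.show()

To set the range explicitly, pass it after bins, e.g. 3 bins covering 5 to 90:
plt.hist(a, 3, (5, 90), ec='red')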
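
The effect of range on the counts can be sketched in plain Python. The helper histogram_counts is a hypothetical name used only for this illustration; it assumes equal-width bins over the given range and, as with hist(), skips values that fall outside the range.

```python
def histogram_counts(values, num_bins, value_range):
    """Count values in num_bins equal-width bins spanning value_range."""
    lower, upper = value_range
    width = (upper - lower) / num_bins
    counts = [0] * num_bins
    for v in values:
        if lower <= v <= upper:  # values outside the range are skipped
            index = min(int((v - lower) / width), num_bins - 1)
            counts[index] += 1
    return counts


ages = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]

wide = histogram_counts(ages, 3, (5, 90))    # all 15 values are inside
narrow = histogram_counts(ages, 2, (0, 50))  # values above 50 are dropped
print(wide, sum(wide))
print(narrow, sum(narrow))
```

With the range (0, 50), the counts no longer add up to 15, because only the values inside the range are binned.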
Example4:

density: optional parameter that takes a boolean value

density=True

With density=True, the height of each bar is:

density = bin count / (total count × bin width)

bin width = (max value - min value) / number of bins = (87 - 5) / 5 = 16.4

For the first bin, which holds 4 of the 15 values: density = 4 / (15 × 16.4) = 0.01626

from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])

# Creating histogram
# density takes the boolean True, not the string 'True'
plt.hist(a, bins=5, ec='red', density=True)

# Show plot
plt.xlabel('age')
plt.ylabel('density')
plt.title('Histogram Example')

plt.show()
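
The density arithmetic above can be reproduced for all five bins with a short plain-Python sketch. It assumes equal-width bins from the minimum to the maximum value; the variable names are chosen only for this illustration.

```python
ages = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]
num_bins = 5

lo, hi = min(ages), max(ages)
width = (hi - lo) / num_bins  # (87 - 5) / 5 = 16.4

# Plain counts per bin; the last bin includes the maximum value.
counts = [0] * num_bins
for age in ages:
    counts[min(int((age - lo) / width), num_bins - 1)] += 1

# density = bin count / (total count * bin width)
density = [c / (len(ages) * width) for c in counts]
area = sum(d * width for d in density)

print(counts)
print([round(d, 5) for d in density])
print(round(area, 10))  # the bar areas add up to 1
```

Because each density is multiplied by the bin width when the area is summed, the total area under the histogram is 1 regardless of how many values the dataset holds.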

Example5:

weights: optional parameter; an array of weights with the same shape as x.
Each value adds its weight, instead of 1, to the height of its bin.

from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([1, 12, 22, 21, 20, 21])

# Creating histogram
# weights[i] is the amount that a[i] adds to its bin
plt.hist(a, bins=5, ec='red', weights=[2, 2, 2, 2, 3, 3])

# Show plot
plt.xlabel('age')
plt.ylabel('count')
plt.title('Histogram Example')

plt.show()
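
What weights does to the bar heights can be sketched by hand: each value adds its weight to its bin instead of adding 1. The sketch below assumes equal-width bins from the minimum to the maximum value and uses the same data and weights as the example above.

```python
values = [1, 12, 22, 21, 20, 21]
weights = [2, 2, 2, 2, 3, 3]
num_bins = 5

lo, hi = min(values), max(values)
width = (hi - lo) / num_bins  # (22 - 1) / 5 = 4.2

plain = [0] * num_bins     # heights without weights
weighted = [0] * num_bins  # heights with weights
for v, w in zip(values, weights):
    index = min(int((v - lo) / width), num_bins - 1)
    plain[index] += 1
    weighted[index] += w

print(plain)
print(weighted)
```

The last bin holds the values 22, 21, 20 and 21, so its weighted height is 2 + 2 + 3 + 3 = 10 rather than 4.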

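Example6:

cumulative=True

Each bin shows its own count plus the counts of all bins for smaller values.

cumulative=-1

Each bin shows its own count plus the counts of all bins for greater values.

from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([1, 12, 22, 21, 20, 21])

# Creating histogram
# cumulative takes the boolean True (or a negative number), not a string
plt.hist(a, bins=5, ec='red', cumulative=True)

# Reverse accumulation: use this call instead
# plt.hist(a, bins=5, ec='red', cumulative=-1)

# Show plot
plt.xlabel('age')
plt.ylabel('count')
plt.title('Histogram Example')
plt.show()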
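
Both accumulation directions can be sketched with running totals over the plain bin counts. The counts below were worked out by hand for the dataset above with 5 equal-width bins (edges 1, 5.2, 9.4, 13.6, 17.8, 22); the variable names are chosen only for this illustration.

```python
from itertools import accumulate

counts = [1, 0, 1, 0, 4]  # plain counts per bin, smallest values first

# cumulative=True: each bin adds the counts of all bins to its left
forward = list(accumulate(counts))

# cumulative=-1: each bin adds the counts of all bins to its right
backward = list(accumulate(reversed(counts)))[::-1]

print(forward)
print(backward)
```

The last forward total and the first backward total both equal the number of values in the dataset.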

Customising Histogram:

Taking the same data as above, let us now see how the histogram can be customised. Let
us change the edgecolor, which is the border of each bar, to green. Also, let us change
the line style to ":" and the line width to 2. Another property, fill, takes boolean
values: the default True means each bar is filled with color, and False means each bar
is left empty. The hatch property fills each bar with a pattern
('-', '+', 'x', '\\', '*', 'o', 'O', '.'). In Program 4-10, we have used the hatch
value "o".

import pandas as pd
import matplotlib.pyplot as plt

data = {'Name': ['Arnav', 'Sheela', 'Azhar', 'Bincy', 'Yash', 'Nazar'],
        'Height': [60, 61, 63, 65, 61, 60],
        'Weight': [47, 89, 52, 58, 50, 47]}

df = pd.DataFrame(data)

# Green dotted borders of width 2, no fill colour, bars hatched with "o"
df.plot(kind='hist', edgecolor='Green', linewidth=2, linestyle=':', fill=False, hatch='o')
plt.show()
