
Compute Histogram of Data Using Numpy in Python
A histogram is a graphical representation of the distribution of a dataset. It shows the data as a series of bars, where each bar covers a range of data values (a bin) and the height of the bar gives the frequency of the data values that fall within that range.
Histograms are mainly used to show the distribution of numerical data, such as the grades in a class, the distribution of a population, or the incomes of employees.
In a histogram, the x-axis represents the range of data values, divided into intervals (bins), and the y-axis represents the frequency of the data values within each bin. A histogram can be normalized by dividing the frequency of each bin by the total number of data values, which results in a relative frequency histogram where the y-axis represents the fraction of the data values that fall in each bin.
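As a minimal sketch of the normalization described above, the counts can be divided by their total to obtain relative frequencies. The sample values and the choice of 4 bins are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
data = np.array([10, 20, 25, 40, 35, 23])

# Raw counts per bin, using 4 equal-width bins between min and max
counts, edges = np.histogram(data, bins=4)

# Relative frequency: each count divided by the total number of values
rel_freq = counts / counts.sum()

print("Counts:", counts)
print("Relative frequencies:", rel_freq)
print("Sum of relative frequencies:", rel_freq.sum())
```

The relative frequencies add up to 1, so each value can be read as the share of the data in that bin.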
Calculating histogram using Python Numpy
In Python, histograms can be created with the NumPy, Matplotlib and Seaborn libraries. NumPy provides the histogram() function, which computes the histogram data (the counts and the bin edges) without drawing a plot.
Syntax
Following is the syntax for computing the histogram of the given data.
numpy.histogram(a, bins=10, range=None, density=None, weights=None)
Where,
a is the input array
bins is the number of equal-width bins (bars) used to represent the data, or a sequence of bin edges
range is the (lower, upper) range of values covered by the bins; values outside this range are ignored
normed is a deprecated parameter that was replaced by density; recent NumPy releases no longer accept it
weights is an optional array of the same shape as the input, giving a weight for each data value
density is the parameter to normalize the histogram so that it forms a probability density.
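The following sketch illustrates how the bins, weights and density parameters change the result. The data, the weights and the explicit bin edges are hypothetical values picked so that the numbers are easy to follow.

```python
import numpy as np

# Hypothetical data and per-value weights, for illustration only
scores = np.array([1, 2, 2, 3])
weights = np.array([0.5, 1.0, 1.0, 2.0])

# Explicit bin edges instead of a bin count: [1, 2), [2, 3), [3, 4]
edges = [1, 2, 3, 4]

# Plain counts: how many values fall in each bin
counts, _ = np.histogram(scores, bins=edges)

# Weighted counts: each value contributes its weight instead of 1
weighted, _ = np.histogram(scores, bins=edges, weights=weights)

# Density: counts scaled so that the area under the histogram is 1
density, _ = np.histogram(scores, bins=edges, density=True)

print("Counts:", counts)
print("Weighted:", weighted)
print("Density:", density)
```

With density=True the values depend on the bin widths, so they match the relative frequencies only when every bin has a width of 1, as in this sketch.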
The output of the histogram function will be a tuple containing the histogram counts and bin edges.
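Since the result is a tuple, it can be unpacked directly into two arrays. The short sketch below uses hypothetical data and variable names, and shows that the array of bin edges always has one more element than the array of counts.

```python
import numpy as np

# Hypothetical data, for illustration only
values = np.array([10, 20, 25, 40, 35, 23])

# Unpack the returned tuple into counts and bin edges
counts, bin_edges = np.histogram(values, bins=5)

# n bins are delimited by n + 1 edges
print("Number of bins:", len(counts))
print("Number of edges:", len(bin_edges))

# The first and last edges default to the minimum and maximum of the data
print("First edge:", bin_edges[0], "Last edge:", bin_edges[-1])
```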
Example
In the following example, we compute a histogram using the NumPy histogram() function. We pass an array as the input and set bins to 10, so the histogram has 10 equal-width bins; the remaining parameters keep their default values.
import numpy as np

# Input data
arr = np.array([10, 20, 25, 40, 35, 23])

# Compute the counts and bin edges for 10 equal-width bins
hist = np.histogram(arr, bins=10)
print("The histogram created:", hist)
Output
The histogram created: (array([1, 0, 0, 1, 1, 1, 0, 0, 1, 1], dtype=int64), array([10., 13., 16., 19., 22., 25., 28., 31., 34., 37., 40.]))
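The raw tuple can be hard to read, so one option is to pair each count with the interval it belongs to. This is only a sketch of one possible way to present the result; the rows list and the print format are illustrative choices.

```python
import numpy as np

arr = np.array([10, 20, 25, 40, 35, 23])
counts, edges = np.histogram(arr, bins=10)

# Pair the left edge, right edge and count of every bin
rows = [
    (float(lo), float(hi), int(c))
    for lo, hi, c in zip(edges[:-1], edges[1:], counts)
]

# Every bin is half-open [lo, hi), except the last one,
# which also includes its right edge
for lo, hi, c in rows:
    print(f"{lo:5.1f} to {hi:5.1f}: {c}")
```

Because the last bin includes its right edge, the maximum value 40 is counted in the final bin rather than being dropped.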
Example
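Let's see another example to understand the histogram() function of the NumPy library. Here the input is a 2D array; histogram() computes the histogram over the flattened array, so all nine values are counted together.
import numpy as np

# 2D input data; the histogram is computed over the flattened array
arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])

# Compute the counts and bin edges for 20 equal-width bins
hist = np.histogram(arr, bins=20)
print("The histogram created:", hist)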
Output
The histogram created: (array([1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1], dtype=int64), array([ 1. , 2.95, 4.9 , 6.85, 8.8 , 10.75, 12.7 , 14.65, 16.6 , 18.55, 20.5 , 22.45, 24.4 , 26.35, 28.3 , 30.25, 32.2 , 34.15, 36.1 , 38.05, 40. ]))
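To check how a multi-dimensional input is handled, the sketch below compares the histogram of the 2D array with the histogram of an explicitly flattened copy. The variable names are illustrative.

```python
import numpy as np

arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])

# Histogram of the 2D array as given
counts_2d, edges_2d = np.histogram(arr, bins=20)

# Histogram of the same values flattened to 1D
counts_flat, edges_flat = np.histogram(arr.ravel(), bins=20)

print("Same counts:", np.array_equal(counts_2d, counts_flat))
print("Same edges:", np.array_equal(edges_2d, edges_flat))
print("Total counted:", counts_2d.sum(), "of", arr.size, "values")
```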
Example
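In this example, we compute a histogram by specifying both the number of bins and the range of data to be used. Values that fall outside the given range are ignored, so with range = (1, 10) only the value 1 is counted.
import numpy as np

# 2D input data
arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])

# 20 equal-width bins between 1 and 10; values outside the range are ignored
hist = np.histogram(arr, bins=20, range=(1, 10))
print("The histogram created:", hist)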
Output
The histogram created: (array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64), array([ 1. , 1.45, 1.9 , 2.35, 2.8 , 3.25, 3.7 , 4.15, 4.6 , 5.05, 5.5 , 5.95, 6.4 , 6.85, 7.3 , 7.75, 8.2 , 8.65, 9.1 , 9.55, 10. ]))
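The effect of the range parameter can be confirmed by comparing the total of the counts with the number of values that lie inside the range. The following is a small sketch of that check; the names in_range and ignored are illustrative.

```python
import numpy as np

arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])
low, high = 1, 10

counts, edges = np.histogram(arr, bins=20, range=(low, high))

# Count the values inside [low, high] directly for comparison
in_range = np.count_nonzero((arr >= low) & (arr <= high))
ignored = arr.size - in_range

print("Values counted by histogram():", counts.sum())
print("Values inside the range:", in_range)
print("Values ignored:", ignored)
```

When the counts do not add up to the size of the input, it usually means that some values fell outside the chosen range.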