Data Mining and Warehousing Assignment-1: Introduction To Boxplots
Introduction to Boxplots
A boxplot is a graphical summary of how the values in a data set are distributed. It
marks the three quartiles, which divide the sorted data into four equal parts, and shows
the minimum, maximum, median, first quartile and third quartile of the data set. Because
it conveys the variability, or dispersion, of the data, a boxplot gives a good indication
of how the values are spread out. It is also useful for comparing distributions across
data sets by drawing one boxplot for each of them. Although boxplots may seem primitive
in comparison to a histogram or density plot, they have the advantage of taking up less
space, which is useful when comparing distributions between many groups or datasets.
A boxplot is a standardized way of displaying the dataset based on a five-number
summary: the minimum, the maximum, the sample median, and the first and third
quartiles.
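The five-number summary can be computed directly before any plotting. The sketch below is
only an illustration: the helper name five_number_summary and the sample values are
assumptions, not part of the assignment. It uses the "inclusive" quartile method from
Python's statistics module, and other quartile conventions can give slightly different
values for the first and third quartiles.

```python
import statistics

def five_number_summary(values):
    """Return the minimum, Q1, median, Q3 and maximum of a sequence of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points that split the data into four parts.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }

# Hypothetical sample data, chosen only for illustration.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

The box of a boxplot spans q1 to q3, and the line inside the box marks the median.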
The boxplot() function takes the data array to be plotted as its first argument.
The second argument, patch_artist=True, fills the boxes with color, and the third
argument takes the labels to be shown for the boxes.
The colors array holds four different colors, one for each of the four boxes of
the boxplot. Each color is applied to its box with the patch.set_facecolor() function.
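The steps described above can be sketched as follows. This is a minimal example under
stated assumptions: the four groups of random data, the group names and the color names
are invented for illustration. The labels are set here with set_xticklabels() rather than
through a boxplot() keyword, because the name of that keyword differs between Matplotlib
versions.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: four groups of 100 normally distributed values each.
random.seed(0)
data = [[random.gauss(mu, 1.0) for _ in range(100)] for mu in (0, 1, 2, 3)]
labels = ["Group A", "Group B", "Group C", "Group D"]
colors = ["lightblue", "lightgreen", "tan", "pink"]

fig, ax = plt.subplots()

# patch_artist=True draws each box as a filled patch that can be colored.
box = ax.boxplot(data, patch_artist=True)
ax.set_xticklabels(labels)

# Pair each box with its color and fill it.
for patch, color in zip(box["boxes"], colors):
    patch.set_facecolor(color)

ax.set_title("Comparison of four groups")
# fig.savefig("boxplot.png")  # uncomment to write the figure to disk
```

Drawing all four boxes on one set of axes is what makes the side-by-side comparison of
distributions possible.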