Histogram With Plotnine
II-B.Tech I-Semester in
COMPUTER SCIENCE ENGINEERING (Cyber Security)
Submitted by
T.Dilip 22R01A6258
T.Rakshitha 22R01A6259
V.Bhavana 22R01A6260
V.Spoorthi 22R01A6261
Mr.Y.Ram Kumar
(UGC AUTONOMOUS)
2023-2024
CMR INSTITUTE OF TECHNOLOGY
(UGC AUTONOMOUS)
road, Hyderabad.
CERTIFICATE
This is to certify that a Micro Project entitled "HISTOGRAM WITH PLOTNINE (GGPLOT)"
is being
submitted by
T.Dilip 22R01A6258
T.Rakshitha 22R01A6259
V.Bhavana 22R01A6260
V.Spoorthi 22R01A6261
in partial fulfilment of the requirements for the completion of Data Wrangling and
Visualization of II-B.Tech I-Semester in CSE(CS), as a record of bona fide work carried
out under our guidance and supervision.
We express our thanks to all staff members and friends for all the help and
coordination extended in bringing out this micro project successfully and on time.
Finally, we are very thankful to our parents and relatives, who
guided us directly or indirectly toward the successful completion of the project.
T.Dilip 22R01A6258
T.Rakshitha 22R01A6259
V.Bhavana 22R01A6260
V.Spoorthi 22R01A6261
ABSTRACT:
Using plotnine, a Python data visualization library based on ggplot2, you can create
insightful histograms. By leveraging the Grammar of Graphics, plotnine allows you to
design and customize histograms with ease. Specify data, aesthetics, and bin widths
to craft meaningful visual representations of your data distribution. Explore variations
in data intensity, identify trends, and communicate valuable insights through the
powerful and flexible histogram plots generated with plotnine.
INTRODUCTION:
OBJECTIVE:
➢ Pattern Identification: Identify trends and understand the shape of the data
distribution, whether it is symmetric, skewed, or exhibits multiple modes.
By achieving these objectives, you can gain a deeper understanding of your data and
communicate key findings more effectively using the plotnine library.
CREATING HISTOGRAMS WITH PLOTNINE (GGPLOT)
Histogram
Usage
Only the data and mapping can be positional, the rest must be keyword arguments. **kwargs
can be aesthetics (or parameters) used by the stat.
Parameters used:
mapping : aes, optional
Aesthetic mappings created with aes(). If specified and inherit_aes=True, it is combined
with the default mapping for the plot. You must supply mapping if there is no plot
mapping.
Examples
Histograms
Visualise the distribution of a variable by dividing the x-axis into bins and counting the
number of observations in each bin. Histograms display the counts with bars.
You can define the number of bins (e.g. divide the data into five bins) or define the
binwidth (e.g. each bin has a width of 10).
Distributions can be visualised as count, normalised count, density, normalised
density, or scaled density as a percentage.
If you create a basic histogram without specifying either, plotnine picks a number of
bins for you and issues a warning suggesting that you choose a better value with binwidth
or bins.
You can define the width of bins by specifying binwidth inside geom_histogram().
Or you can define the number of bins by specifying bins inside geom_histogram(). Note that
the example below uses 10 bins, but you can't see them all because some of the bars
are too small to be noticeable.
There are different ways to visualise the distribution; you can choose one using the y
argument within aes(). The example below uses the default setting, the raw count,
with after_stat('count').
You can normalise the raw count so that the tallest bar equals 1 by using after_stat('ncount'):
You can display the density of points in each bin (scaled so that the bars integrate to 1)
by using after_stat('density'):
The proportion of observations in each bin can also be shown; in the example below the bin
at 0.5 accounts for roughly 55% of the data:
We can also display these proportions as percentages by passing percent_format(), which
comes from the mizani.formatters module, to the y-axis scale:
Instead of using stat you can use stat_bin, the stat that geom_histogram() relies on. This is
useful if you want to layer a few different plots in one figure.
You can visualise counts by other variables using fill within aes():
You can make bars that are too small to see visible by transforming the y-axis with
scale_y_sqrt() for a square-root scale or scale_y_log10() for a log scale (similarly, use
scale_x_sqrt() and scale_x_log10() to transform the x-axis).
Change the look of your plot:
Another change, this time changing the fill colours manually:
When faceting histograms with scaled counts/densities, they are normalised by each
facet, and not overall. Here's an example of a facet wrap:
Here's an example of a facet grid with the count normalised in each grid:
CONCLUSION:
The conclusions drawn from creating a histogram with plotnine (ggplot) depend on the
characteristics of the data and the features highlighted in the visualization. Here are
some general points you might consider:
Distribution Shape:
Assess whether the distribution is symmetric, skewed (left or right), unimodal, or
multimodal. This provides insights into the central tendency of the data.
Central Tendency:
Examine the central tendency of the data by identifying the peak or center of the
histogram. This could be represented by the mean, median, or mode.
Variability:
Evaluate the spread or variability of the data. A wider spread suggests higher variability,
while a narrower spread indicates lower variability.
Outliers:
Identify any outliers that may appear as data points far from the main concentration.
Outliers can impact the interpretation of the distribution.
Patterns and Trends:
Look for any patterns or trends within the histogram that may reveal underlying
structures in the data.
Interpretation:
Provide a meaningful interpretation of the histogram in the context of your analysis or
research question. What do the distribution characteristics suggest about the nature of
the variable?
Considerations for Further Analysis:
Based on the insights gained from the histogram, consider whether further statistical
analysis or exploration is warranted.