Histogram With Plotnine

data wrangling

Uploaded by

lavantrial

Micro Project report on

HISTOGRAM WITH PLOTNINE (GGPLOT)

Submitted to the CMR Institute of Technology in partial fulfilment of the


requirement for the award of the Laboratory of

DATA WRANGLING AND VISUALIZATION LAB


of

II-B.Tech I-Semester in
COMPUTER SCIENCE ENGINEERING (Cyber Security)
Submitted by

T.Dilip 22R01A6258

T.Rakshitha 22R01A6259

V.Bhavana 22R01A6260

V.Spoorthi 22R01A6261

Under the Guidance of

Mr.Y.Ram Kumar

(Assistant Professor, Dept. of CSE (CS))

CMR INSTITUTE OF TECHNOLOGY

(UGC AUTONOMOUS)

(Approved by AICTE, Affiliated to JNTU, Kukatpally, Hyderabad) Kandlakoya,

Medchal road, Hyderabad.

2023-2024
CMR INSTITUTE OF TECHNOLOGY

(UGC AUTONOMOUS)

(Approved by AICTE, Affiliated to JNTU, Kukatpally, Hyderabad) Kandlakoya, Medchal

road, Hyderabad.

Department of Computer Science and Engineering (Cyber Security)

CERTIFICATE

This is to certify that the Micro Project entitled "HISTOGRAM WITH PLOTNINE
(GGPLOT)" is being

submitted by

T.Dilip 22R01A6258
T.Rakshitha 22R01A6259
V.Bhavana 22R01A6260
V.Spoorthi 22R01A6261

in partial fulfilment of the requirement for the completion of the Data Wrangling and
Visualization Lab of II-B.Tech I-Semester in CSE (CS), as a record of bonafide work carried
out under our guidance and supervision.

Signature of faculty Signature of HOD

Mr.Y.Ram Kumar Mr.A.Prakash

(Assistant Professor) (Head Of Department)


ACKNOWLEDGEMENT
We are extremely grateful to Dr. M. Janga Reddy, Director, Dr. B. Satyanarayana, Principal,
and Mr. A. Prakash, Head of Department, CMR Institute of Technology, for their support during
the entire duration.

We are extremely thankful to our Data Wrangling and Visualization faculty-in-charge,
Mr. Y. Ram Kumar, Assistant Professor, Computer Science and Engineering
Department (CS), CMR Institute of Technology, for his constant guidance, encouragement and
moral support throughout the project.

We express our thanks to all staff members and friends for all the help and
coordination extended in bringing out this micro project successfully in time.

Finally, we are very thankful to our parents and relatives, who
guided us directly or indirectly towards the successful completion of the project.

T.Dilip 22R01A6258

T.Rakshitha 22R01A6259

V.Bhavana 22R01A6260

V.Spoorthi 22R01A6261
ABSTRACT:

Using plotnine, a Python data visualization library based on ggplot2, you can create
insightful histograms. By leveraging the Grammar of Graphics, plotnine allows you to
design and customize histograms with ease. Specify data, aesthetics, and bin widths
to craft meaningful visual representations of your data distribution. Explore variations
in data intensity, identify trends, and communicate valuable insights through the
powerful and flexible histogram plots generated with plotnine.

INTRODUCTION:

A histogram is a graphical representation of the distribution of a dataset, displaying
the frequencies of different values or ranges of values. It consists of bars whose
height corresponds to the frequency of the data within each interval. Histograms
provide insights into the shape, center, and spread of the data, making them a valuable
tool for visualizing patterns and trends.

Plotnine, inspired by the ggplot2 library in R, is a Python data visualization package
that employs the Grammar of Graphics to create expressive and customizable plots.
When it comes to histograms, plotnine excels in providing a seamless way to
visualize the distribution of your data. By defining the data, aesthetics, and bin widths,
you can effortlessly construct histograms that reveal patterns, concentrations, and
variations in your dataset. This introduction sets the stage for utilizing plotnine to
generate insightful and aesthetically pleasing histogram plots for effective data
exploration and communication.

OBJECTIVE:

The primary objectives of creating histograms with plotnine (ggplot) include:


➢ Data Exploration: Utilize histograms to visually inspect the distribution of a
dataset, uncovering patterns, central tendencies, and potential outliers.

➢ Pattern Identification: Identify trends and understand the shape of the data
distribution, whether it is symmetric, skewed, or exhibits multiple modes.

➢ Variable Relationships: Explore how different variables relate to each other
by creating histograms for specific subsets or by incorporating additional
aesthetics like color or facets.

➢ Customization: Leverage the flexibility of plotnine to customize the
appearance of histograms, adjusting bin widths, colors, labels, and other visual
elements to enhance clarity and interpretability.

➢ Insight Communication: Effectively communicate insights from your data
distribution, making it accessible to a broader audience through well-crafted and
informative histogram plots.

By achieving these objectives, you can gain a deeper understanding of your data and
communicate key findings more effectively using the plotnine library.

CREATING HISTOGRAMS WITH PLOTNINE (GGPLOT)

Histogram

Usage

Only the data and mapping can be positional; the rest must be keyword arguments. **kwargs
can be aesthetics (or parameters) used by the stat.
Parameters used:
mapping : aes, optional
Aesthetic mappings created with aes(). If specified and inherit_aes=True, it is combined
with the default mapping for the plot. You must supply mapping if there is no plot
mapping.

The bold aesthetics are required.


data : dataframe, optional
The data to be displayed in this layer. If None, the data from the ggplot() call is used.
If specified, it overrides the data from the ggplot() call.
stat : str or stat, optional (default: stat_bin)
The statistical transformation to use on the data for this layer. If it is a string, it must be
registered and known to Plotnine.
position : str or position, optional (default: position_stack)
Position adjustment. If it is a string, it must be registered and known to Plotnine.
na_rm : bool, optional (default: False)
If False, removes missing values with a warning. If True, silently removes missing values.
inherit_aes : bool, optional (default: True)
If False, overrides the default aesthetics.
show_legend : bool or dict, optional (default: None)
Whether this layer should be included in the legends. None, the default, includes any
aesthetics that are mapped. If a bool, False never includes and True always includes. A dict
can be used to exclude specific aesthetics of the layer from showing in the legend, e.g.
show_legend={'color': False}; any other aesthetics are included by default.
raster : bool, optional (default: False)
If True, draw onto this layer a raster (bitmap) object even if the final image is in vector
format.

Examples

Histograms
Visualise the distribution of a variable by dividing the x-axis into bins and counting the
number of observations in each bin. Histograms display the counts with bars.
You can define the number of bins (e.g. divide the data into five bins) or define the binwidth
(e.g. each bin is size 10).
Distributions can be visualised as count, normalised count, density, normalised
density, or scaled density as a percentage.

If you create a basic histogram without either setting, you will be prompted (through a
warning) to define the binwidth or number of bins.

You can define the width of bins by specifying the binwidth inside geom_histogram( ).

Or you can define the number of bins by specifying bins inside geom_histogram( ). Note that
the example below uses 10 bins; however, you can't see them all because some of the bars
are too short to be noticeable.

There are different ways to visualise the distribution; you can specify this using the y
argument within aes( ). In the example below I'm using the default setting: raw count
with after_stat('count').

You can normalise the raw count to 1 by using after_stat('ncount'):

You can display the density of points in a bin, (this is scaled to integrate to 1) by using
after_stat('density'):

The proportion of observations in each bin can be shown; in the example below, the bin at 0.5
accounts for about 55% of the data:

We can also display counts as percentages by using percent_format(), which requires
the mizani.formatters library:

Instead of using stat you can use stat_bin defined within geom_histogram(); this is
useful if you want to layer a few different plots in the one figure.

You can also flip the x-y coordinates:

You can visualise counts by other variables using fill within aes():

You can make too-small-to-see bars visible by transforming the y-axis with
scale_y_sqrt() for a square-root scale or scale_y_log10() for a log scale (similarly, use
scale_x_sqrt() and scale_x_log10() to transform the x-axis).

Change the look of your plot:

Another change, this time changing the fill colours manually:

When faceting histograms with scaled counts/densities, they are normalised by each
facet, and not overall. Here's an example of a facet wrap:

Here's an example of a facet grid with the count normalised in each grid:

CONCLUSION:

The conclusions drawn from creating a histogram with plotnine (ggplot) depend on the
characteristics of the data and the features highlighted in the visualization. Here are
some general points you might consider:
Distribution Shape:
Assess whether the distribution is symmetric, skewed (left or right), unimodal, or
multimodal. This provides insights into the central tendency of the data.
Central Tendency:
Examine the central tendency of the data by identifying the peak or center of the
histogram. This could be represented by the mean, median, or mode.
Variability:
Evaluate the spread or variability of the data. A wider spread suggests higher variability,
while a narrower spread indicates lower variability.
Outliers:
Identify any outliers that may appear as data points far from the main concentration.
Outliers can impact the interpretation of the distribution.
Patterns and Trends:
Look for any patterns or trends within the histogram that may reveal underlying
structures in the data.
Interpretation:
Provide a meaningful interpretation of the histogram in the context of your analysis or
research question. What do the distribution characteristics suggest about the nature of
the variable?
Considerations for Further Analysis:
Based on the insights gained from the histogram, consider whether further statistical
analysis or exploration is warranted.
