Histogram With Plotnine


Micro Project report on

HISTOGRAM WITH PLOTNINE (GGPLOT)

Submitted to the CMR Institute of Technology in partial fulfilment of the

requirement for the award of the Laboratory of

DATA WRANGLING AND VISUALIZATION LAB

II-B.Tech I-Semester in
COMPUTER SCIENCE ENGINEERING (Cyber Security)
Submitted by

T.Dilip 22R01A6258

T.Rakshitha 22R01A6259

V.Bhavana 22R01A6260

V.Spoorthi 22R01A6261

Under the Guidance of

Mr.Y.Ram Kumar

(Assistant Professor, Dept of CSE(CS))

CMR INSTITUTE OF TECHNOLOGY

(UGC AUTONOMOUS)

(Approved by AICTE, Affiliated to JNTU, Kukatpally, Hyderabad) Kandlakoya,

Medchal road, Hyderabad.

2023-2024
CMR INSTITUTE OF TECHNOLOGY

(UGC AUTONOMOUS)

(Approved by AICTE, Affiliated to JNTU, Kukatpally, Hyderabad) Kandlakoya, Medchal

road, Hyderabad.

Department of Computer Science and Engineering (Cyber Security)

CERTIFICATE

This is to certify that the Micro Project entitled "HISTOGRAM WITH PLOTNINE
(GGPLOT)" is being

Submitted by

T.Dilip 22R01A6258
T.Rakshitha 22R01A6259
V.Bhavana 22R01A6260
V.Spoorthi 22R01A6261

In partial fulfilment of the requirement for the completion of the Data Wrangling and
Visualization Lab of II-B.Tech I-Semester in CSE(CS), as a record of bona fide work carried
out under our guidance and supervision.

Signature of faculty Signature of HOD

Mr.Y.Ram Kumar Mr.A.Prakash

(Assistant Professor) (Head Of Department)

ACKNOWLEDGEMENT
We are extremely grateful to Dr. M. Janga Reddy, Director, Dr. B. Satyanarayana, Principal,
and Mr.A.Prakash, Head Of Department, CMR Institute of Technology, for their support during
the entire duration of the project.

We are extremely thankful to our Data Wrangling And Visualization faculty-in-charge,
Mr.Y.Ram Kumar, Assistant Professor, Computer Science and Engineering Department (CS),
CMR Institute of Technology, for his constant guidance, encouragement and moral support
throughout the project.

We express our thanks to all staff members and friends for all the help and
coordination extended in bringing out this micro project successfully in time.

Finally, we are very much thankful to our parents and relatives, who
guided us directly or indirectly towards the successful completion of the project.

T.Dilip 22R01A6258

T.Rakshitha 22R01A6259

V.Bhavana 22R01A6260

V.Spoorthi 22R01A6261
ABSTRACT:

Using plotnine, a Python data visualization library based on ggplot2, you can create
insightful histograms. By leveraging the Grammar of Graphics, plotnine allows you to
design and customize histograms with ease. Specify data, aesthetics, and bin widths
to craft meaningful visual representations of your data distribution. Explore variations
in data intensity, identify trends, and communicate valuable insights through the
powerful and flexible histogram plots generated with plotnine.

INTRODUCTION:

A histogram is a graphical representation of the distribution of a dataset, displaying
the frequencies of different values or ranges of values. It consists of bars whose
heights correspond to the frequency of the data within each interval. Histograms
provide insights into the shape, center, and spread of the data, making them a valuable
tool for visualizing patterns and trends.

Plotnine, inspired by the ggplot2 library in R, is a Python data visualization package
that employs the Grammar of Graphics to create expressive and customizable plots.
When it comes to histograms, plotnine excels in providing a seamless way to
visualize the distribution of your data. By defining the data, aesthetics, and bin widths,
you can effortlessly construct histograms that reveal patterns, concentrations, and
variations in your dataset. This introduction sets the stage for utilizing plotnine to
generate insightful and aesthetically pleasing histogram plots for effective data
exploration and communication.

OBJECTIVE:

The primary objectives of creating histograms with plotnine (ggplot) include:

➢ Data Exploration: Utilize histograms to visually inspect the distribution of a
dataset, uncovering patterns, central tendencies, and potential outliers.

➢ Pattern Identification: Identify trends and understand the shape of the data
distribution, whether it is symmetric, skewed, or exhibits multiple modes.

➢ Variable Relationships: Explore how different variables relate to each other
by creating histograms for specific subsets or by incorporating additional
aesthetics like color or facets.

➢ Customization: Leverage the flexibility of plotnine to customize the
appearance of histograms, adjusting bin widths, colors, labels, and other visual
elements to enhance clarity and interpretability.

➢ Insight Communication: Effectively communicate insights from your data
distribution, making it accessible to a broader audience through well-crafted and
informative histogram plots.

By achieving these objectives, you can gain a deeper understanding of your data and
communicate key findings more effectively using the plotnine library.

CREATING HISTOGRAMS WITH PLOTNINE (GGPLOT)

Histogram

Usage

Only the data and mapping can be positional; the rest must be keyword arguments. **kwargs
can be aesthetics (or parameters) used by the stat.
Parameters used:
mapping : aes, optional
Aesthetic mappings created with aes(). If specified and inherit_aes=True, it is combined
with the default mapping for the plot. You must supply mapping if there is no plot
mapping.

The bold aesthetics are required.

data : dataframe, optional
The data to be displayed in this layer. If None, the data from the ggplot() call is used.
If specified, it overrides the data from the ggplot() call.
stat : str or stat, optional (default: stat_bin)
The statistical transformation to use on the data for this layer. If it is a string, it must be
registered and known to Plotnine.
position : str or position, optional (default: position_stack)
Position adjustment. If it is a string, it must be registered and known to Plotnine.
na_rm : bool, optional (default: False)
If False, removes missing values with a warning. If True, silently removes missing values.
inherit_aes : bool, optional (default: True)
If False, overrides the default aesthetics.
show_legend : bool or dict, optional (default: None)
Whether this layer should be included in the legends. None, the default, includes any
aesthetics that are mapped. If a bool, False never includes and True always includes. A dict
can be used to exclude specific aesthetics of the layer from showing in the legend, e.g.
show_legend={'color': False}; any other aesthetics are included by default.
raster : bool, optional (default: False)
If True, draw this layer as a raster (bitmap) object even if the final image is in vector
format.

Examples

Histograms
Visualise the distribution of a variable by dividing the x-axis into bins and counting the
number of observations in each bin. Histograms display the counts with bars.
You can define the number of bins (e.g. divide the data into five bins) or define the binwidth
(e.g. each bin is size 10).
Distributions can be visualised as a count, normalised count, density, normalised density, or
scaled density as a percentage.

If you create a basic histogram, you will be prompted to define the binwidth or number of
bins.

You can define the width of bins by specifying the binwidth inside geom_histogram().

Or you can define the number of bins by specifying bins inside geom_histogram(). Note that
the example below uses 10 bins; however, you can't see them all because some of the bars
are too small to be noticeable.

There are different ways to visualise the distribution; you can specify this using the y
argument within aes(). In the example below I'm using the default setting: raw count
with after_stat('count').

You can normalise the raw count to 1 by using after_stat('ncount'):

You can display the density of points in a bin, (this is scaled to integrate to 1) by using
after_stat('density'):

The proportion of the data falling in each bin can be shown; in the example below, the bin
at 0.5 accounts for about 55% of the data:

We can also display counts as percentages by using percent_format(), which is imported from
the mizani.formatters module:

Instead of using stat you can use stat_bin defined within geom_histogram(); this is
useful if you want to layer a few different plots in the one figure.

You can also flip the x-y coordinates:

You can visualise counts by other variables using fill within aes():

You can make too-small-to-see bars visible by transforming the y-axis with
scale_y_sqrt() for a square-root scale or scale_y_log10() for a log scale (similarly, use
scale_x_sqrt() and scale_x_log10() to transform the x-axis).

Change the look of your plot:

Another change, this time changing the fill colours manually:

When faceting histograms with scaled counts/densities, they are normalised by each
facet, and not overall. Here's an example of a facet wrap:

Here's an example of a facet grid with the count normalised in each grid:

CONCLUSION:

The conclusions drawn from creating a histogram with plotnine (ggplot) depend on the
characteristics of the data and the features highlighted in the visualization. Here are
some general points you might consider:
Distribution Shape:
Assess whether the distribution is symmetric, skewed (left or right), unimodal, or
multimodal. This provides insights into the central tendency of the data.
Central Tendency:
Examine the central tendency of the data by identifying the peak or center of the
histogram. This could be represented by the mean, median, or mode.
Variability:
Evaluate the spread or variability of the data. A wider spread suggests higher variability,
while a narrower spread indicates lower variability.
Outliers:
Identify any outliers that may appear as data points far from the main concentration.
Outliers can impact the interpretation of the distribution.
Patterns and Trends:
Look for any patterns or trends within the histogram that may reveal underlying
structures in the data.
Interpretation:
Provide a meaningful interpretation of the histogram in the context of your analysis or
research question. What do the distribution characteristics suggest about the nature of
the variable?
Considerations for Further Analysis:
Based on the insights gained from the histogram, consider whether further statistical
analysis or exploration is warranted.
