
Submitted By-Pawan Yadav, Roll No. (18PT1-17)

The document discusses the ggplot2 package in R for creating data visualizations. It describes the key concepts of ggplot2 including aesthetic mappings which link data variables to visual properties, different geometric objects for plotting like points and lines, and how to create a basic ggplot. It also discusses other visualization libraries in R for creating interactive plots.

Uploaded by

GAURAV YADAV
1. Study the ggplot function within R

Just as the grammar of language helps us construct meaningful sentences out of words, the Grammar of
Graphics helps us to construct graphical figures out of different visual elements. This grammar gives us a
way to talk about parts of a plot: all the circles, lines, arrows, and words that are combined into a diagram
for visualizing data. Originally developed by Leland Wilkinson, the Grammar of Graphics was adapted
by Hadley Wickham to describe the components of a plot, including

• the data being plotted
• the geometric objects (circles, lines, etc.) that appear on the plot
• a set of mappings from variables in the data to the aesthetics (appearance) of the geometric
  objects
• a statistical transformation used to calculate the data values used in the plot
• a position adjustment for locating each geometric object on the plot
• a scale (e.g., range of values) for each aesthetic mapping used
• a coordinate system used to organize the geometric objects
• the facets or groups of data shown in different plots

Wickham further organizes these components into layers, where each layer has a single geometric object,
statistical transformation, and position adjustment. Following this grammar, you can think of each plot as
a set of layers of images, where each image’s appearance is based on some aspect of the data set.
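
The idea of a layer can be sketched in code. The example below is a minimal illustration, assuming the
ggplot2 package is installed and using its built-in mpg data set; the variable names explicit,
shorthand, and two_layers are chosen here for illustration only.

```r
library(ggplot2)  # assumed installed; also provides the mpg data set

# A layer written out in full: one geom, one stat, one position adjustment.
explicit <- ggplot(mpg, aes(x = displ, y = hwy)) +
  layer(geom = "point", stat = "identity", position = "identity")

# The usual shorthand: geom_point() supplies the same stat and position defaults.
shorthand <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Adding a second layer stacks another "image" on top of the first.
two_layers <- shorthand +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

length(two_layers$layers)  # number of layers in the plot
```

Printing any of these objects (for example, typing two_layers at the console) draws the plot.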

Altogether, this grammar enables us to discuss what plots look like using a standard vocabulary.
And just as tidyr and dplyr provide efficient data transformation and manipulation, ggplot2
provides an efficient way to create specific visual images.

In order to create a plot, you:

1. Call the ggplot() function, which creates a blank canvas.
2. Specify aesthetic mappings, which describe how you want to map variables to visual aspects. In
this case we are simply mapping the displ and hwy variables to the x- and y-axes.
3. Add new layers of geometric objects, which will show up on the plot. In this case we add
geom_point to create a layer that uses points (dots) as the geometric shapes representing
the data.

library(ggplot2)  # also provides the mpg example data set

# create canvas
ggplot(mpg)

# variables of interest mapped
ggplot(mpg, aes(x = displ, y = hwy))

# data plotted
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
Aesthetic Mappings
The aesthetic mappings take properties of the data and use them to influence visual characteristics, such
as position, color, size, shape, or transparency. Each visual characteristic can thus encode an aspect of the
data and be used to convey information.

All aesthetics for a plot are specified in the aes() function call (later in this tutorial you will see that
each geom layer can have its own aes specification). For example, we can add a mapping from the class
of the cars to a color characteristic:

ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()
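
A related point is the difference between mapping an aesthetic to a variable and setting it to a
constant. The sketch below assumes ggplot2 is installed; the object names mapped and fixed are
illustrative only. It also shows an aes() specification given on an individual geom layer rather than
in the ggplot() call.

```r
library(ggplot2)  # assumed installed; provides the mpg data set

# Mapped: color depends on the data (inside aes()), so a legend is drawn.
# Here the mapping is given on the geom layer itself.
mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

# Set: color is a constant (outside aes()), so every point looks the same.
fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue", alpha = 0.5)
```

Placing color inside aes() tells ggplot2 to read it from the data; placing it outside treats it as a
fixed visual property of the layer.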

Specifying Geometric Shapes


Building on these basics, ggplot2 can be used to build almost any kind of plot you may want. These plots
are declared using functions that follow from the Grammar of Graphics.

The most obvious distinction between plots is what geometric objects (geoms) they include. ggplot2
supports a number of different types of geoms, including:

• geom_point for drawing individual points (e.g., a scatter plot)
• geom_line for drawing lines (e.g., for a line chart)
• geom_smooth for drawing smoothed lines (e.g., for simple trends or approximations)
• geom_bar for drawing bars (e.g., for bar charts)
• geom_histogram for drawing binned values (e.g., a histogram)
• geom_polygon for drawing arbitrary shapes
• geom_map for drawing polygons in the shape of a map (you can access the data to use for these
  maps by using the map_data() function)

Each of these geoms uses the aesthetic mappings supplied, although the specific visual properties
that the data map to vary from geom to geom.
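
As a rough illustration of how the choice of geom changes the plot, the sketch below builds three
plots from the mpg data set. It assumes ggplot2 is installed; the object names and the binwidth of 2
are arbitrary choices made for this example.

```r
library(ggplot2)  # assumed installed; provides the mpg data set

# Scatter plot with a smoothed trend line: two geoms sharing one set of mappings.
scatter_trend <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Bar chart: geom_bar counts the rows in each class, so only x is mapped.
bars <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Histogram: geom_histogram bins a continuous variable.
hist_hwy <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
```

Note that geom_point needs both x and y, while geom_bar and geom_histogram compute the y values
themselves from a single mapped variable.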

Beyond geoms, the other parts of a plot that ggplot2 lets you control are:


• Statistical Transformations
• Position Adjustments
• Managing Scales
• Coordinate Systems
• Facets
• Labels & Annotations
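
Several of these pieces can be combined in a single plot. The following is only a sketch, assuming
ggplot2 is installed; the palette, axis limits, and label text are arbitrary choices for illustration.

```r
library(ggplot2)  # assumed installed; provides the mpg data set

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(position = "jitter") +        # position adjustment
  scale_color_brewer(palette = "Set1") +   # scale for the color aesthetic
  coord_cartesian(ylim = c(10, 45)) +      # coordinate system (zooms the y-axis)
  facet_wrap(~ class) +                    # one panel per vehicle class
  labs(                                    # labels
    title = "Highway mileage by engine displacement",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    color = "Drive train"
  )
```

Each added function call changes one component of the grammar and leaves the others untouched.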

Other Visualization Libraries

ggplot2 is easily the most popular library for producing data visualizations in R. That said,
ggplot2 is used to produce static visualizations: unchanging “pictures” of plots. Static plots are
great for explanatory visualizations: visualizations that are used to communicate some information
or, more commonly, an argument about that information. All of the above visualizations have been
ways for us to explain and demonstrate an argument about the data (e.g., the relationship between
car engines and fuel efficiency).

Data visualizations can also be highly effective for exploratory analysis, in which the
visualization is used as a way to ask and answer questions about the data (rather than to convey an
answer or argument). While it is perfectly feasible to do such exploration on a static
visualization, many explorations are better served by interactive visualizations, in which the user
can select and change the view and presentation of the data in order to understand it.

While ggplot2 does not directly support interactive visualizations, there are a number of additional R
libraries that provide this functionality, including:

• ggvis is a library that uses the Grammar of Graphics (similar to ggplot2), but for interactive
  visualizations.
• plotly is an open-source library for developing interactive visualizations. It provides a number of
  “standard” interactions (pop-up labels, drag to pan, select to zoom, etc.) automatically. Moreover,
  it is possible to take a ggplot2 plot and wrap it in Plotly in order to make it interactive. Plotly has
  many examples to learn from, though its documentation is less effective.
• htmlwidgets provides a way to use a number of JavaScript interactive visualization libraries from R.
  JavaScript is the programming language used to create interactive websites (HTML files), and so
  is highly specialized for creating interactive experiences.
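
Wrapping a ggplot2 plot in Plotly can be sketched as follows. This assumes the plotly package is
installed in addition to ggplot2; the requireNamespace() guard is added here so the snippet still
runs without it, and the object names are illustrative.

```r
library(ggplot2)  # assumed installed; provides the mpg data set

# An ordinary static ggplot2 plot.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Convert it to an interactive widget only if plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_p <- plotly::ggplotly(p)
  # Printing interactive_p opens the widget with hover labels, pan, and zoom.
}
```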

2. Run word count on H G Wells collection and plot the same

3. Study the “tm” package and its usage for all possible operations within word processing and
sentiment analysis

The tm package was created by Ingo Feinerer and enables novice researchers (like me) to harness the
power of R without an in-depth understanding of the programming language. With this understanding
in mind, let’s explore some of the practical applications of the tm package.
The tm package uses the Corpus as its main structure. A corpus is simply a collection of documents,
but like most things in R, the corpus has specific attributes that enable certain types of analysis. Corpora
in R exist in two forms:

• Volatile Corpus (VCorpus) is a temporary object within R and is the default when assigning
  documents to a corpus.
• Permanent Corpus (PCorpus) is a permanent object that can be stored outside of R.

Compared to the volatile corpus, the corpus encapsulated by a permanent corpus object is not
destroyed if the corresponding R object is released.

Within the corpus constructor, the first argument x must be a Source object, which abstracts the
input location. tm provides a set of predefined sources, e.g., DirSource, VectorSource, or
DataframeSource, which handle a directory, a vector (interpreting each component as a document), or
data-frame-like structures (such as CSV files), respectively. Except for DirSource, which is
designed solely for directories on a file system, and VectorSource, which only accepts (character)
vectors, most other implemented sources can take connections as input (a character string is
interpreted as a file path). getSources() lists the available sources, and users can create their
own sources.

The second argument, readerControl, has to be a list with the named components reader and language.
The first component, reader, constructs a text document from the elements delivered by a source.
The tm package ships with several readers (e.g., readPlain(), readPDF(), readDOC(), ...). See
getReaders() for an up-to-date list of available readers. Each source has a default reader, which
can be overridden; e.g., for DirSource the default just reads in the input files and interprets
their content as text. Finally, the second component, language, sets the texts’ language
(preferably using ISO 639-2 codes).

In the case of a permanent corpus, a third argument, dbControl, has to be a list with the named
components dbName, giving the filename holding the sourced-out objects (i.e., the database), and
dbType, holding a valid database type as supported by the filehash package. Activated database
support reduces the memory demand; however, access gets slower, since each operation is limited by
the hard disk’s read and write capabilities.
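
A volatile corpus can be built from a character vector roughly as follows. This is a minimal sketch,
assuming the tm package is installed; the two example sentences and the name docs are made up for
illustration, and the default reader for VectorSource is left in place.

```r
library(tm)  # assumed installed

# Two made-up documents, one per element of the character vector.
docs <- c("The first document is about plotting data.",
          "The second document is about mining text.")

# x is a Source object; readerControl sets the language of the texts.
corpus <- VCorpus(VectorSource(docs),
                  readerControl = list(language = "en"))

length(corpus)           # number of documents in the corpus
content(corpus[[1]])     # text of the first document
meta(corpus[[1]])        # document-level metadata, including the language

getSources()             # lists the source types tm provides
getReaders()             # lists the reader functions tm provides
```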

Some of the key features of the tm package are:

• Corpus Transformations: One of the best features of the tm package is the ability to transform
  text into workable data without a great deal of code. To do this, we can use the transformations
  that are available in the tm package. To see the available transformations, enter
  getTransformations() in the console.
• Data Import
• Data Export
• Inspecting Corpora
• Filters
• Metadata Management
• Creating Term-Document Matrices
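
Two of these features, transformations and term-document matrices, can be sketched together. The
example assumes the tm package is installed; the two short documents are made up for illustration,
and the particular sequence of cleaning steps is one common choice rather than a required recipe.

```r
library(tm)  # assumed installed

# Two made-up documents.
docs <- c("Text mining is fun!", "Mining text, with R, is fun.")
corpus <- VCorpus(VectorSource(docs))

# Apply transformations one at a time with tm_map().
corpus <- tm_map(corpus, content_transformer(tolower))       # lower-case the text
corpus <- tm_map(corpus, removePunctuation)                  # drop punctuation
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # drop common words
corpus <- tm_map(corpus, stripWhitespace)                    # collapse extra spaces

# Build a term-document matrix: terms in rows, documents in columns.
tdm <- TermDocumentMatrix(corpus)
inspect(tdm)
```

Base R functions such as tolower are wrapped in content_transformer() so that tm_map() keeps the
result as a text document.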

4. Examine case study on Garrettgman

5. Convert transcript into a table for 'Mann ke Baat'
