R Programming Paper Solutions
Lists are just like vectors, except that they are not limited to holding elements of a single
type. They are built with the list function, or with the c function if one of the elements
you're adding is a list.
> myList <- list(5, "Hello", "Worlds", TRUE)
> class(myList)
[1] "list"
> myList
[[1]]
[1] 5
[[2]]
[1] "Hello"
[[3]]
[1] "Worlds"
[[4]]
[1] TRUE
2. What is Vector in R?
Ans:- A vector is a sequence of data elements of the same basic type. Members of a vector
are officially called components.
Here is a vector containing three numeric values, 2, 3, and 5:
c(2, 3, 5)
[1] 2 3 5
3. What is Dataset?
Ans:- In the text below, whenever I mention using a command, assume that this means
typing it into the console. So, if I say "We look at the help for data frames with
?data.frame", you type ?data.frame into the console.
R, and therefore RStudio, comes with some datasets for new users to play around with. To
use a built-in dataset, we load it with the data function and supply an argument
corresponding to the set we want. To see all the available built-in sets, type data() without
an argument. Looking at the list of available datasets, let's load a very small one for starters:
data("women")
Or
data(women)
You should see the women variable appear in the Environment panel, though its second
field says <Promise>. A promise in this case merely means "The data will be there when you
actually need it". We told R to load this set, but we haven't actually used it anywhere, so it
didn't feel the need to load it fully into memory. Let's tell R we need it. In the console, print
out the entire set by simply calling:
women
This is equivalent to:
print(women)
The numbers will be printed in the console, and the Environment entry for women should
change. You should be able to see the data in the Environment panel now, too, by clicking
the blue expand arrow next to the variable name.
This set has only 15 entries, so it is of little analytical value, but it's good enough for
playing around with.
To further study the set you're dealing with, there are several functions to keep in mind.
nrow and ncol return the number of rows and columns, respectively.
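As a minimal sketch of the inspection functions mentioned above, the following uses the built-in women dataset. The calls to head() and str() are additions that the text does not name; they are included only as common companions to nrow and ncol.

```r
# Load the built-in dataset (it ships with R in the 'datasets' package).
data("women")

# Number of rows and columns.
n_rows <- nrow(women)
n_cols <- ncol(women)
print(c(rows = n_rows, columns = n_cols))

# Extras not mentioned in the text: the first few rows and the structure.
print(head(women, 3))
str(women)
```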
4. List out operators.
Ans:- Types of the operator in R language:-
Arithmetic Operators
Logical Operators
Relational Operators
Assignment Operators
Miscellaneous Operators
5. Explain Relational operator in R.
Ans:- The relational operators in R compare the corresponding elements of two operands.
Each comparison returns TRUE if the element of the first operand satisfies the relation with
respect to the element of the second, and FALSE otherwise. When logical values are
compared, TRUE is considered greater than FALSE.
Less than (<)
Returns TRUE if the corresponding element of the first operand is less than that of the
second operand. Else returns FALSE.
Less than or equal to (<=)
Returns TRUE if the corresponding element of the first operand is less than or equal to that
of the second operand. Else returns FALSE.
Greater than (>)
Returns TRUE if the corresponding element of the first operand is greater than that of the
second operand. Else returns FALSE.
Greater than or equal to (>=)
Returns TRUE if the corresponding element of the first operand is greater than or equal to
that of the second operand. Else returns FALSE.
Not equal to (!=)
Returns TRUE if the corresponding element of the first operand is not equal to the second
operand. Else returns FALSE.
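The element-by-element behaviour described above can be sketched as follows. The vectors a and b are made-up values chosen for illustration, and the equality operator == is shown for completeness even though it is not in the list above.

```r
# Two numeric vectors of equal length (illustrative values).
a <- c(2, 5.5, 6, 9)
b <- c(8, 2.5, 6, 9)

lt <- a < b    # TRUE  FALSE FALSE FALSE
le <- a <= b   # TRUE  FALSE TRUE  TRUE
gt <- a > b    # FALSE TRUE  FALSE FALSE
ge <- a >= b   # FALSE TRUE  TRUE  TRUE
ne <- a != b   # TRUE  TRUE  FALSE FALSE
eq <- a == b   # FALSE FALSE TRUE  TRUE  (equality, not listed above)

# TRUE compares as greater than FALSE.
logical_cmp <- TRUE > FALSE   # TRUE

print(rbind(lt, le, gt, ge, ne, eq))
```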
6. Explain Data Frame in R.
Ans:- A data frame is a two-dimensional data structure in R. It is a special case of a list in
which every component has the same length. Each component forms a column, and the
contents of the components form the rows.
Creating Data Frames
We can create a data frame using the data.frame() function. For example:
x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"))
str(x) # structure of x
O/P
'data.frame': 2 obs. of 3 variables:
$ SN : int 1 2
$ Age : num 21 15
$ Name: Factor w/ 2 levels "Dora","John": 2 1
Notice above that the third column, Name, is of type factor instead of a character vector.
This output comes from R versions before 4.0.0, where data.frame() converted character
vectors into factors by default; since R 4.0.0 the default is stringsAsFactors = FALSE and
Name stays a character vector. To suppress the conversion explicitly, we can pass the
argument
stringsAsFactors = FALSE
x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)
str(x) # now the third column is a character vector
O/P
'data.frame': 2 obs. of 3 variables:
$ SN : int 1 2
$ Age : num 21 15
$ Name: chr "John" "Dora"
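A short sketch of working with such a data frame follows. It rebuilds the same x and then shows column access and row filtering, which the text does not cover; the variable name adults is hypothetical.

```r
# Build the data frame from the text, keeping Name as character.
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

# Dimensions: rows, then columns.
dims <- dim(x)

# A column is accessed with $ and behaves like a vector.
mean_age <- mean(x$Age)

# Rows are filtered with a logical condition before the comma.
adults <- x[x$Age > 18, ]

print(dims)
print(mean_age)
print(adults)
```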
7. What is Vector Element Sorting? Explain.
Ans:- Elements in a vector can be sorted using the sort() function.
v <- c(3,8,4,5,0,11, -9, 304)
# Sort the elements of the vector.
sort.result <- sort(v)
print(sort.result)
# Sort the elements in the reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)
# Sorting character vectors.
v <- c("Red","Blue","yellow","violet")
sort.result <- sort(v)
print(sort.result)
# Sorting character vectors in reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)
8. Difference between structured and unstructured data.
Ans:- Structured data:- Data that can be related through relationship keys; in short, RDBMS
data. Most processing still happens on this type of data today, yet it constitutes only
around 5% of total digital data.
Unstructured data:- All the remaining data, which has no predefined structure, falls into
this category; according to an IDC estimate, it represents a whopping 90% share. Examples
include sensor data (a large proportion), social media streams, images, videos, mobile data, etc.
9. Explain Characteristics of data
Ans:- There are five data characteristics that are the building blocks of an efficient data
analytics solution: Accuracy, Completeness, Consistency, Uniqueness, and Timeliness.
Understanding each of these helps explain why different businesses are not able to
leverage the benefits of data analytics to the same degree.
Accuracy:- When insights are extracted from a well-developed and well-tested data
analytics solution, we assume that the data is reliable and accurate. However, flaws in
data collection, data storage, or data retrieval will result in unreliable data, and this will
reduce the accuracy of the insights extracted by a data analytics solution.
Completeness:- The insights or information extracted by a data analytics solution depend a
great deal on the completeness of the data. Partial data, or a dataset with a lot of missing
values, represents an incomplete picture. Thus, the degree of completeness of the data
determines the accuracy of a data analytics solution.
Consistency:- The consistency within a dataset is another important factor that determines
the degree of accuracy of a data analytics solution. A consistent dataset is less prone to
errors and results in better accuracy of a data analytics solution.
Uniqueness:- One of the essential components of any business is high-quality data. This
data, if used properly, can make a company competitive or keep it competitive. Thus, the
degree of uniqueness of the data explains the efficiency of a data analytics solution. In
order to add value to any business, the data should be unique and distinctive.
Timeliness:- A data analytics solution that uses outdated data can keep a company from
achieving its goals or from surviving in a competitive arena. New and current data is more
valuable to a business than old, outdated data. Old data should not be completely
overlooked by a data analytics solution, but emphasis should be placed on the current
data.
10. Describe the basic layout of RStudio.
Ans:- We'll be using RStudio, a free, open source R integrated development environment. It
provides a built-in editor, works on all platforms (including on servers), and provides many
advantages such as integration with version control and project management.
When you first open RStudio, you will be greeted by four panels:
R Script editor (upper left)
The interactive R console (lower left)
Environment/History (upper right)
Files/Plots/Packages/Help/Viewer (lower right)
Chapter-2
1. Explain Repeat loop in R
Ans:- A repeat loop executes the same block of code again and again until a stop
condition is met. The loop has no built-in condition, so the body must test the condition
itself and call break to exit.
Syntax
repeat {
  commands
  if (condition) break
}
Example
x <- 1
repeat {
  print(x)
  x <- x + 1
  if (x == 6) break
}
2. Explain seq () function.
Ans:- seq():- The seq() function in R is used to create a sequence of elements in a vector.
The step between values and the length of the result are optional arguments.
Syntax:
seq(from, to, by, length.out)
Parameters:
from: Starting element of the sequence
to: Ending element of the sequence
by: Difference between consecutive elements
length.out: Desired length of the sequence
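The parameters above can be tried out with a few calls. This is a minimal sketch with illustrative values; the variable names s1, s2 and s3 are hypothetical.

```r
# Step of 3 from 1 up to (at most) 10.
s1 <- seq(from = 1, to = 10, by = 3)     # 1 4 7 10

# Five evenly spaced values between 0 and 1; the step is computed for us.
s2 <- seq(0, 1, length.out = 5)          # 0.00 0.25 0.50 0.75 1.00

# A negative step counts downwards.
s3 <- seq(10, 1, by = -3)                # 10 7 4 1

print(s1)
print(s2)
print(s3)
```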
3. Explain format () function with example.
Ans:- Numbers and strings can be formatted to a specific style using format() function.
Syntax
The basic syntax for format function is −
format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))
Following is the description of the parameters used −
x is the vector input.
digits is the number of significant digits to display.
nsmall is the minimum number of digits to the right of the decimal point.
scientific is set to TRUE to display scientific notation.
width indicates the minimum width to be displayed by padding blanks in the beginning.
justify is the display of the string to left, right or center.
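The question asks for an example, so here is a minimal sketch that exercises each parameter described above. The input values are arbitrary; note that format() always returns character strings.

```r
# digits: number of significant digits.
a <- format(23.123456789, digits = 9)        # "23.1234568"

# nsmall: at least this many digits after the decimal point.
b <- format(23.47, nsmall = 5)               # "23.47000"

# scientific: force scientific notation.
c_sci <- format(c(6, 13.14521), scientific = TRUE)

# width: pad with leading blanks up to the given width.
d <- format(13.7, width = 6)                 # "  13.7"

# justify: applies to strings; here the text is centred in 8 characters.
e <- format("Hello", width = 8, justify = "centre")

print(c(a, b, d, e))
print(c_sci)
```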
4. Explain if and else.
Ans:- The simplest form of flow control is conditional execution using if. if takes a logical
value (more precisely, a logical vector of length one) and executes the next statement only
if that value is TRUE:
if(TRUE) message("It was true!")
## It was true!
if(FALSE) message("It wasn't true!")
Missing values aren’t allowed to be passed to if; doing so throws
an error:
if(NA) message("Who knows if it was true?")
## Error: missing value where TRUE/FALSE needed
Where you may have a missing value, you should test for it
using is.na:
if(is.na(NA)) message("The value is missing!")
## The value is missing!
Of course, most of the time, you won’t be passing the actual values TRUE or FALSE.
Instead you’ll be passing a variable or expression—if you knew that the statement was
going to be executed in advance, you wouldn’t need the if clause. In this next example,
runif(1) generates one uniformly distributed random number between 0 and 1. If that value
is more than 0.5, then the message is displayed:
if(runif(1) > 0.5) message("This message appears with a 50% chance.")
If you want to conditionally execute several statements, you can wrap them in curly
braces:
x <- 3
if(x > 2)
{
  y <- 2 * x
  z <- 3 * y
}
For clarity of code, some style guides recommend always using curly braces, even if you
only want to conditionally execute one statement.
The next step up in complexity from if is to include an else statement.
Code that follows an else statement is executed if the if condition was FALSE:
if(FALSE) {
  message("This won't execute...")
} else {
  message("but this will.")
}
## but this will.
One important thing to remember is that the else statement must
occur on the same line as the closing curly brace from the if clause. If
you move it to the next line, you’ll get an error:
if(FALSE)
{message("This won't execute...") }
else{message("and you'll get an error before you reach this.") }
## Error: unexpected 'else' in "else"
Multiple conditions can be defined by combining if and else repeatedly. Notice that if and
else remain two separate words; there is an ifelse function, but it means something
slightly different:
x <- 4
if(is.nan(x)) { # NaN (Not a Number), e.g. 0 / 0 = NaN
  message("x is missing")
} else if(is.infinite(x)) { # e.g. pi / 0 = Inf; dividing a nonzero number by zero gives infinity
  message("x is infinite")
} else if(x > 0) {
  message("x is positive")
} else if(x < 0) {
  message("x is negative")
} else {
  message("x is zero")
}
## x is positive
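The ifelse function mentioned above is not demonstrated in the answer, so here is a minimal sketch of how it differs: if tests a single logical value, whereas ifelse works on a whole vector at once. The input vector and the labels are made up for illustration.

```r
# A vector with a negative, a zero, a positive and a missing value.
x <- c(-2, 0, 3, NA)

# ifelse(test, yes, no) picks 'yes' or 'no' element by element.
# Where the test is NA, the result is NA as well.
sign_label <- ifelse(x > 0, "positive", "not positive")

print(sign_label)
```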
5. How to delete a created variable?
Ans:- Variables can be deleted by using the rm() function. For example, rm(var.3) deletes
the variable var.3; printing the variable afterwards throws an error.
All the variables can be deleted by using the rm() and ls() function together.
rm(list = ls())
print(ls())
When we execute the above code, it produces the following result −
character(0)
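Deleting a single variable can be sketched as follows. The variable var.3 is the one named in the answer; its value here is made up, and tryCatch() is used only so that the expected error does not stop the script.

```r
# Create a variable, then confirm it exists.
var.3 <- c(1, 2, 3)
exists_before <- exists("var.3")

# Delete it.
rm(var.3)
exists_after <- exists("var.3")

# Printing a deleted variable raises an error; capture its message.
msg <- tryCatch(print(var.3), error = function(e) conditionMessage(e))

print(exists_before)
print(exists_after)
print(msg)
```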
6. Explain sum(), mean() functions with example.
Ans:- sum():- The sum() function in R returns the sum of the values passed to it as
arguments.
Example:-
a1 <- c(12, 13)
sum(a1)
## [1] 25
mean():- The mean is calculated by taking the sum of the values and dividing by the
number of values in a data series.
The function mean() is used to calculate this in R.
Example:-
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x)
print(result.mean)
7. Explain Variable scope.
Ans:- The scope of a variable is the region of a program in which the variable can be found
and accessed. There are mainly two types of variable scope:
Global Variables: Global variables exist throughout the execution of a program and can be
accessed and changed from any part of it.
They are available throughout the lifetime of a program.
They are declared outside all functions.
Declaring global variables: Global variables are usually declared outside of all functions.
Inside a function, assigning with <- creates a local variable, so the superassignment
operator <<- is needed to change a global variable from there.
Local Variables: Local variables exist only within a certain part of a program, such as a
function, and are released when the function call ends.
Variables defined within a function are said to be local to that function. In R, it is functions
that create a new scope; a bare { } block, or the body of an if or a loop, does not.
Local variables do not exist outside the function in which they are created, i.e. they cannot
be accessed or used outside it.
Declaring local variables: Local variables are created by assignment inside a function.
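A minimal sketch of global versus local variables follows. All names (counter, show_scope, local_value) are hypothetical and chosen only for this illustration.

```r
# A global variable, defined outside any function.
counter <- 10

show_scope <- function() {
  # Reads the global 'counter' because no local one exists yet.
  local_value <- counter + 1
  # This creates a new local 'counter'; the global one is untouched.
  counter <- 99
  local_value
}

result <- show_scope()

# The local variable is gone once the function has returned.
local_visible <- exists("local_value")

print(result)
print(counter)
print(local_visible)
```

Replacing `counter <- 99` with `counter <<- 99` inside the function would modify the global variable instead.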
8. Explain R switch statement.
Ans:- switch(expression, ...) evaluates expression and selects one of the remaining
arguments accordingly. The following rules apply:
If the value of expression is not a character string, it is coerced to integer.
You can list any number of alternatives in a switch. Unlike in C-style languages, there are
no case labels or colons; each alternative is simply an argument, optionally named.
If the value of the integer is between 1 and nargs()−1 (the maximum number of
alternatives), then the corresponding alternative is evaluated and its result returned.
If expression evaluates to a character string, then that string is matched (exactly) to the
names of the alternatives.
If there is more than one match, the first matching element is returned.
There is no default keyword. In the case of no match on a character string, if there is an
unnamed element of ... its value is returned. (If there is more than one such argument, an
error is signalled.)
Example
x <- switch(3, "first", "second", "third", "fourth" )
print(x)
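The rules for character strings and the unnamed fallback can be sketched as follows. The function describe and its shape names are hypothetical.

```r
# Matching by name, with an unnamed final argument as the fallback.
describe <- function(kind) {
  switch(kind,
         circle = "round",
         square = ,                  # empty: falls through to the next value
         rectangle = "four-sided",
         "unknown")                  # returned when no name matches
}

# Matching by position.
by_position <- switch(2, "first", "second", "third")

# An out-of-range integer returns NULL (invisibly).
out_of_range <- switch(5, "first", "second")

print(describe("circle"))
print(describe("square"))
print(describe("hexagon"))
print(by_position)
```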
9. Explain plyr package.
Ans:- plyr is a set of tools for a common set of problems: you need to split up a big data
structure into homogeneous pieces, apply a function to each piece and then combine all
the results back together. For example, you might want to:
fit the same model to subsets of a data frame
quickly calculate summary statistics for each group
perform group-wise transformations like scaling or standardising
It’s already possible to do this with base R functions (like split and the apply family of
functions), but plyr makes it all a bit easier with:
totally consistent names, arguments and outputs
convenient parallelisation through the foreach package
input from and output to data.frames, matrices and lists
progress bars to keep track of long running operations
built-in error recovery, and informative error messages
labels that are maintained across all transformations
Considerable effort has been put into making plyr fast and memory efficient, and in many
cases plyr is as fast as, or faster than, the built-in functions.
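The split, apply and combine steps described above can be sketched in base R, which keeps the example runnable without installing anything. The data frame and its column names are made up; the commented-out plyr call at the end is an assumption about a roughly equivalent one-liner and is not executed here.

```r
# Illustrative data: a grouping column and a numeric column.
df <- data.frame(group = c("a", "a", "b", "b", "b"),
                 value = c(1, 3, 10, 20, 30))

# Split: one numeric vector per group.
pieces <- split(df$value, df$group)

# Apply: compute the mean of each piece.
means <- vapply(pieces, mean, numeric(1))

# Combine: put the results back into a data frame.
summary_df <- data.frame(group = names(means), mean_value = unname(means))
print(summary_df)

# With plyr installed, roughly the same result in one call (assumption):
# plyr::ddply(df, "group", plyr::summarise, mean_value = mean(value))
```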
10. List out all apply (). Explain any three.
Ans:- The apply() function is the basic model of the family of apply functions in R, which
includes specific functions like lapply(), sapply(), tapply(), mapply(), vapply(), rapply(),
eapply(), and others.
apply() function:-The apply() function lets us apply a function to the rows or columns of a
matrix or data frame. This function takes matrix or data frame as an argument along with
function and whether it has to be applied by row or column and returns the result in the
form of a vector or array or list of values obtained.
Syntax: apply( x, margin, function )
Parameters:
x: determines the input array including matrix.
margin: If the margin is 1 function is applied across row, if the margin is 2 it is applied across
the column.
function: determines the function that is to be applied on input data.
lapply() function:- The lapply() function applies a function to each element of a list and
returns a list of the same length. It takes a list, vector, or data frame as input and gives
output in the form of a list. Since lapply() applies the operation to every element of its
input, it doesn't need a MARGIN.
Syntax: lapply( x, fun )
Parameters:
x: determines the input vector or an object.
fun: determines the function that is to be applied to input data.
sapply() function:- The sapply() function applies a function to each element of a list,
vector, or data frame and simplifies the result where possible, returning a vector or matrix
instead of a list (it falls back to a list when the results cannot be simplified). Since sapply()
applies the operation to every element of its input, it doesn't need a MARGIN. It is the same
as lapply(), with the only difference being the type of the returned object.
Syntax: sapply( x, fun)
Parameters:
x: determines the input vector or an object.
fun: determines the function that is to be applied to input data.
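The three functions explained above can be compared side by side. This is a minimal sketch on a small made-up matrix and list.

```r
# A 2 x 3 matrix: row 1 is (1, 3, 5), row 2 is (2, 4, 6).
m <- matrix(1:6, nrow = 2)

# apply(): margin 1 works on rows, margin 2 on columns.
row_sums <- apply(m, 1, sum)    # 9 12
col_sums <- apply(m, 2, sum)    # 3 7 11

# lapply() and sapply() need no margin; they visit every element.
lst <- list(a = 1:3, b = 4:6)
as_list <- lapply(lst, mean)    # a list with elements a = 2 and b = 5
as_vec <- sapply(lst, mean)     # a named numeric vector: a = 2, b = 5

print(row_sums)
print(col_sums)
print(as_list)
print(as_vec)
```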
A loop statement allows us to execute a statement or group of statements multiple times.
R provides the following kinds of loops:
1. repeat loop:- Executes a sequence of statements repeatedly until a break statement is
reached.
2. while loop:- Repeats a statement or group of statements while a given condition is true.
It tests the condition before executing the loop body.
3. for loop:- Executes the loop body once for each element of a vector or list, assigning
each element in turn to the loop variable.
13. How to split strings in R?
Ans:- strsplit():- strsplit() splits a string at every occurrence of a delimiter. The delimiter is
a character (or pattern) in the existing string; it is removed, and the pieces on either side of
it are returned as a list.
# Given String
gfg <- "Geeks For Geeks"
# Using strsplit() method
answer <- strsplit(gfg, " ")
print(answer)
## [[1]]
## [1] "Geeks" "For"   "Geeks"
14. Explain Types of loops in R with example.
Ans:- 1. repeat loop:- Executes a sequence of statements repeatedly until a break
statement is reached.
Example:-
v <- c("Hello", "loop")
cnt <- 2
repeat {
  print(v)
  cnt <- cnt + 1
  if (cnt > 5) break
}
2. while loop:- Repeats a statement or group of statements while a given condition is true.
It tests the condition before executing the loop body.
Example:-
v <- c("Hello", "while loop")
cnt <- 2
while (cnt < 7) {
  print(v)
  cnt <- cnt + 1
}
3. for loop:- Executes the loop body once for each element of a vector or list, assigning
each element in turn to the loop variable.
Example:-
v <- LETTERS[1:4]
for (i in v) {
  print(i)
}
Chapter-3
1. What is the package? How to load the package?
Ans:- R packages are a collection of R functions, compiled code and sample data.
They are stored under a directory called "library" in the R environment.
By default, R installs a set of packages during installation.
More packages are added later, when they are needed for some specific purpose.
When we start the R console, only the default packages are available by default.
Other packages which are already installed have to be loaded explicitly to be used by the R
program that is going to use them.
Before a package can be used in the code, it must be loaded to the current R environment.
You also need to load a package that is already installed previously but not available in the
current environment.
To load a package that is already installed on your machine, you call the library function.
It is widely agreed that calling this function library was a mistake, and that calling it
load_package would have saved a lot of confusion, but the function has existed long
enough that it is too late to change it now.
To clarify the terminology, a package is a collection of R functions and datasets, and a
library is a folder on your machine that stores the files for a package.
If you have a standard version of R (that is, you haven't built some custom version from the
source code), the lattice package should be installed, but it won't automatically be loaded.
We can load it with the library function.
library(lattice)
We can now use all the functions provided by lattice. For example, the following draws a
fancy dot plot of the famous Immer's barley dataset.
dotplot(
variety ~ yield | site,
data = barley,
groups = year
)
2. How to maintain packages in R?
Ans:- After your packages are installed, you will usually want to update them in order to
keep up with the latest versions. This is done with update.packages.
By default, this function will prompt you before updating each package. This can become
unwieldy after a while (having several hundred packages installed is not uncommon), so
setting ask = FALSE is recommended.
update.packages(ask = FALSE)
Very occasionally, you may want to delete a package. It is possible to do this by simply
deleting the folder containing the package contents from your filesystem, or you can do it
programmatically.
remove.packages("zoo")
3. Explain search path.
Ans:- You can see the packages that are loaded with the search function:
search()
## [1] ".GlobalEnv" "package:stats" "package:graphics"
## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "package:methods" "Autoloads" "package:base"
This list shows the order of places that R will look to try to find a variable. The global
environment always comes first, followed by the most recently loaded packages.
The last two values are always a special environment called Autoloads and the base
package.
If you define a variable called var in the global environment, R will find that before it finds
the usual variance function in the stats package, because the global environment comes
first in the search list. Any environments you attach will also appear on the search path.
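The ordering described above can be checked directly. This is a small sketch; find() comes from the utils package and reports where on the search path a name is defined.

```r
# The search path as a character vector.
path <- search()

# The global environment is first and the base package is last.
first <- path[1]
last <- path[length(path)]

# Where is the variance function 'var' defined?
where_var <- utils::find("var")

print(path)
print(first)
print(last)
print(where_var)
```

If you define your own object called var in the global environment, ".GlobalEnv" would be listed by find("var") ahead of "package:stats".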
4. List format of date explain any three of them.
Ans:-
5. What is the use of package?
Ans:-
6. What is UTC?
Ans:- UTC stands for Coordinated Universal Time, the standard against which all time
zones around the world are set.
So, for instance, New York City is in the time zone UTC minus five, meaning that it is 5 hours
earlier in NYC than the reading on a UTC clock (except during U.S. daylight saving time,
when it is 4 hours earlier).
7. Explain Intervals, Duration and Periods.
Ans:-
8. Explain Date Class with Example.
Ans:- The third date class in base R is slightly better named: it is the Date class. This stores
dates as the number of days since the start of 1970.
The Date class is best used when you don't care about the time of day. Fractional days are
possible (and can be generated by calculating a mean Date, for example), but the POSIX
classes are better for those situations.
In the example below, now_ct is a POSIXct value obtained from Sys.time():
now_date <- as.Date(now_ct)
now_date
## [1] "2017-09-13"
class(now_date)
## [1] "Date"
unclass(now_date)
## [1] 17422
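A self-contained version of the same idea, which does not depend on now_ct, might look like this. The date is the one shown in the output above; the variable names are hypothetical.

```r
# Create a Date directly from a string in year-month-day form.
d <- as.Date("2017-09-13")

# The class, and the underlying number of days since 1970-01-01.
d_class <- class(d)
d_days <- as.numeric(d)

# Day zero is the start of 1970.
epoch <- as.Date(0, origin = "1970-01-01")

print(d)
print(d_class)
print(d_days)
print(epoch)
```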
9. Explain Time Zone.
Ans:- Time zones are horrible, complicated things from a programming perspective.
Countries often have several, and change the boundaries when some (but not all) switch to
daylight saving time. Many time zones have abbreviated names, but they often aren't
unique. For example, "EST" can refer to "Eastern Standard Time" in the United States,
Canada, or Australia.
You can specify a time zone when parsing a date string (with strptime) and change it again
when you format it (with strftime). During parsing, if you don't specify a time zone (the
default is ""), R will give the dates a default time zone. This is the value returned by
Sys.timezone, which is in turn guessed from your operating system locale settings. You can
see the OS date-time settings with Sys.getlocale("LC_TIME").
The easiest way to avoid the time zone mess is to always record and then analyze your
times in the UTC zone.
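Parsing in one zone and formatting in another can be sketched as below. The timestamp is made up, and the New York conversion assumes the system has a time zone database that knows "America/New_York".

```r
# Parse a string as a UTC time.
t_utc <- strptime("2017-09-13 12:00:00",
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Format it back in UTC.
utc_text <- strftime(t_utc, "%Y-%m-%d %H:%M", tz = "UTC")

# Format the same instant in another zone (4 hours behind UTC in September).
ny_text <- format(as.POSIXct(t_utc), "%Y-%m-%d %H:%M",
                  tz = "America/New_York")

print(utc_text)
print(ny_text)
print(Sys.timezone())   # the default zone R guessed for this machine
```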
10. Explain POSIX class in detail.
Ans:- POSIX dates and times are classic R: brilliantly thorough in their implementation,
navigating all sorts of obscure technical issues, but with awful Unixy names that make
everything seem more complicated than it really is.
The two standard date-time classes in R are POSIXct and POSIXlt. (I said the names were
awful!) POSIX is a set of standards that defines compliance with Unix, including how dates
and times should be specified.
ct is short for "calendar time," and the POSIXct class stores dates as the number of seconds
since the start of 1970 in the Coordinated Universal Time (UTC) zone.
POSIXlt stores dates as a list, with components for seconds, minutes, hours, day of month,
etc. POSIXct is best for storing dates and calculating with them, whereas POSIXlt is best for
extracting specific parts of a date.
The function Sys.time returns the current date and time in POSIXct form:
now_ct <- Sys.time()
now_ct
## [1] "2017-09-13 13:32:42 IST"
The class of now_ct has two elements. It is a POSIXct variable, and POSIXct is inherited from
the class POSIXt:
class(now_ct)
## [1] "POSIXct" "POSIXt"
When a date is printed, you just see a formatted version of it, so it isn't obvious how the
date is stored. By using unclass, we can see that it is indeed just a number.
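The two storage forms can be inspected as follows. This is a minimal sketch; the values depend on when it is run, so no output is shown.

```r
# Current time as POSIXct: a single number under the hood.
now_ct <- Sys.time()
secs <- unclass(now_ct)        # seconds since the start of 1970 (UTC)

# The same instant as POSIXlt: a list of components.
now_lt <- as.POSIXlt(now_ct)
year <- now_lt$year + 1900     # years are stored as an offset from 1900
month <- now_lt$mon + 1        # months are stored as 0 to 11
day <- now_lt$mday

print(secs)
print(c(year = year, month = month, day = day))
```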
11. How to do arithmetic with dates and times?
Ans:- R supports arithmetic with each of the three base classes. Adding a number to a
POSIX date shifts it by that many seconds. Adding a number to a Date shifts it by that many
days.
now_ct + 86400 # Tomorrow. I wonder what the world will be like.
## [1] "2013-07-18 22:47:01 BST"
now_lt + 86400 # Same behavior for POSIXlt
## [1] "2013-07-18 22:47:01 BST"
now_date + 1 # Date arithmetic is in days
## [1] "2013-07-18"
Adding two dates together doesn't make much sense, and throws an error. Subtraction is
supported, and calculates the difference between the two dates. The behaviour is the same
for all three date types.
Note that as.Date will automatically parse dates of the form %Y-%m-%d or %Y/%m/%d if
you don't specify a format.
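Subtraction and the failing addition can be sketched as follows. The dates are made up, and try() is used only so the expected error does not stop the script.

```r
# as.Date parses both the dash and the slash form without a format.
d1 <- as.Date("2017-01-01")
d2 <- as.Date("2017/09/13")

# Subtracting two dates gives a time difference in days.
gap <- d2 - d1
gap_days <- as.numeric(gap)

# Adding two dates is an error.
add_failed <- inherits(try(d1 + d2, silent = TRUE), "try-error")

# POSIXct arithmetic is in seconds: 86400 seconds is one day.
t1 <- as.POSIXct("2017-09-13 12:00:00", tz = "UTC")
t2 <- t1 + 86400
hours <- as.numeric(difftime(t2, t1, units = "hours"))

print(gap)
print(add_failed)
print(hours)
```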
Chapter-4
1. How to write data into a .csv file?
Ans:- R can create a csv file from an existing data frame. The write.csv() function is used to
create the csv file. This file gets created in the working directory.
# Read the input data.
data <- read.csv("input.csv")
# Keep rows whose start_date is later than the cutoff (day-month-year order assumed).
retval <- subset(data, as.Date(start_date) > as.Date("01-02-2014", format = "%d-%m-%Y"))
# Write filtered data into a new file.
write.csv(retval, "output.csv")
newdata <- read.csv("output.csv")
print(newdata)
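The snippet above needs an existing input.csv. A self-contained sketch of the same round trip is shown below; the data frame contents, the cutoff date and the temporary file names are all made up for illustration. Passing row.names = FALSE is an addition that stops write.csv() from writing an extra column of row numbers.

```r
# Hypothetical data standing in for input.csv.
people <- data.frame(name = c("Ana", "Raj", "Li"),
                     start_date = c("2013-05-01", "2014-06-15", "2015-01-20"),
                     stringsAsFactors = FALSE)

# Temporary files keep the working directory clean.
in_file <- tempfile(fileext = ".csv")
out_file <- tempfile(fileext = ".csv")
write.csv(people, in_file, row.names = FALSE)

# Read, filter by date, write the subset, and read it back.
data <- read.csv(in_file, stringsAsFactors = FALSE)
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))
write.csv(retval, out_file, row.names = FALSE)
newdata <- read.csv(out_file, stringsAsFactors = FALSE)
print(newdata)
```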
2. How to read JSON file in R programming?
Ans:- The answer recorded here covers XML rather than JSON: an XML file is read by R
using the function xmlParse() from the XML package, which returns a parsed document
object (xmlToList() converts it to a list).
# Load the package required to read XML files.
library("XML")
# Also load the other required package.
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "input.xml")
# Print the result.
print(result)
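Since the question asks about JSON, here is a hedged sketch using the jsonlite package, which is one of several packages that can read JSON and is not mentioned in the original answer. The JSON content and file name are made up, and the code only runs the parsing step if jsonlite is installed.

```r
# Write a small JSON document to a temporary file (illustrative content).
json_text <- '{"name": ["Ana", "Raj"], "age": [31, 27]}'
json_file <- tempfile(fileext = ".json")
writeLines(json_text, json_file)

if (requireNamespace("jsonlite", quietly = TRUE)) {
  # fromJSON() accepts a file path and returns a list here.
  result <- jsonlite::fromJSON(json_file)
  # Equal-length vectors convert cleanly to a data frame.
  df <- as.data.frame(result, stringsAsFactors = FALSE)
  print(df)
} else {
  message("jsonlite is not installed; try install.packages(\"jsonlite\")")
}
```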
3. How to create a scatterplot in R programming?
Ans:- Scatterplots show many points plotted in the Cartesian plane. Each point represents
the values of two variables. One variable is plotted on the horizontal axis and the other on
the vertical axis.
The simple scatterplot is created using the plot() function.
Syntax
plot(x, y, main, xlab , ylab , xlim , ylim , axes)
Following is the description of the parameters used −
x is the data set whose values are the horizontal coordinates.
y is the data set whose values are the vertical coordinates.
main is the title of the graph.
xlab is the label in the horizontal axis.
ylab is the label in the vertical axis.
xlim is the limits of the values of x used for plotting.
ylim is the limits of the values of y used for plotting.
axes indicates whether both axes should be drawn on the plot.
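A minimal sketch using every parameter above follows. It uses the built-in mtcars dataset, which the original answer does not mention, and writes to a temporary PDF so that it also runs without a display; in an interactive session the pdf() and dev.off() lines can be dropped.

```r
# Open a PDF graphics device on a temporary file.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Car weight against fuel economy from the built-in mtcars data.
plot(x = mtcars$wt, y = mtcars$mpg,
     main = "Weight vs mileage",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     xlim = c(1, 6),
     ylim = c(10, 35),
     axes = TRUE)

# Close the device so the file is written.
dev.off()
```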
4. Explain pie Charts.
Ans:- The R programming language has numerous libraries to create charts and graphs. A
pie chart is a representation of values as slices of a circle with different colors. The slices
are labeled, and the numbers corresponding to each slice are also represented in the chart.
In R the pie chart is created using the pie() function, which takes positive numbers as a
vector input. The additional parameters are used to control labels, color, title, etc.
Syntax
pie(x, labels, radius, main, col, clockwise)
Following is the description of the parameters used −
x is a vector containing the numeric values used in the pie chart.
labels is used to give description to the slices.
radius indicates the radius of the circle of the pie chart (value between −1 and +1).
main indicates the title of the chart.
col indicates the color palette.
clockwise is a logical value indicating if the slices are drawn clockwise or anti clockwise.
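The parameters above can be tried with a small made-up dataset. The values and city labels are illustrative only, and the chart is written to a temporary PDF so the sketch runs without a display.

```r
# Illustrative values and labels.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# One slice per value, drawn clockwise with a rainbow palette.
pie(x, labels = labels, radius = 0.8,
    main = "City pie chart",
    col = rainbow(length(x)),
    clockwise = TRUE)

dev.off()
```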
5. Give the syntax of boxplots.
Ans:- Syntax
boxplot (x, data, notch, varwidth , names,main)
Following is the description of the parameters used −
x is a vector or a formula.
data is the data frame.
notch is a logical value. Set as TRUE to draw a notch.
varwidth is a logical value. Set as TRUE to draw the width of each box proportionate to the
sample size.
names are the group labels which will be printed under each boxplot.
main is used to give a title to the graph.
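As a sketch of the syntax in use, the following draws mileage grouped by cylinder count from the built-in mtcars dataset, which is an assumption of this example rather than something the answer mentions. The return value of boxplot() holds the computed statistics, including the group sizes.

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# x is a formula: mpg split by the values of cyl.
stats <- boxplot(mpg ~ cyl, data = mtcars,
                 notch = FALSE,
                 varwidth = TRUE,
                 names = c("4 cyl", "6 cyl", "8 cyl"),
                 main = "Mileage by number of cylinders")

dev.off()

# Number of observations in each group.
print(stats$n)
```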
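6. Give syntax of Scatterplots.
Ans:- Syntax:-
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Following is the description of the parameters used −
x is the data set whose values are the horizontal coordinates.
y is the data set whose values are the vertical coordinates.
main is the title of the graph.
xlab is the label in the horizontal axis.
ylab is the label in the vertical axis.
xlim is the limits of the values of x used for plotting.
ylim is the limits of the values of y used for plotting.
axes indicates whether both axes should be drawn on the plot.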
7. Explain Line Charts with example.
Ans:- A line chart is a graph that connects a series of points by drawing line segments
between them. These points are ordered by one of their coordinates (usually the x
coordinate). Line charts are usually used to identify trends in data.
The plot() function in R is used to create the line graph.
Syntax
plot(v, type, col, xlab, ylab, main)
Following is the description of the parameters used −
v is a vector containing the numeric values.
type takes the value "p" to draw only the points, "l" to draw only the
lines and "o" to draw both points and lines.
xlab is the label for x axis.
ylab is the label for y axis.
main is the Title of the chart.
col is used to give colors to both the points and lines.
Example:-
A simple line chart is created using the input vector and the type parameter set to "o" (the
letter o, not zero). The script below draws a line chart on the active graphics device.
# Create the data for the chart.
v <- c(7, 12, 28, 3, 41)
# Plot both points and lines.
plot(v, type = "o")
8. Explain stem and leaf.
Ans:- A stem and leaf plot is a technique for displaying the frequencies with which some
classes of values occur.
It is basically a method of representing quantitative data in a graphical format.
Unlike a histogram, the stem and leaf plot retains the original data values to two
significant figures.
The data is put in order, which eases the move to nonparametric statistics and order-based
inference. Let us understand how this plotting technique works.
In R, a stem and leaf plot (also known as a stem and leaf diagram) of any quantitative
variable, say x, is a textual graph that classifies the data items in order of their most
significant numeric digits.
The name comes from the tabular format of the plot, where each numeric value or data
item is split into a stem, i.e. the first digit, and a leaf, i.e. the last digit.
For example, suppose the input data is 94. Then 9 will be the stem and 4 will be the leaf.
Example:
On World Obesity Day, suppose a school teacher decides to measure the weight of 10
students who she feels may be obese. She records the weights of the 10 students as
follows:
54, 43, 67, 76, 45, 59, 66, 78, 80, 92.
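The example stops at the data, so here is a minimal sketch of how the plot could be produced. It assumes the base R function stem() and, for comparison, splits the numbers by hand with integer division; the variable names are illustrative only. stem() chooses its own scale, so its grouping may differ from the hand-made split.

```r
# Weights of the 10 students from the example above.
weights <- c(54, 43, 67, 76, 45, 59, 66, 78, 80, 92)

# Base R draws the stem and leaf plot as text in the console.
stem(weights)

# The same idea done by hand: the tens digit is the stem,
# the units digit is the leaf.
sorted <- sort(weights)
stems  <- sorted %/% 10
leaves <- sorted %% 10
by_stem <- tapply(leaves, stems, paste, collapse = " ")
print(by_stem)
```

Here the stem 4 carries the leaves 3 and 5, which correspond to the weights 43 and 45.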
9. Explain read.csv and write.csv.
Ans:- read.csv:- Following is a simple example of the read.csv() function, which reads a CSV file
available in your current working directory.
data <- read.csv("input.csv")
print(data)
write.csv:- R can create a CSV file from an existing data frame. The write.csv() function is used to
create the CSV file. This file gets created in the working directory.
# Create a data frame.
data <- read.csv("input.csv")
# Keep rows whose start_date is after the cutoff (the cutoff is assumed to be in day-month-year order).
retval <- subset(data, as.Date(start_date) > as.Date("01-02-2014", format = "%d-%m-%Y"))
# Write filtered data into a new file.
write.csv(retval, "output.csv")
newdata <- read.csv("output.csv")
print(newdata)
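Since input.csv is not included with these notes, the round trip can be tried with a small made-up data frame. This is only a sketch: the column names, the sample rows and the cutoff date are assumptions, and a temporary file is used instead of the working directory.

```r
# Hypothetical sample data; names and dates are illustrative only.
staff <- data.frame(
  name       = c("Asha", "Ravi", "Meera"),
  start_date = c("2013-11-05", "2014-03-10", "2014-07-21"),
  stringsAsFactors = FALSE
)

# Write the data frame to a temporary CSV file.
# row.names = FALSE stops write.csv() from adding a row-number column.
csv_path <- tempfile(fileext = ".csv")
write.csv(staff, csv_path, row.names = FALSE)

# Read the file back and keep only the rows after the cutoff date.
loaded <- read.csv(csv_path, stringsAsFactors = FALSE)
recent <- subset(loaded, as.Date(start_date) > as.Date("2014-01-01"))
print(recent)
```

Without row.names = FALSE, the file gains an extra unnamed column that read.csv() reads back as X.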
10. Explain scatter plots in R.
Ans:- Scatterplots show many points plotted in the Cartesian plane. Each point represents
the values of two variables. One variable is chosen for the horizontal axis and another for the
vertical axis. The simple scatterplot is created using the plot() function.
Syntax
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Following is the description of the parameters used −
x is the data set whose values are the horizontal coordinates.
y is the data set whose values are the vertical coordinates.
main is the title of the graph.
xlab is the label in the horizontal axis.
ylab is the label in the vertical axis.
xlim is the limits of the values of x used for plotting.
ylim is the limits of the values of y used for plotting.
axes indicates whether both axes should be drawn on the plot.
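As a hedged illustration of the parameters listed above, the sketch below plots two columns of the built-in mtcars data set. The choice of data set, labels and axis limits is an assumption made for this example.

```r
# Use two columns of the built-in mtcars data set.
input <- mtcars[, c("wt", "mpg")]

# Weight on the horizontal axis, mileage on the vertical axis.
plot(
  x    = input$wt,
  y    = input$mpg,
  main = "Weight vs Mileage",
  xlab = "Weight",
  ylab = "Mileage",
  xlim = c(2.5, 5),
  ylim = c(15, 30),
  axes = TRUE
)
```

Points that fall outside xlim or ylim are simply not shown, so the limits act as a zoom on the data.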
11. Explain 3D pie chart.
Ans:- A pie chart with 3 dimensions can be drawn using additional packages. The package
plotrix has a function called pie3D() that is used for this.
# Get the library.
library(plotrix)
# Create data for the graph.
x <- c(21, 62, 10,53)
lbl <- c("London","New York","Singapore","Mumbai")
# Plot the chart.
pie3D(x, labels = lbl, explode = 0.1, main = "Pie Chart of Countries ")
12. How to get data in R using XML files?
Ans:- The XML file is read by R using the function xmlParse() from the XML package. The
parsed document can then be converted to a list with xmlToList().
# Load the package required to read XML files.
library("XML")
# Also load the other required package.
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "input.xml")
# Print the result.
print(result)
13. Explain barchart in R.
Ans:- A bar chart represents data in rectangular bars with the length of each bar proportional to
the value of the variable. R uses the function barplot() to create bar charts. R can draw both
vertical and horizontal bars in the bar chart. In a bar chart, each of the bars can be given
different colors.
Syntax:-
barplot(H, xlab, ylab, main, names.arg, col)
Following is the description of the parameters used −
H is a vector or matrix containing numeric values used in bar chart.
xlab is the label for x axis.
ylab is the label for y axis.
main is the title of the bar chart.
names.arg is a vector of names appearing under each bar.
col is used to give colors to the bars in the graph.
Chapter-5
1. Explain data manipulation.
Ans:- Data manipulation involves modifying data to make it easier to read and to be more
organized.
We manipulate data for analysis and visualization.
It is also used with the term ‘data exploration’ which involves organizing data using
available sets of variables.
At times, the data collection process done by machines involves a lot of errors and
inaccuracies in reading.
Data manipulation is also used to remove these inaccuracies and make data more accurate
and precise.
2. How to summarize data in R?
Ans:- The summarize() function is used in the R program to summarize the data frame into
just one value or vector.
This summarization is usually done after first grouping the observations by categorical values,
using the group_by() function.
The dplyr package is used to get the summary of the dataset.
The summarize() function offers the summary that is based on the action done on grouped
or ungrouped data.
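A minimal sketch of grouped and ungrouped summaries, assuming the dplyr package is installed. The built-in mtcars data set and the column names mean_mpg and cars are choices made for this illustration.

```r
library(dplyr)

# Grouped summary: one row per number of cylinders.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg), cars = n())
print(by_cyl)

# Ungrouped summary: the whole data frame collapses to a single row.
overall <- summarize(mtcars, mean_mpg = mean(mpg))
print(overall)
```

The grouped call returns one row for each distinct value of cyl, while the ungrouped call returns exactly one row.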
3. What is R AnalyticFlow?
Ans:- R AnalyticFlow is a data analysis software that utilizes the R environment for statistical
computing.
In addition to an intuitive user interface, it also provides advanced features for R experts.
These features enable you to share the processes of data analysis between users with
differing levels of proficiency.
4. Explain Qualitative and Quantitative Data.
Ans:- Quantitative data: The data collected on the grounds of the numerical variables are
quantitative data.
Quantitative data are more objective and conclusive in nature.
It measures the values and is expressed in numbers.
The data collection is based on “how much” is the quantity.
Qualitative data: The data collected on grounds of categorical variables are qualitative data.
Qualitative data are more descriptive and conceptual in nature.
It measures the data on basis of the type of data, collection, or category.
The data collection is based on what type of quality is given.
Qualitative data is categorized into different groups based on characteristics.
5. Write attributes of big data.
Ans:- Big data is a collection of data from many different sources and is often described by
five characteristics: volume, value, variety, velocity, and veracity.
Volume: the size and amounts of big data that companies manage and analyze
Value: the most important “V” from the perspective of the business, the value of big data
usually comes from insight discovery and pattern recognition that lead to more effective
operations, stronger customer relationships and other clear and quantifiable business
benefits
Variety: the diversity and range of different data types, including unstructured data, semi-
structured data and raw data
Velocity: the speed at which companies receive, store and manage data – e.g., the specific
number of social media posts or search queries received within a day, hour or other unit of
time
Veracity: the “truth” or accuracy of data and information assets, which often determines
executive-level confidence
6. Explain use of analysis report.
Ans:- The ability to analyze more data at a faster rate can provide big benefits to an
organization, allowing it to more efficiently use data to answer important questions.
Big data analytics is important because it lets organizations use colossal amounts of data in
multiple formats from multiple sources to identify opportunities and risks, helping
organizations move quickly and improve their bottom lines.
Some benefits of big data analytics include:
Cost savings. Helping organizations identify ways to do business more efficiently
Product development. Providing a better understanding of customer needs
Market insights. Tracking purchase behavior and market trends
7. Explain Scope of big data analysis using R.
Ans:- R is open-source software that can be downloaded from the CRAN website.
It is easy to learn and implement.
The R language is built specifically for performing statistical analysis, data manipulation, and
data mining using packages like plyr, dplyr, tidyr, and lubridate.
R supports data visualization with the help of packages such as ggplot2, googleVis,
RColorBrewer, leaflet, and ggmap.
The R software can also be used in a wide range of analytical modeling including classical
statistical tests, linear/non-linear modeling, data clustering, time-series analysis, and more.
8. What is Big Data?
Ans:- Big data primarily refers to data sets that are too large or complex to be dealt with by
traditional data-processing application software.
Data with many entries (rows) offer greater statistical power, while data with higher
complexity (more attributes or columns) may lead to a higher false discovery rate.
Though the term is sometimes used loosely, partly due to the lack of a formal definition, the best
interpretation is that it is a large body of information that cannot be comprehended when
used in small amounts only.
9. Explain Analytical process flow.
Ans:- Analytical processing involves the interaction between analysts and collections of
aggregated data that may have been reformulated into alternate representational forms as
a means for improved analytical performance.
Step 1: Identify the problem.
Step 2: Determine root causes.
Step 3: Explore alternatives.
Step 4: Select an alternative.
Step 5: Implement the solution. Implementation involves the following: developing an
action plan (what steps are needed); determining objectives or measurable targets;
identifying needed resources; identifying details of the action plan (who will do what, by
when, where, and how, as applicable); and using the plan to put the solution in place.
Step 6: Evaluate the situation.
10. Explain Data Collection Methods.
Ans:- Data collection methods are techniques and procedures used to gather information
for research purposes.
These methods can range from simple self-reported surveys to more complex experiments
and can involve either quantitative or qualitative approaches to data gathering.
11. Explain Knitr with Html and word document
Ans:-
12. What is Business Analytics?
Ans:- Business analytics (BA) is a set of disciplines and technologies for solving business
problems using data analysis, statistical models and other quantitative methods.
It involves an iterative, methodical exploration of an organization's data, with an emphasis
on statistical analysis, to drive decision-making.
Data-driven companies treat their data as a business asset and actively look for ways to
turn it into a competitive advantage.
Success with business analytics depends on data quality, skilled analysts who understand
the technologies and the business, and a commitment to using data to gain insights that
inform business decisions.
Mark-5
Chapter-1
1. Explain R Data Types.
Ans:- Generally, while doing programming in any programming language, you need to use
various variables to store various information. Variables are nothing but reserved memory
locations to store values. This means that, when you create a variable, you reserve some
space in memory.
You may like to store information of various data types like character, wide character,
integer, floating point, double floating point, Boolean, etc. Based on the data type of a
variable, the operating system allocates memory and decides what can be stored in the
reserved memory.
R has some typical atomic data types you already know about from other languages, but
also provides some more statistics-inclined ones. Let’s briefly go through them. While
explaining these types, I’ll talk about assigning them. Assigning in R is done with the “left
arrow” operator <-, as in
myString <- "Hello"
R is, however, very forgiving and will let you use the = operator for assignment in top-level
environments like the console, if you don’t feel like typing out the arrow every time.
myString = "Hello"
I suggest you get used to the arrow, though; you won’t get very far without it. To check the
type (or class) of a variable, the class function can be used (though str from above does
almost the same thing):
class(myString)
In contrast to other programming languages like C and Java, in R the variables are not
declared as some data type. The variables are assigned R objects, and the data type of
the R object becomes the data type of the variable. There are many types of R objects. The
frequently used ones are:
Vectors
Matrices
Arrays
Lists
Factors
Data Frames
The simplest of these objects is the vector object, and there are six data types of these
atomic vectors, also termed the six classes of vectors: logical, numeric, integer, complex,
character, and raw.
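The six classes of atomic vectors can be checked directly in the console. The sketch below is only an illustration; the sample values are arbitrary.

```r
# One sample value for each of the six atomic vector classes.
values <- list(
  logical   = TRUE,
  numeric   = 23.5,
  integer   = 2L,                  # the L suffix makes the number an integer
  complex   = 2 + 5i,
  character = "Hello",
  raw       = charToRaw("Hello")   # the bytes behind the string
)

# class() reports the class of each value.
classes <- sapply(values, class)
print(classes)
```

Each name in the list matches the class that class() reports for its value.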
2. Explain Data Frames in detail.
Ans:- A data frame is a two-dimensional data structure in R. It is a special
case of a list in which each component has equal length. Each component forms a column,
and the contents of the components form the rows.
Creating Data Frames: we can create a data frame using the data.frame() function.
For example, a data frame can be created as follows:
x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"))
str(x) # structure of x
O/P
'data.frame': 2 obs. of 3 variables:
$ SN : int 1 2
$ Age : num 21 15
$ Name: Factor w/ 2 levels "Dora","John": 2 1
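Notice above that the third column, Name, is of type factor instead of a character vector. In R
versions before 4.0.0, the data.frame() function converts character vectors into factors by
default; since R 4.0.0 the default is stringsAsFactors = FALSE, so the column stays a character vector.
To suppress the conversion explicitly, we can pass the argument stringsAsFactors = FALSE.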
x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)
str(x) # now the third column is a character vector
O/P
'data.frame': 2 obs. of 3 variables:
$ SN : int 1 2
$ Age : num 21 15
$ Name: chr "John" "Dora"
Chapter-2
1. Explain strsplit() function with example.
Ans:- See Que. 14 in Chapter-2, Mark-3.
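For quick reference, a short sketch of strsplit() is given here as well. The sample strings are made up for this illustration.

```r
# Split one sentence into words; strsplit() always returns a list,
# with one element per input string.
sentence <- "R makes string handling simple"
words <- strsplit(sentence, split = " ")
print(words[[1]])

# Split several strings at once and pick the first piece of each.
dates <- c("2014-01-02", "2015-06-30")
parts <- strsplit(dates, split = "-", fixed = TRUE)  # fixed = TRUE: no regular expression
years <- sapply(parts, `[`, 1)
print(years)
```

Because the result is a list, the pieces of the first string are reached with words[[1]].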
2. What is factor? Explain in detail.
Ans:- Factors are the data objects which are used to categorize the data and store it as
levels. They can store both strings and integers. They are useful in columns which have
a limited number of unique values, like "Male", "Female" or TRUE, FALSE. They are
useful in data analysis for statistical modelling.
Factors are created using the factor() function by taking a vector as input.
Example
# Create a vector as input.
data<-c("East","West","East","North","North","East","West","West","West","East","North")
print(data)
print(is.factor(data))
# Apply the factor function.
factor_data <- factor(data)
print(factor_data)
print(is.factor(factor_data))
4. Explain file path with getwd() and setwd() function.
Ans:- R has a working directory, which is the default place that files will be read from or
written to. You can see its location with getwd and change it with setwd:
getwd()
## [1] "d:/workspace/LearningR"
setwd("c:/windows")
getwd()
## [1] "c:/windows"
Notice that the directory components of each path are separated by forward slashes, even
though they are Windows pathnames. For portability, in R you can always specify paths
with forward slashes, and the file handling functions will magically replace them with
backslashes if the operating system needs them.
You can also specify a double backslash to denote Windows paths, but forward slashes are
preferred:
"c:\\windows" #remember to double up the slashes
"\\\\myserver\\mydir" #UNC names need four slashes at the start
Alternatively, you can construct file paths from individual directory names using file.path.
This automatically puts forward slashes between directory names. It’s like a simpler, faster
version of paste for paths:
file.path("c:", "Program Files", "R", "R-devel")
## [1] "c:/Program Files/R/R-devel"
R.home() #same place: a shortcut to the R installation dir
## [1] "C:/PROGRA~1/R/R-devel"
Paths can be absolute (starting from a drive name or network share), or relative to the
current working directory. In the latter case, . can be used for the current directory and ..
can be used for the parent directory. ~ is shorthand for your user home directory.
path.expand replaces a leading ~ with your home directory but leaves . and .. unchanged
(normalizePath converts relative paths to absolute paths):
path.expand(".")
## [1] "."
path.expand("..")
## [1] ".."
path.expand("~")
## [1] "C:/Users/Staff/Documents"
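The functions above can be combined in one short session. This is a sketch only: the folder name demo_project is hypothetical, and the folder is created inside R's temporary directory so nothing else on disk is touched.

```r
# Remember the starting directory so it can be restored afterwards.
old_wd <- getwd()

# Hypothetical project folder inside R's per-session temporary directory.
project_dir <- file.path(tempdir(), "demo_project")
dir.create(project_dir, showWarnings = FALSE)

# Change the working directory and build a relative path with forward slashes.
setwd(project_dir)
data_path <- file.path("data", "input.csv")
print(data_path)

# normalizePath() turns "." into the absolute path of the working directory.
abs_dir <- normalizePath(".", winslash = "/")
print(abs_dir)

# Restore the original working directory.
setwd(old_wd)
```

Restoring the working directory at the end keeps later read and write calls pointing at the folder they expect.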
6. Explain Factors and levels of Factors.
Ans:- Changing Factor Levels:-The order of the levels in a factor can be changed by applying
the factor function again with new order of the levels.
data<-c("East","West","East","North","North","East","West","West","West","East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)
# Apply the factor function with required order of the level.
new_order_data <- factor(factor_data, levels = c("East", "West", "North"))
print(new_order_data)
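To see what the reordering changes, the levels and the underlying integer codes can be compared before and after. The following is a small sketch built on the same data; the helper variable names are illustrative.

```r
data <- c("East", "West", "East", "North", "North", "East",
          "West", "West", "West", "East", "North")

# By default the levels are sorted alphabetically.
factor_data <- factor(data)
print(levels(factor_data))

# Re-apply factor() with the required order of the levels.
new_order_data <- factor(factor_data, levels = c("East", "West", "North"))
print(levels(new_order_data))

# The values are the same, but the integer codes follow the new order.
codes_before <- as.integer(factor_data)[1:3]
codes_after  <- as.integer(new_order_data)[1:3]
print(codes_before)
print(codes_after)

# table() counts the values per level, in level order.
counts <- table(new_order_data)
print(counts)
```

Only the order of the levels changes; the values themselves and their counts stay the same.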
Chapter-3
1. How to install Packages in R? Explain.
Ans:- Factory-fresh installs of R are set up to access the CRAN package repository, and CRAN
extra if you are running Windows.
CRAN extra contains a handful of packages that need special attention to build under
Windows, and cannot be hosted on the usual CRAN servers.
To access additional repositories, type setRepositories() and select the repositories that you
want. The figure shows the available options.
Bioconductor contains packages related to genomics and molecular biology, while R-Forge
and RForge.net mostly contain development versions of packages that eventually appear on
CRAN. You can see information about all the packages that are available in the repositories
that you have set using available.packages (be warned: there are thousands, so this takes
several seconds to run):
View(available.packages())
As well as these repositories, there are many R packages in online repositories such as
GitHub, Bitbucket, and Google Code. Retrieving packages from GitHub is particularly easy, as
discussed below.
Many IDEs have a point-and-click method of installing packages. In R GUI, the Packages
menu has the option “Install package(s)” to install from a repository and “Install package(s)
from local zip files” to install packages that you downloaded earlier. The figure shows the R GUI
menu.
You can also install packages using the install.packages function. Calling it without any
arguments gives you the same GUI interface as if you’d clicked the “Install package(s)” menu
option.
Usually, you would want to specify the names of the packages that you want to download
and the URL of the repository to retrieve them from. A list of URLs for CRAN mirrors is
available on the main CRAN site.
This command will (try to) download the time-series analysis packages xts and zoo and all
the dependencies for each, and then install them into the default library location (the first
value returned by .libPaths()):
install.packages(
c("xts", "zoo"),
repos = "http://www.stats.bris.ac.uk/R/"
)
To install to a different location, you can pass the lib argument to install.packages:
install.packages(
c("xts", "zoo"),
lib = "some/other/folder/to/install/to",
repos = "http://www.stats.bris.ac.uk/R/"
)
2. Explain date and time classes.
Ans:- There are three date and time classes that come with R: POSIXct, POSIXlt, and Date.
POSIX Dates and Times
POSIX dates and times are classic R: brilliantly thorough in their implementation, navigating
all sorts of obscure technical issues, but with awful Unixy names that make everything seem
more complicated than it really is.
The two standard date-time classes in R are POSIXct and POSIXlt. (I said the names were
awful!) POSIX is a set of standards that defines compliance with Unix, including how dates
and times should be specified.
ct is short for “calendar time,” and the POSIXct class stores dates as the number of seconds
since the start of 1970, in the Coordinated Universal Time (UTC) zone.
POSIXlt stores dates as a list, with components for seconds, minutes, hours, day of month,
etc. POSIXct is best for storing dates and calculating with them, whereas POSIXlt is best for
extracting specific parts of a date.
The function Sys.time returns the current date and time in POSIXct form:
(now_ct <- Sys.time())
## [1] "2017-09-13 13:32:42 IST"
The class of now_ct has two elements. It is a POSIXct variable, and POSIXct is inherited from
the class POSIXt:
class(now_ct)
## [1] "POSIXct" "POSIXt"
When a date is printed, you just see a formatted version of it, so it isn’t obvious how the
date is stored. By using unclass, we can see that it is indeed just a number:
unclass(now_ct)
## [1] 1505289763
When printed, the POSIXlt date looks exactly the same, but underneath the storage
mechanism is very different:
(now_lt <- as.POSIXlt(now_ct))
## [1] "2017-09-13 13:32:42 IST"
class(now_lt)
## [1] "POSIXlt" "POSIXt"
unclass(now_lt)
You can use list indexing to access individual components
of a POSIXlt date:
now_lt$sec
## [1] 42.61241
now_lt[["min"]]
## [1] 32
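Because Sys.time() gives a different answer on every run, the sketch below uses a fixed instant instead, so the results can be reproduced. It reuses the number of seconds printed by unclass() above and assumes the UTC zone for display; the variable names are illustrative.

```r
# Seconds since 1970-01-01 00:00:00 UTC (value from the unclass() output above).
secs <- 1505289763
time_ct <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
print(time_ct)

# POSIXct is a number underneath, so arithmetic works in seconds.
one_hour_later <- time_ct + 3600
elapsed <- as.numeric(one_hour_later) - secs
print(elapsed)

# POSIXlt is a list underneath, so the parts can be extracted by name.
time_lt <- as.POSIXlt(time_ct)
print(time_lt$hour)
print(time_lt$min)
print(time_lt$year + 1900)  # years are stored as an offset from 1900
print(time_lt$mon + 1)      # months are stored as 0 to 11

# The Date class keeps only the calendar day.
day_only <- as.Date(time_ct)
print(day_only)
```

In UTC this instant is 08:02:43, which is 13:32:43 in IST, five and a half hours ahead.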
3. Explain Time Zones with example.
Ans:- Time zones are horrible, complicated things from a programming perspective.
Countries often have several, and change the boundaries when some (but not all) switch to
daylight saving time. Many time zones have abbreviated names, but they often aren’t
unique. For example, “EST” can refer to “Eastern Standard Time” in the United States,
Canada, or Australia.
You can specify a time zone when parsing a date string (with strptime) and change it again
when you format it (with strftime). During parsing, if you don’t specify a time zone (the
default is ""), R will give the dates a default time zone. This is the value returned by
Sys.timezone, which is in turn guessed from your operating system locale settings. You can see
the OS date-time settings with Sys.getlocale("LC_TIME").
The easiest way to avoid the time zone mess is to always record and then analyze your
times in the UTC zone.
If you can achieve this, congratulations! You are very lucky. For
everyone else (those who deal with other people’s data, for example),
the easiest-to-read and most portable way of specifying time zones is to
use the Olson form, which is “Continent/City” or similar:
strftime(now_ct, tz = "America/Los_Angeles")
## [1] "2013-07-17 14:47:01"
strftime(now_ct, tz = "Africa/...")
## [1] "2013-07-17 22:47:01"
strftime(now_ct, tz = "Asia/...")
## [1] "2013-07-18 03:17:01"
strftime(now_ct, tz = "Australia/...")
## [1] "2013-07-18 07:17:01"
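A runnable sketch of the same idea follows. The instant is fixed in UTC so the results do not depend on the machine. The zone names America/Los_Angeles and Asia/Kolkata are assumptions chosen for this illustration, not necessarily the ones used in the original example.

```r
# A fixed instant, recorded in UTC.
utc_time <- as.POSIXct("2013-07-17 21:47:01", tz = "UTC")

# Format the same instant in two Olson time zones.
la_time      <- format(utc_time, tz = "America/Los_Angeles")
kolkata_time <- format(utc_time, tz = "Asia/Kolkata")
print(la_time)
print(kolkata_time)

# Parsing works the other way round: a local time plus its zone
# gives back the same instant.
parsed <- as.POSIXct(
  strptime("2013-07-18 03:17:01", format = "%Y-%m-%d %H:%M:%S", tz = "Asia/Kolkata")
)
same_instant <- as.numeric(parsed) == as.numeric(utc_time)
print(same_instant)
```

Storing the instant in UTC and converting only when printing avoids most of the ambiguity described above.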
Chapter-4
1. Explain Bar Charts with example.
Ans:- A bar chart represents data in rectangular bars with the length of each bar proportional to
the value of the variable. R uses the function barplot() to create bar charts. R can draw both
vertical and horizontal bars in the bar chart. In a bar chart, each of the bars can be given
different colors.
Syntax:-
barplot(H, xlab, ylab, main, names.arg, col)
Following is the description of the parameters used −
H is a vector or matrix containing numeric values used in bar chart.
xlab is the label for x axis.
ylab is the label for y axis.
main is the title of the bar chart.
names.arg is a vector of names appearing under each bar.
col is used to give colors to the bars in the graph.
Example
A simple bar chart is created using just the input vector.
The script below draws the bar chart on the current graphics device.
# Create the data for the chart.
H <- c(7, 12, 28, 3, 41)
# Plot the bar chart.
barplot(H)
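The remaining parameters can be added in the same call. The sketch below also saves the chart to a file; the month labels, axis titles, colour and file name are assumptions made for this illustration, and pdf() is used as the graphics device (png() is used the same way).

```r
# Data for the chart and hypothetical labels for the bars.
H <- c(7, 12, 28, 3, 41)
M <- c("Mar", "Apr", "May", "Jun", "Jul")

# Open a file-based graphics device in the temporary directory.
out_file <- file.path(tempdir(), "barchart.pdf")
pdf(out_file)

# Draw the bar chart; barplot() returns the midpoints of the bars.
bar_mids <- barplot(
  H,
  names.arg = M,
  xlab = "Month",
  ylab = "Revenue",
  col  = "steelblue",
  main = "Revenue chart"
)

# Close the device so the file is written to disk.
dev.off()
print(file.exists(out_file))
```

The file is complete only after dev.off() is called.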
2. Explain Histogram.
Ans:- A histogram represents the frequencies of values of a variable bucketed into ranges.
A histogram is similar to a bar chart, but the difference is that it groups the values into continuous
ranges. The height of each bar in a histogram represents the number of values present in that
range.
R creates a histogram using the hist() function. This function takes a vector as an input and uses
some more parameters to plot histograms.
Syntax:-
The basic syntax for creating a histogram using R is −
hist(v, main, xlab, xlim, ylim, breaks, col, border)
Following is the description of the parameters used −
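The parameter list itself is missing from this copy, so the sketch below describes each parameter in a comment instead. The descriptions follow the base R hist() function; the sample data, titles and colours are assumptions made for this illustration.

```r
# Hypothetical sample data.
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

h <- hist(
  v,                              # v: vector of numeric values
  main   = "Histogram of v",      # main: title of the chart
  xlab   = "Value",               # xlab: label of the x axis
  xlim   = c(0, 50),              # xlim: range of values on the x axis
  ylim   = c(0, 5),               # ylim: range of values on the y axis
  breaks = seq(0, 50, by = 10),   # breaks: boundaries of the ranges
  col    = "lightblue",           # col: fill colour of the bars
  border = "darkblue"             # border: border colour of the bars
)

# hist() also returns the number of values that fell into each range.
print(h$counts)
```

With these breaks the ranges are 0 to 10, 10 to 20 and so on, and the counts are 2, 3, 2, 3 and 1.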
4. How to work with Database in R?
Ans:- The data in relational database systems is stored in a normalized format. So, to carry
out statistical computing we would need very advanced and complex SQL queries. But R can
connect easily to many relational databases like MySQL, Oracle, SQL Server, etc. and fetch
records from them as a data frame.
Once the data is available in the R environment, it becomes a normal R data set and can be
manipulated or analyzed using all the powerful packages and functions.
In this tutorial we will be using MySQL as our reference database for connecting to R.
RMySQL Package
R has an add-on package named RMySQL which provides native connectivity between R and a
MySQL database. You can install this package in the R environment using the following
command.
install.packages("RMySQL")
install.packages("dbConnect")
library(RMySQL)
library(dbConnect)
Connecting R to MySQL
Once the package is installed we create a connection object in R to connect to the database.
It takes the username, password, database name and host name as input.
# Create a connection object to the MySQL database.
# We will connect to the sample database named "example" that comes
# with the MySQL installation.
mysqlconnection <- dbConnect(MySQL(), user = 'root', password = '',
dbname = 'example', host = 'localhost')
# List the tables available in this database.
dbListTables(mysqlconnection)
5. Explain common graphics parameter.
Ans:-
Chapter-5
1. Explain Big Data Analytics using R.
Ans:- R is an open-source language which is used for data modeling, manipulation, statistics,
forecasting, time-series analysis and visualization of data. R holds its data in the RAM of your
machine, so the bigger the RAM of your machine, the bigger the data you can hold for R to work
upon.
We have more than 4000 different packages developed by various scholars to be used as
per requirements.
Initially R was not used as a Big Data analysis language due to its memory limitations.
Gradually R got some libraries like ff, ffbase, RODBC, rmr2 and rhdfs to handle big
data. rmr2 and rhdfs together use the power of Hadoop in order to handle big data
effectively.
2. Explain Report Generation in R.
Ans:- At the top of any R Markdown script is the YAML header section, enclosed by lines of ---;
it is the minimum that should be put there.
By default this includes a title, author, date, and the file type you want as output. Whatever
we set in the header section is applied to the whole document.
This is an example of a YAML header at the top of an .Rmd script which creates an html
document.
---
title: "R Markdown Example"
author: Salet Jyotsna
date: 24/09/2022
output: html_document
---
3. Explain Business Analytics Life Cycle steps.
Ans:-
4. Generate simple presentation using R.
Ans:-
5. Explain data collection method.
Ans:- Data collection procedures are an important step. It is important to keep in mind both
what our research question is about and how we will analyze the data we collect. However,
before gathering information we need to identify the source of data and, based on that
knowledge, decide the methodology we will employ to collect the data. Researchers have
historically used four types of data collection sources:
1. data distributed by an organization or individual, 2. an experiment that they designed,
3. surveys, or 4. observation.
6. Explain Role of analysis in R.
Ans:-