BDA Chapter6
The R programming language is named after the first letter of the first names of its
two authors, Robert Gentleman and Ross Ihaka.
It generally comes with a command-line interface and provides a vast list of packages
for performing tasks.
R is an interpreted language that supports both procedural programming and
object-oriented programming.
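As a rough illustration of the two styles, the sketch below defines a plain function (procedural) and then a small S3 class with a method (object-oriented). The names circle_area, new_circle, and area are hypothetical and chosen only for this example; S3 is one of several object systems available in R.

```r
# Procedural style: a plain function applied to a value.
circle_area <- function(radius) {
  pi * radius^2
}

# Object-oriented style (S3): a constructor that tags a list with a class.
# The class name "circle" is hypothetical.
new_circle <- function(radius) {
  structure(list(radius = radius), class = "circle")
}

# A generic function dispatches on the class of its first argument.
area <- function(shape, ...) {
  UseMethod("area")
}

# The method that is called for objects of class "circle".
area.circle <- function(shape, ...) {
  circle_area(shape$radius)
}

shape <- new_circle(2)
print(area(shape))
```

Calling area(shape) looks up area.circle because shape carries the class "circle"; the method here simply reuses the procedural function.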
A.I. Kalsekar Technical Campus, New Panvel
Why R Programming Language?
R quickly gained popularity among statisticians, data analysts, and researchers due to its flexibility,
extensibility, and powerful statistical capabilities.
The R language provides a wide range of statistical and graphical techniques, including linear and
nonlinear modeling, time series analysis, clustering, and more.
The project was first conceived in 1992. The initial version was released in 1995, and a stable
beta version was released in 2000.
1995: Martin Mächler convinces Ross and Robert to use the GNU General Public License to make R free software.
1997: The R Core Group is formed. The core group controls the source code for R.
2023
Features of R Programming Language.
No need for a compiler: The R language is interpreted. It does not need a compiler to convert the code into
a program.
Cross-platform support: R is cross-platform, that is, it runs on common operating systems such as
Windows, Linux, and macOS without any hassle.
Performs fast calculations: You can perform a wide variety of complex operations on
vectors, arrays, data frames, and other data objects of varying sizes.
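A minimal sketch of such operations is shown here. The data values and the variable names (prices, quantity, sales) are made up for illustration.

```r
# Element-wise arithmetic on whole vectors, with no explicit loop.
prices <- c(10, 20, 30)        # hypothetical data
quantity <- c(1, 2, 3)         # hypothetical data
revenue <- prices * quantity   # multiplies matching elements
total <- sum(revenue)

# The same style works on matrices (two-dimensional arrays).
m <- matrix(1:6, nrow = 2)
doubled <- m * 2

# It also works on data frame columns.
sales <- data.frame(prices = prices, quantity = quantity)
sales$revenue <- sales$prices * sales$quantity

print(revenue)
print(total)
print(doubled)
print(sales)
```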
• To run the program, use the following command on the command line:
Rscript file_name.r
R is open source, so you can run it anywhere and at any
time.
In R, everyone is welcome to provide new packages, bug fixes, and code enhancements.
An IDE is a GUI where you can write your code, see the results, and also see the
variables that are generated during the course of programming.
RStudio is available for various platforms such as Windows, Linux, and macOS.
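• "hello world"    # prints [1] "hello world"
• 100 + 200        # prints [1] 300
• a <- 60
• b <- 68
• c = a + b
• c                # prints [1] 128 (names are case-sensitive, so c and C differ)
• a < b            # prints [1] TRUE
• a > b            # prints [1] FALSE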
Variable name | Validity | Reason
var_name% | Invalid | Has the character '%'. Only dot (.) and underscore (_) are allowed.
2var_name | Invalid | Starts with a number.
.var_name, var.name | Valid | Can start with a dot (.), but the dot (.) should not be followed by a number.
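One way to check these rules in code is with the base R function make.names(), which converts a string into a syntactically valid name. A name that comes back unchanged was already valid. The helper is_valid_name below is hypothetical and not part of base R.

```r
# make.names() rewrites a string into a syntactically valid R name.
# If the string is returned unchanged, it was already a valid name.
# is_valid_name is a hypothetical helper written for this example.
is_valid_name <- function(name) {
  name == make.names(name)
}

candidates <- c("var_name%", "2var_name", ".var_name", "var.name", ".2var")
result <- data.frame(name = candidates, valid = is_valid_name(candidates))
print(result)
```

The last candidate, .2var, starts with a dot followed by a number, so it is reported as invalid along with the first two.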
v = c(0, 10, 10, 10, 20, 30, 40, 40, 40, 50, 60)
print("Original vector-1:")
print(v)
rv = rev(v)
print("The said vector in reverse order:")
print(rv)
8. Write an R program to concatenate a vector.
a = c("Python","NumPy", "Pandas")
print(a)
x = paste(a, collapse = "")
print("Concatenation of the said string:")
print(x)
• The help() function and ? help operator in R provide access to the documentation
pages for R functions, data sets, and other objects, both for packages in the
standard R distribution and for contributed packages.
• To access documentation for the standard lm (linear model) function, for example,
enter the command help(lm) or help("lm"), or ?lm or ?"lm" (i.e., the quotes are
optional).
• help()
• help(lm)
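The equivalent forms described above can be tried as in the following sketch. The results are stored in variables (h1, h2, h3 are hypothetical names) so that nothing is displayed until the help object is printed.

```r
# Both calls look up the same documentation page; the quotes are optional.
h1 <- help(lm)
h2 <- help("lm")

# Operators and reserved words must be quoted to be looked up.
h3 <- help("+")

# Printing a help object shows the page in the configured help viewer:
# print(h1)

# Related base R helpers:
lm_args <- args(lm)   # the argument list of lm(), returned as a function
print(lm_args)
# example(lm)         # runs the examples from the lm help page
```

At the interactive console, ?lm is shorthand for help(lm).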
S. No | Function | Description | Example
4. | floor(x) | It returns the largest integer that is smaller than or equal to x. | x <- 2.5; print(floor(x)) (Output: [1] 2)
2. | var() | It computes the sample variance of a given vector. | x <- c(5, 7, 9, 12, 15); varn <- var(x); print(varn) (Output: 15.8)
4. | median() | It computes the sample median of a given numeric vector. | df <- c(1, 2, 7, 12, 15); med_value <- median(df)
5. | sd() | It computes the standard deviation of a given set of values. | df <- c(1, 3, 5, 12, 20); std_dev <- sd(df); print(std_dev) (Output: 7.79102)
6. | range() | It returns a vector with two elements representing a given dataset's minimum and maximum values. | df <- c(1, 3, 4, 5, 9, 10); rang <- range(df); print(rang) (Output: 1 10)
7. | diff() | It computes the lagged differences between consecutive elements in a given vector. | x <- c(4, 8, 12, 16, 20); dif <- diff(x); print(dif) (Output: 4 4 4 4)
3. | rev() | It returns the reversed version of data objects. | x <- c(39, 40, 41, 42, 43, 44, 45); rev_x <- rev(x); print(rev_x) (Output: 45 44 43 42 41 40 39)
4. | length() | It determines the length or the number of elements in a vector or an object. | x <- c(1, 2, 3, 4, 5, 12, 15, 18); x_length <- length(x)
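To make the definitions in the table concrete, the sketch below recomputes a few of these functions by hand on one of the table's vectors. The variable names manual_var and manual_sd are hypothetical.

```r
x <- c(5, 7, 9, 12, 15)

# Sample variance: squared deviations from the mean, divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)
print(manual_var)
print(var(x))          # same value as manual_var

# The standard deviation is the square root of the sample variance.
manual_sd <- sqrt(manual_var)
print(sd(x))           # same value as manual_sd

# diff() subtracts each element from the one that follows it.
print(diff(x))
# With lag = 2 it subtracts elements that are two positions apart.
print(diff(x, lag = 2))

# range() is the pair c(min(x), max(x)).
print(range(x))

# floor() rounds toward minus infinity, so negative values move away from zero.
print(floor(c(2.5, -2.5)))
```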
Data Visualization in R
• Data visualization is the technique used to deliver insights from data using visual cues
such as graphs, charts, maps, and many others.
• This is useful because it makes large quantities of data easier to understand and
thereby helps in making better decisions about it.
• Data Visualization in R Programming Language
The popular data visualization tools that are available are Tableau, Plotly, R, Google
Charts, Infogram, and Kibana. The various data visualization platforms have
different capabilities, functionality, and use cases.
R is a language that is designed for statistical computing, graphical data analysis,
and scientific research.
It is often preferred for data visualization because its packages offer flexibility and
require minimal coding.
1. Bar Plot
There are two types of bar plots, horizontal and vertical, which represent data
points as horizontal or vertical bars whose lengths are proportional to the value of
the data item.
They are generally used to plot a continuous variable against a categorical variable.
By setting the horiz parameter to TRUE or FALSE, we can get horizontal or vertical
bar plots respectively.
• Bar plots are used to perform a comparative study between the various data categories in the data
set.
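A minimal sketch of both orientations, using the built-in mtcars dataset, is given below. The titles and colours are arbitrary choices, and pdf(NULL) is an assumption added only so that the script also runs without a display; at an interactive console it can be dropped.

```r
# Count the cars for each number of cylinders in the built-in mtcars dataset.
counts <- table(mtcars$cyl)
print(counts)

# Assumption: send graphics to a null device so the script runs without a display.
pdf(NULL)

# Vertical bars (the default, horiz = FALSE).
mids_v <- barplot(counts, main = "Cars per cylinder count",
                  xlab = "Cylinders", ylab = "Number of cars",
                  col = "steelblue")

# Horizontal bars (horiz = TRUE); the axis labels swap accordingly.
mids_h <- barplot(counts, horiz = TRUE, main = "Cars per cylinder count",
                  xlab = "Number of cars", ylab = "Cylinders",
                  col = "steelblue")

invisible(dev.off())
```

barplot() returns the midpoints of the bars, which is why the result can be stored; they are useful when adding labels on top of the bars.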
In a histogram, continuous values are grouped into bins and displayed as bars; the bin size
can be varied.
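The effect of the bin size can be seen in the following sketch, which uses the Temp column of the built-in airquality dataset. The break counts 5 and 20 are arbitrary, and R treats a single number passed to breaks as a suggestion rather than an exact bin count. As above, pdf(NULL) is an assumption so the script runs without a display.

```r
temp <- airquality$Temp   # daily temperature from the built-in dataset

# plot = FALSE returns the binning without drawing, so the bins can be inspected.
coarse <- hist(temp, breaks = 5, plot = FALSE)
fine <- hist(temp, breaks = 20, plot = FALSE)

print(coarse$breaks)      # bin boundaries
print(coarse$counts)      # number of observations in each bin
print(length(fine$counts))

# Assumption: send graphics to a null device so the script runs without a display.
pdf(NULL)
hist(temp, breaks = 20, main = "Daily temperature",
     xlab = "Temperature", col = "orange")
invisible(dev.off())
```

Whatever bin size is chosen, every observation falls into exactly one bin, so the counts always add up to the number of values.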
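# Box plot of wind speed from the built-in airquality dataset
boxplot(airquality$Wind,
        main = "Average wind speed at La Guardia Airport",
        xlab = "Miles per hour", ylab = "Wind",
        col = "orange", border = "brown",
        horizontal = TRUE,  # draw the box horizontally
        notch = TRUE)       # draw a notch around the median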
• #pressure dataset
• pressure
• #airquality dataset
• airquality
• plot(mtcars$disp, mtcars$mpg, xlab = "disp", ylab = "mpg", main = "disp and mpg",
  pch = 21, col = "red3", bg = "slateblue3", lwd = 5)
• barplot(mtcars$cyl)