DS R Unit-1

INTRODUCTION TO DATA SCIENCE AND R PROGRAMMING

UNIT - I

Syllabus
Defining Data Science and Big Data, Benefits and Uses, Facets of Data, Data Science
Process. History and Overview of R, Getting Started with R, R Nuts and Bolts

Data:
Data is a collection of information gathered by observation, research, analysis, etc.
Data Science:
Data science is the study of data. It is a field of applied mathematics and statistics, sometimes described as advanced statistical computing, that extracts useful information from large amounts of complex data (big data). The field uses scientific methods, processes, and algorithms to extract knowledge and insights from data.
Data science relies on powerful hardware, programming systems, and efficient algorithms to solve data-related problems. It analyzes past and present data and also makes predictions about the future.

Big Data:
Big Data is a collection of data that is huge in volume. It combines structured, semi-structured and unstructured data collected by organizations, and it can be used in machine learning projects, predictive modeling and other advanced analytics applications.
Companies use big data to improve operations and provide better customer service. Medical researchers use it to identify diseases and medical conditions of patients. Financial services use big data systems for risk management and real-time analysis of market data.

BENEFITS AND USES


Benefits:
1. Making better business decisions
Companies can use data and risk analysis to make better business decisions.

2. Increased efficiency
Business operations can be made more efficient and costs can be cut with the use of data science.

3. Developing better products
Data science and big data can be used to create better products with the help of user behavior data.

4. Improved customer experience
Data science can help organizations better understand their customers by analyzing customer data. This information can be used to create goods and services that improve the customer experience.

5. Predicting outcomes and trends
Based on past data, data science can be used to forecast future results. Businesses can find trends and forecast future occurrences by using data science algorithms.

6. Better fraud detection
Data science can be used to detect fraud by analyzing large amounts of data to identify anomalies and suspicious activities. This helps organizations minimize financial losses and protect their customers.

7. Improved healthcare outcomes
Data science can be used in healthcare to analyze patient data and improve patient outcomes.

8. Improved public services
Data science can be used by government organizations to improve public services. For example, it can be used to analyze crime data, traffic flow, etc.

Uses:

1. Search Engines
One of the most visible applications of data science is the search engine. When we want to search for something on the internet, we mostly use search engines like Google, Bing, Yahoo, etc.

2. Transport
Data science is also used in the transport field. It can optimize shipping routes in real time.

3. Finance
Data science plays a key role in the financial industry. It is widely used in the banking and finance sectors for fraud detection.

4. E-Commerce
E-commerce websites like Amazon, Flipkart, etc. use data science to provide a better user experience.

5. Health Care
Data science can identify and predict disease, and personalize healthcare recommendations.

6. Image Recognition
Data science is also used in image recognition. For example, when we upload a picture with a friend on Facebook, Facebook suggests tags for the people in the picture. This is done with the help of machine learning and data science.

7. Data Science in Gaming
One of the most exciting applications of data science is in gaming. Data science is used in the game development process and can improve online gaming experiences.

8. Autocomplete
Autocomplete is an important application of data science: the user types a few letters or words, and the system suggests how to complete the line.

Facets of data
Big data and data science work with very large amounts of data. This data comes in
various types; the main categories are as follows:

1. Structured
2. Unstructured
3. Natural language
4. Machine-generated
5. Graph-based
6. Audio, video, and images
7. Streaming

1. Structured data
Structured data is arranged in rows and columns. This format makes it easy for applications to retrieve and process the data. A database management system is used for storing structured data. An Excel table is an example of structured data.

2. Unstructured data
Unstructured data is data that does not follow a specified format. It is not organized in rows and columns, so it is difficult to retrieve the required information. Unstructured data is often in the form of text. Email is an example of unstructured data.

3. Natural language
Natural language is a special type of unstructured data. Natural language processing enables machines to recognize characters, words and sentences, which helps them interpret language the way humans do.

4. Machine-generated data
Machine-generated data is information that is automatically created by a computer, process, application, etc. without human interaction. Examples of machine data are web server logs, call detail records, network event logs, etc.

5. Graph-based or network data
Graphs are data structures that describe relationships and interactions between entities in complex systems. In general, a graph contains a collection of entities called nodes and a collection of interactions between pairs of nodes called edges. A graph database stores nodes and relationships instead of tables or documents. Graph-based data can be found on many social media websites.

6. Audio, image, and video
Data science plays an important role in multimedia data. Multimedia data usually contains various forms of media, such as audio, images, and video, which come from multiple sources.

7. Streaming data
Streaming data, also known as real-time data, event data, or data-in-motion, refers to a continuous flow of information generated by various sources.

The data science process


The data science process is a systematic approach to solving data problems. The data science
process typically consists of six steps.

1.Setting the research goal


Data science is mostly applied in the context of an organization. When the business asks you to
perform a data science project, you will first prepare a project charter. This charter contains information
such as what you are going to research, how the company benefits from that, what data and resources you
need, a timetable, and deliverables.

2. Retrieving data
The second step is to collect data. You have stated in the project charter which data you need and
where you can find it. In this step you ensure that you can use the data in your program, which means
checking the existence of, quality, and access to the data.

3. Data preparation
Data collection is an error-prone process. In this phase you enhance the quality of the data and prepare
it for use in subsequent steps. This phase consists of three sub-phases: data cleansing, data integration and
data transformation.

4. Data exploration
Data exploration is concerned with building a deeper understanding of your data. You try to
understand how variables interact with each other, the distribution of the data, and whether there are
outliers. To achieve this you mainly use descriptive statistics, visual techniques, and simple modeling.

5. Data modeling (or) model building


In this phase you use models, domain knowledge, and insights about the data you found in the
previous steps to answer the research question. Building a model is an iterative process that involves
selecting the variables for the model, executing the model, and model diagnostics.

6. Presentation and automation


Finally, you present the results to your business. These results can take many forms, ranging from
presentations to research reports. Sometimes you’ll need to automate the execution of the process because
the business will want to use the insights you gained in another project or enable an operational process to
use the outcome from your model.
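
The data exploration step (step 4 above) can be sketched in R as follows. This is a minimal illustration only: the vector of ages is made-up data, and the 1.5 * IQR rule is one common convention for flagging outliers, not the only one.

```r
# Hypothetical sample: ages of ten customers, with one suspicious value
ages <- c(23, 25, 31, 35, 29, 41, 38, 27, 33, 120)

# Descriptive statistics: minimum, quartiles, mean, maximum
print(summary(ages))

# The median is less sensitive to extreme values than the mean
cat("Mean:", mean(ages), " Median:", median(ages), "\n")

# A simple outlier rule: values more than 1.5 * IQR above the third quartile
q3 <- unname(quantile(ages, 0.75))   # third quartile, name attribute dropped
upper_limit <- q3 + 1.5 * IQR(ages)
outliers <- ages[ages > upper_limit]
print(outliers)
```

Here the value 120 is flagged, which is the kind of finding that would send you back to the data preparation step.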

History and Overview of R


R is a programming language and software environment designed for statistical computing, data analysis, graphical representation and report generation. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand. R made its first appearance in 1993, and the initial version was released in 1995.
R is an implementation of the S programming language, a statistical programming language developed by John Chambers, Rick Becker and others at Bell Labs around 1976.
R is an interpreted language that allows branching and looping as well as modular programming using functions. For efficiency, R can call procedures written in languages such as C, C++ and Python. R is freely available under the GNU General Public License, and versions are provided for various operating systems such as Linux, Windows and Mac.
R version 1.0.0 was released to the public in February 2000.
R version 4.3.3 was released in February 2024; newer versions may be available on CRAN.
Features of R
The following are the important features of R
1. R is open-source software, which means it is free of cost.
2. R can be used to perform simple and complex mathematical and statistical calculations on large data sets.
3. R is an interpreted programming language which allows branching and looping as well as modular programming using functions.
4. R is machine independent. It supports cross-platform operation, which means it can be used on many different operating systems.
5. R supports distributed computing.
6. R can interface with databases.
7. R is used for machine learning.
8. R has effective data handling and storage facilities.
9. R supports high-quality graphics.
10. R plays a crucial role in the field of data science. Its extensive set of packages and libraries makes it well suited for data analysis.

R Get Started
R – Environment Setup for Windows
You can download the latest Windows installer for R from CRAN (Comprehensive R
Archive Network). CRAN is a network of web servers around the world that store
up-to-date versions and documentation of R.
Installing R on Windows OS
To install R on Windows OS:
1. Go to the CRAN website.
2. Click on "Download R for Windows".
3. Click on "install R for the first time" link to download the R executable (.exe) file.
4. Run the R executable file to start installation.
5. Select the installation language.

6. Follow the installation instructions.


7. Click on "Finish" to exit the installation setup.

R is now successfully installed on your Windows OS. Open the R GUI to start
writing R code.
The screenshot below shows the R console on a Windows PC.

The R console is a command interpreter. There is also an integrated development
environment (IDE) available for R called RStudio.

RStudio
RStudio is a freely available, open-source Integrated Development Environment
(IDE) for R. It provides many features that make working with R easier.
RStudio is a graphical user interface, not just a command prompt.
Installing RStudio Desktop
To install RStudio Desktop on your computer, do the following:
1. Go to the RStudio website.
2. Download RStudio Desktop recommended for your computer.
3. Run the RStudio executable file (.exe) for Windows.
4. Follow the installation instructions to complete the RStudio Desktop installation.

RStudio is now successfully installed on your computer. The RStudio Desktop IDE interface is
shown in the figure below:
RStudio Interface
The RStudio interface has four main panels:
Console: You can type commands and see output.
Script editor: You can type commands and save them to a file. You can also run a group of commands at the console from here.
Environment/History: Environment shows all active objects, and History keeps track of all commands run in the console.
Files/Plots/Packages/Help: Shows files, plots, installed packages, and help pages.

R Nuts and Bolts
Console and Editor Panels
Two main window types are used for writing R code and displaying output: the console and the editor.
The console, or command-line interpreter, is used to run commands and calculations directly. By
default, the R prompt is the > symbol.

You can change the R prompt as follows:


> options(prompt="R> ")
R>

Comments
Non-executable lines in an R script or at the R console are called comments.
Comment lines are for documentation purposes and are ignored by the interpreter.

Note:
Unlike some other languages, R does not support multi-line comments.
In R, you can annotate your code with comments. Just preface the line with a hash
mark (#).
eg:

# - It is a comment in R

R> 1+1 # This works out the result of one plus one!
[1] 2

Print output using print() function
Use the print() function to print output in R.
Syntax:
print("any string") (or) print(variable)

Example:
# print string
> print("Welcome to R Programming")
[1] "Welcome to R Programming"

# print variable
> x <- 100
> print(x)
[1] 100

Note:
The [1] at the start of each output line is the index of the first element shown on that line. R treats even a single value as a vector of length one.
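
The meaning of the bracketed index is easier to see with a longer vector. The following is a small sketch; the vector 101:130 is an arbitrary choice, and where the output wraps depends on the width of your console.

```r
# Print a vector that may be too long to fit on one console line
x <- 101:130
print(x)

# Every element has a position; the bracketed number at the start of each
# output line is the position of the first element shown on that line
first <- x[1]
last <- x[length(x)]
cat("First element:", first, " Last element:", last, "\n")
```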
R Script File
Usually, you write your programs in script files and then execute those scripts
with the help of the R interpreter, called Rscript. Start by writing the following
code in a script file called test.R:

# My First program in R Programming


mystring <- "Hello world !"
print(mystring)

Executing Code
RStudio supports the direct execution of code from the source editor (script editor).

Executing a Single Line
To execute the line of source code where the cursor currently resides, press
Ctrl+Enter or use the Run toolbar button.

After executing the line, RStudio automatically moves the cursor to the next line.
The output is displayed in the RStudio console.

Executing Multiple Lines
There are three ways to execute multiple lines from the script editor:
1. Select the lines and press Ctrl+Enter (or) use the Run toolbar button.
2. Press Ctrl+Shift+P (Ctrl+Alt+P in recent RStudio versions) (or) use the Re-Run Previous Code Region toolbar button to run the same selection again.
3. To run the entire document, press Ctrl+Shift+S or use the Source toolbar button.

Keyboard Shortcuts
Ctrl+Shift+N — New document
Ctrl+O — Open document
Ctrl+S — Save active document
Ctrl+1 — Move to the Source Editor
Ctrl+2 — Move to the Console
Ctrl+L - clear console

R - Data Types
A data type is the format in which data is stored in a program. While programming
in any language, you use variables to store information. Variables are reserved
memory locations that store values.

Data Types in R Programming Language:

Basic Data Type    Values
Numeric            Set of all real numbers
Integer            Set of all integers
Logical            TRUE and FALSE
Complex            Set of complex numbers
Character          A single character or a group of characters
Numeric Datatype
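Decimal values are called numerics in R. Numeric is the default data type for numbers in R.
If you assign a number to a variable as follows, the variable will be of numeric type.
x = 5.6
y = 10
Note:
In R, variables are not declared with a data type. In C and Java, variables must be declared with a data type.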

Integer Datatype
R supports integer data types which are the set of all integers. You can use the
capital ‘L’ notation as a suffix to denote that a particular value is an integer
datatype.
eg:
10L,20L,etc.
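
The difference between a plain number and one written with the L suffix can be checked directly. This is a short sketch; the variable names are arbitrary.

```r
x <- 10    # no suffix: stored as a double (numeric)
y <- 10L   # L suffix: stored as an integer

type_x <- typeof(x)
type_y <- typeof(y)
print(type_x)   # "double"
print(type_y)   # "integer"

# Both values compare equal even though their storage types differ
print(x == y)   # TRUE

# Arithmetic that mixes an integer with a double gives a double
print(typeof(y + 0.5))   # "double"
```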

Logical Datatype
R has logical data types that take either a value of TRUE or FALSE.
eg:
x=TRUE
y=FALSE

Complex Datatype
R supports the complex data type, which represents complex numbers. A
complex number has a real and an imaginary component. For example, 2+3i is a
complex number, where 2 is the real component and 3i is the imaginary component.
The imaginary component 3i is equal to √−9 (√9 × √−1 = 3i).
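
The two components can be extracted with the base R functions Re() and Im(). The sketch below also shows that the square root of a negative number only produces an imaginary result when the input is complex.

```r
z <- 2 + 3i

re_part <- Re(z)   # real component: 2
im_part <- Im(z)   # imaginary component: 3
print(re_part)
print(im_part)

# sqrt(-9) on a plain numeric gives NaN with a warning;
# converting to complex first gives the imaginary result
root <- sqrt(as.complex(-9))
print(root)        # 0+3i
```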
Character Datatype
R supports the character data type, which stores character values or strings. A string
can contain letters, numbers, and symbols, and is written inside either
single quotes (or) double quotes.
eg:
ch = 'R'
st="Welcome to R Programming"
typeof()
typeof() is a function used to find the data type of a value in R.
Syntax: typeof(x)
Parameters: x: specified data
eg: data_test.R
typeof(100)
typeof(12.8)
typeof('R')
typeof("Welcome to R Programming")
typeof(4 + 3i)
Output:
[1] "double"
[1] "double"
[1] "character"
[1] "character"
[1] "complex"

How to Print Multiple Values/Variables on the Same Line in R
cat()
cat() is a function used to print multiple values/variables on the same line in R.
syntax:
cat(value/variable1, value/variable2,...)
> n=10
> cat("Value of n =",n)
Value of n = 10
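
A few more details of cat() are sketched below: the default separator, the sep argument, and the fact that cat() does not add a newline by itself. capture.output() is used only to make the printed text inspectable; it is not needed in normal scripts.

```r
name <- "R"
n <- 10

# cat() joins its arguments with a space by default and prints no quotes
cat("Language:", name, "Value of n =", n, "\n")

# sep changes the separator; "\n" ends the line explicitly
cat("a", "b", "c", sep = "-")
cat("\n")

# capture.output() is used here only to show exactly what cat() produced
line <- capture.output(cat("Value of n =", n))
print(line)   # "Value of n = 10"
```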
Taking Input from User in R Programming
In R Programming, two methods are used for taking input from the user:
1. Using readline() method
2. Using scan() method
readline()
The readline() method reads one line of input and returns it as a string.
Syntax:
var = readline(prompt = "string")
The prompt argument is optional.
eg1:
var = readline()
print(var)
eg2:
var = readline("Enter data :")
cat("Given data :",var)
as. functions in R
To convert the input value to the desired data type, R provides a family of
as. functions.
eg:
as.integer(n)   # convert to integer
as.numeric(n)   # convert to numeric type (double)
as.complex(n)   # convert to complex number
script
var = readline(prompt = "Enter any number : ")
var = as.integer(var)
#print(var)
cat("Given Number :",var)
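
The conversions can be tried without typing any input by starting from a string, which is what readline() returns. In this sketch the strings "25", "abc" and "3.7" stand in for hypothetical user input.

```r
# Simulated user input: readline() always returns character strings
input <- "25"

num <- as.integer(input)
print(num + 5L)          # 30
print(typeof(num))       # "integer"

# Text that is not a number converts to NA (with a warning)
bad <- suppressWarnings(as.integer("abc"))
print(is.na(bad))        # TRUE

# as.numeric() keeps the fractional part; as.integer() drops it
print(as.numeric("3.7")) # 3.7
print(as.integer("3.7")) # 3
```

Checking for NA after a conversion is a simple way to detect invalid input before using the value.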

scan()
The scan() method reads input continuously. To terminate the input process,
press the Enter key twice on the console (that is, enter an empty line).
syntax:
x=scan()

eg:
scan.R
print("Enter values press Enter key 2 times to stop")
x = scan()
print(x)
output1:
[1] "Enter values press Enter key 2 times to stop"
1: 1 2 3
4: 4 5 6
7: 7 8 9
10:
Read 9 items
[1] 1 2 3 4 5 6 7 8 9

output2:
[1] "Enter values press Enter key 2 times to stop"
1: 1
2: 2
3: 3
4:
Read 3 items
[1] 1 2 3
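
scan() can also read from a string through its text argument, which is convenient for trying it out without keyboard input. The sketch below uses made-up values; what = character() tells scan() to read words instead of numbers.

```r
# The text argument lets scan() read from a string instead of the keyboard
x <- scan(text = "1 2 3 4 5 6 7 8 9")
print(x)

# By default scan() reads numbers; what = character() reads words
words <- scan(text = "red green yellow", what = character())
print(words)

total <- sum(x)
cat("Sum of values:", total, "\n")
```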

Viewing your working directory
getwd()
getwd() is a function that displays the current working directory.
R> getwd()
[1] "Users/desktop/MyProject"

Change the default working directory
setwd()
setwd() is a function that changes the working directory.
R> setwd("/desktop/MyProject1")

Assign values to variables
The <- symbol is the assignment operator in R language.
> x <- 1
> x
[1] 1
> msg <- "Hello"
> msg
[1] "Hello"
Note: Typing a variable name at the prompt prints its value automatically (auto-printing). Calling print() is explicit printing.
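
The getwd() and setwd() functions described earlier are often used together, so that a script can change directory and then return to where it started. This is a minimal sketch; tempdir() is used as the target only because it exists on every system.

```r
old_dir <- getwd()          # remember the current working directory

# tempdir() is used here only because it exists on every system
setwd(tempdir())
print(getwd())

# Go back to the directory the script started in
setwd(old_dir)
restored <- identical(getwd(), old_dir)
print(restored)             # TRUE
```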

R – Objects
There are 6 basic types of objects in the R language
1. Vectors
2. Lists
3. Matrices
4. Arrays
5. Factors
6. Data Frames

Vectors
A vector holds elements of the same type. To create a vector with more than
one element, use the c() function, which combines the elements into a vector.
Create a vector
syntax:
var <- c(element-1,element-2,.....,element-n)

eg:
apple <- c('red','green',"yellow")

script
apple <- c('red','green',"yellow")
print(apple)

output:
[1] "red" "green" "yellow"
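
Some common vector operations are sketched below. The marks vector is made-up data for illustration.

```r
marks <- c(65, 80, 72, 91)

print(length(marks))   # 4 elements
print(marks[2])        # indexing starts at 1, so this is 80
print(marks[c(1, 4)])  # 65 91

# Arithmetic is applied to every element
bonus <- marks + 5
print(bonus)           # 70 85 77 96

# All elements share one type: mixing numbers and strings gives characters
mixed <- c(1, "two", 3)
print(typeof(mixed))   # "character"
```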

Lists
Lists are R objects that can contain elements of different types, such as
numbers, strings, vectors and even other lists. A list is created using the
list() function.
The following script creates a list containing a string, a numeric vector, a
logical value and a number.

script(lst1.R)

list_data <- list("WELCOME", c(100,200),TRUE,51.23)


print(list_data)

o/p:
[[1]]
[1] "WELCOME"

[[2]]
[1] 100 200

[[3]]
[1] TRUE

[[4]]
[1] 51.23

script(lst1.R)
vec <- c(3,4,5,6)
char_vec<-c("C","JAVA","PYTHON","R")
logic_vec<-c(TRUE,FALSE,FALSE,TRUE)
out_list<-list(vec,char_vec,logic_vec)
print(out_list)

Output:
[[1]]
[1] 3 4 5 6

[[2]]
[1] "C" "JAVA" "PYTHON" "R"

[[3]]
[1] TRUE FALSE FALSE TRUE
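
List elements can also be given names and accessed individually. The sketch below uses a hypothetical student record; the element names are chosen only for illustration.

```r
student <- list(name = "Ravi", marks = c(80, 75, 92), passed = TRUE)

# [[ ]] or $ extracts the element itself
print(student[["name"]])
print(student$marks)

# [ ] returns a shorter list rather than the element
sub <- student["marks"]
print(class(sub))            # "list"

# Elements keep their own types, so numeric functions still work on them
avg <- mean(student$marks)
print(avg)
```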
Matrices
Matrices are R objects in which the elements are arranged in a
two-dimensional rectangular layout. All elements of a matrix have
the same atomic type. Matrices of numeric elements are the most
common, because they are used in mathematical calculations.

A Matrix is created using the matrix() function.

Syntax
The basic syntax for creating a matrix in R is−

matrix(data, nrow, ncol, byrow, dimnames)

Following is the description of the parameters used:

data is the input vector which becomes the data elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical value. If TRUE, the input vector elements are arranged by row.
dimnames is the names assigned to the rows and columns.

> m <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Create a matrix from a vector of numbers and name its rows and columns.

# Define the row and column names.
rownames = c("row1", "row2", "row3")
colnames = c("col1", "col2", "col3")

m <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, byrow = TRUE,
            dimnames = list(rownames, colnames))
print(m)

o/p:
     col1 col2 col3
row1    1    2    3
row2    4    5    6
row3    7    8    9

# R program to create a matrix

A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9),nrow = 3,
ncol = 3, byrow = TRUE )

# Naming rows
rownames(A) = c("A", "B", "C")

# Naming columns
colnames(A) = c("X", "Y", "Z")
print(A)
Output:

X Y Z
A 1 2 3
B 4 5 6
C 7 8 9
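
Once a matrix exists, its elements can be indexed by row and column, and it can be used in matrix arithmetic. The following is a small sketch with an arbitrary 2 x 3 matrix.

```r
A <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(A)

print(dim(A))      # 2 3
print(A[2, 3])     # row 2, column 3: 6
print(A[1, ])      # whole first row: 1 2 3

# t() transposes; %*% is matrix multiplication (* would be element-wise)
P <- A %*% t(A)
print(P)
```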

Diagonal matrix:
A diagonal matrix is a matrix in which the entries
outside the main diagonal are all zero.

Syntax: diag(data, m, n)
Parameters:
Data: Diagonal elements
m: no of rows
n: no of columns

Example:
dm <- diag(c(5, 3, 3), 3, 3)
print(dm)

Output:
     [,1] [,2] [,3]
[1,]    5    0    0
[2,]    0    3    0
[3,]    0    0    3

Arrays
Arrays are data storage structures with a fixed number of dimensions. Their
elements are stored in contiguous memory locations. One-dimensional arrays are
called vectors and two-dimensional arrays are called matrices. All elements of
an array have the same data type.

Creating an Array
The array() function is used to create an array.
Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
where,
nrow : Number of rows
ncol : Number of columns
nmat : Number of matrices
dimnames : a list of names assigned to the rows, columns and matrices.

Example
The following example creates an array with 3 rows and 3 columns.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
result <- array(c(vector1,vector2),dim = c(3,3))
print(result)

output
[,1] [,2] [,3]
[1,] 5 10 13
[2,] 9 11 14
[3,] 3 12 15

Naming Columns and Rows


We can give names to the rows, columns and matrices in the array by
using the dimnames parameter.

# Create two vectors of different lengths.


vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
result <- array(c(vector1,vector2),dim = c(3,3),dimnames =
list(row.names,column.names))
print(result)
output
     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15
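
The examples above are two-dimensional. An array with a third dimension, corresponding to nmat in the syntax, can be sketched as follows; the values 1:12 are arbitrary.

```r
# Two matrices, each with 2 rows and 3 columns
arr <- array(1:12, dim = c(2, 3, 2))
print(dim(arr))        # 2 3 2

# Index order is [row, column, matrix]
value <- arr[1, 2, 2]
print(value)           # 9

# Leaving the first two indices empty selects a whole matrix
second <- arr[, , 2]
print(second)
```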

R - Factors
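Factors are data objects used to categorize data and store it as levels.
They are useful for columns with a limited number of distinct values. Note that
since R 4.0.0, data.frame() and read.csv() keep text columns as character vectors
by default; earlier versions converted them to factors automatically.
Creating a Factor in R
Factors are created using the factor() function by taking a vector as
input.
Syntax:
factor(data, levels)
The levels argument is optional.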
fc1.R
data <- c("East","West","East","North","South","East",
"West","South","West","East","North")
factor_data <- factor(data)
print(factor_data)

output:
[1] East West East North South East West South West East North
Levels: East North South West

note: By default, levels are sorted in alphabetical order.

Changing the Order of Levels
fc2.R
data <- c("East","West","East","North","South","East",
          "West","South","West","East","North")
factor_data <- factor(data)
print(factor_data)
new_order_data <- factor(data, levels = c("East","West","North","South"))
print(new_order_data)

output:
[1] East West East North South East West South West East North
Levels: East North South West
[1] East West East North South East West South West East North
Levels: East West North South

Changing the Order of Levels (another example)
The levels of an existing factor can also be reordered by passing the factor itself to factor().
data <- c("East","West","East","North","North","East","West","West","West","East",
          "North")
factor_data <- factor(data)
print(factor_data)
new_order_data <- factor(factor_data, levels = c("East","West","North"))
print(new_order_data)

output:
[1] East West East North North East West West West East North
Levels: East North West
[1] East West East North North East West West West East North
Levels: East West North
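
A few helper functions make it easier to inspect a factor. This sketch uses a shortened version of the direction data above.

```r
data <- c("East", "West", "East", "North", "South", "East")
f <- factor(data)

print(levels(f))        # "East" "North" "South" "West"
print(nlevels(f))       # 4

# table() counts how often each level occurs
counts <- table(f)
print(counts)

# Internally each value is stored as the integer position of its level
codes <- as.integer(f)
print(codes)            # 1 4 1 2 3 1
```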

R Data Frames

A data frame stores data in a table format.
A data frame can hold different types of data. However, each
column must contain a single type of data.

creating a dataframe
The data.frame() function is used to create a data frame.
syntax:
data.frame(vector-1, vector-2, ..., vector-n, stringsAsFactors = logical)
df1.R
emp_id = c (100,200,300,400,500)
emp_name = c("Hari","Ravi","Raju","Gopi","Vasu")
salary = c(30000.00,50000.00,20000.00,25000.00,15000.00)
emp_data <- data.frame(emp_id,emp_name,salary)
print(emp_data)

Output
emp_id emp_name salary
1 100 Hari 30000
2 200 Ravi 50000
3 300 Raju 20000
4 400 Gopi 25000
5 500 Vasu 15000
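
After a data frame is created, its columns and rows can be accessed and extended. The sketch below reuses part of the employee data; the 10% bonus column is a hypothetical addition for illustration.

```r
emp_data <- data.frame(
  emp_id = c(100, 200, 300),
  emp_name = c("Hari", "Ravi", "Raju"),
  salary = c(30000, 50000, 20000)
)

print(nrow(emp_data))          # 3
print(emp_data$emp_name)       # one column as a vector
str(emp_data)                  # column names and types

# Rows can be filtered with a logical condition on a column
high <- emp_data[emp_data$salary > 25000, ]
print(high)

# A new column can be computed from existing ones (10% bonus is hypothetical)
emp_data$bonus <- emp_data$salary * 0.10
print(emp_data)
```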

is. functions in R
The is. functions test whether an object is of a specified type. These
functions return a logical value (TRUE/FALSE).
1. is.integer
It tests for objects of type "integer".
> x <- 100L
> x
[1] 100
> is.integer(x)
[1] TRUE

2. is.double
It tests for objects of type "double".
> x <- 100
> x
[1] 100
> is.double(x)
[1] TRUE

3. is.character
It tests for objects of type "character".
> s <- "welcome"
> s
[1] "welcome"
> is.character(s)
[1] TRUE

Other is. functions:
is.complex()
is.vector()
is.list()
is.array()
is.matrix()
is.factor()
is.data.frame()

class function in R
The class() function in R returns the class of an R object.

Syntax
class(x)
x: This represents the R object
eg:
print(class(100))
print(class(100L))
print(class('A'))
print(class("WELCOME"))
print(class(2+3i))

output:
[1] "numeric"
[1] "integer"
[1] "character"
[1] "character"
[1] "complex"
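
For simple values class() and typeof() give similar answers, but for structured objects they differ. The sketch below compares the two on a matrix, a factor and a data frame; the objects are arbitrary examples.

```r
m <- matrix(1:4, nrow = 2)
f <- factor(c("a", "b"))
d <- data.frame(x = 1:2)

# typeof() reports how the values are stored internally
print(typeof(m))   # "integer"
print(typeof(f))   # "integer"
print(typeof(d))   # "list"

# class() reports what kind of object R treats it as
print(class(m))    # "matrix" "array" in R 4.0 and later
print(class(f))    # "factor"
print(class(d))    # "data.frame"
```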
R - Operators
Operators are symbols that perform operations on operands. R operators
cover the mathematical, logical, and decision operations performed on
numeric, integer, and complex values.

Types of Operators
We have the following types of operators in R programming
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators

Arithmetic Operators
Arithmetic operators perform mathematical operations such as addition,
subtraction, multiplication, division, and modulus between operands. When
the operands are vectors, the operation is performed element-wise at the
corresponding positions of the vectors.
+ Addition
eg:
> 2+3
[1] 5

- Subtraction
eg:
> 8-5
[1] 3

* Multiplication
eg:
> 2*5
[1] 10
/ Division
eg:
> 10/3
[1] 3.333333

^ power
eg:
> 2^3
[1] 8

%% Modulus :
eg:
> 10%%3
[1] 1
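
The examples above use single numbers. The element-wise behaviour on vectors can be sketched as follows; the vectors are arbitrary, and %/% (integer division) is a base R operator not listed above.

```r
a <- c(10, 20, 30)
b <- c(3, 4, 5)

print(a + b)     # 13 24 35
print(a * b)     # 30 80 150
print(a %% b)    # remainders: 1 0 0
print(a %/% b)   # integer division: 3 5 6

# A single number is recycled across the whole vector
print(a / 10)    # 1 2 3
```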

Script
Write an R script to read any two integers and perform all arithmetic
operations.
ari.R
print("Enter any two integer values")
a <- as.integer(readline())
b <- as.integer(readline())
cat("Addition :",a+b)
cat("\nSubtraction :",a-b)
cat("\nMultiplication :",a*b)
cat("\nDivision :",a/b)
cat("\npower :",a^b)
cat("\nModulus :",a%%b)

Note: run this script in an interactive R session; under Rscript, readline() does not wait for input.

output (for the inputs 10 and 3):
Addition : 13
Subtraction : 7
Multiplication : 30
Division : 3.333333
power : 1000
Modulus : 1

Relational Operators
These operators test the relation between the operands and return a
logical value (TRUE or FALSE).

<    Less than
>    Greater than
<=   Less than or equal to
>=   Greater than or equal to
==   Equal to
!=   Not equal to

eg:
> 10<20
[1] TRUE

> 5==10
[1] FALSE

> 5!=10
[1] TRUE

> 10<=20
[1] TRUE

Logical Operators
These operators combine the results of two or more expressions or values and
return a logical value (TRUE or FALSE). When a number is used as an operand,
zero is treated as FALSE and any non-zero value as TRUE.

Logical AND operator (&&):
Returns TRUE if both operands are TRUE.

Logical OR operator (||):
Returns TRUE if at least one of the operands is TRUE.
eg:
> 10 && 20
[1] TRUE
> 0 && 10
[1] FALSE
> TRUE && TRUE
[1] TRUE
> TRUE && FALSE
[1] FALSE
> TRUE || FALSE
[1] TRUE

Logical Not(!)
It takes a value of the operand and gives the opposite logical value.
eg1:
> n=10
> !n
[1] FALSE
> n=0
> !n
[1] TRUE
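
R also has the element-wise forms & and |, which are not listed above. The sketch below contrasts them with && and ||, and shows relational operators applied to a whole vector; the values are arbitrary.

```r
x <- c(5, 12, 8, 20)

# Relational operators compare every element
big <- x > 7
print(big)                 # FALSE TRUE TRUE TRUE

# & and | combine logical vectors element by element
in_range <- x > 7 & x < 15
print(in_range)            # FALSE TRUE TRUE FALSE

# && and || work on single values and are typical in if conditions
n <- 10
ok <- n > 0 && n %% 2 == 0
print(ok)                  # TRUE

# ! negates every element
print(!big)                # TRUE FALSE FALSE FALSE
```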

Assignment Operators
Assignment operators are used to assign values to data objects in R.
There are two kinds of assignment operators:
1. Left Assignment (<- or <<- or =):
eg:
n <- 10
(or)
n = 10
(or)
n <<- 10

2. Right Assignment (-> or ->>):
eg:
10 -> n
10 ->> n
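
At the top level of a script all of these forms behave the same way. The difference between <- and <<- only shows up inside a function, as in this sketch; the function names add_local and add_global are hypothetical.

```r
counter <- 0

add_local <- function() {
  counter <- counter + 1   # creates a local copy; the outer value is untouched
  counter
}

add_global <- function() {
  counter <<- counter + 1  # modifies the variable in the enclosing environment
  counter
}

local_result <- add_local()
after_local <- counter     # still 0
add_global()
after_global <- counter    # now 1
cat("After add_local():", after_local, " After add_global():", after_global, "\n")
```

Because <<- changes variables outside the function, it is usually used sparingly.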
