
Department of Statistics

Ambo University

Statistical Computing II (Stat 3022)

Using

R & SAS
By: Senahara Korsa (PhD)

3/29/2023
Introduction to R
• To support your learning and understanding of R, the first thing
you will need to do is download and install both R and RStudio on
your computer.
• See the setup page
https://alexd106.github.io/intro2R/setup.html for further
instructions on how to set up your computer and download the
required datasets.
• To get up and running, the first thing you need to do is install R.
• R is freely available for Windows, macOS and Linux from the Comprehensive R
Archive Network (CRAN) website.
• Follow the link below for step-by-step instructions on how to
download and install R and RStudio:
https://alexd106.github.io/intro2R/howto.html#install
What is R?
• R is a free language and environment for statistical computing and
graphics.
• R was initially written in the 1990s by Ross Ihaka and Robert
Gentleman.
• The name is partly based on the (first) names of the first two R
authors (Robert Gentleman and Ross Ihaka).
• R is a widely used programming language for statistical analysis,
predictive modeling and data science.
Some tasks of R are:
• Exploring and manipulating data
• Building and validating predictive models
• Applying machine learning and text mining algorithms
• Creating visually appealing graphs
• Connecting with Databases
Download R and RStudio

Download R from:
http://cran.r-project.org/bin/

Download RStudio from:
http://www.rstudio.com/ide/download/desktop



To Download R:
1. Visit the official R site: http://www.r-project.org/.
2. Click on "CRAN" on the left-hand side of the page.
3. Choose your country and click on the link available for
your location.
4. Click "Download R for Windows" (for Windows users).
5. Click "base".
Follow the steps shown in the image below.

Structure of R
R Packages
• A package is a collection of previously programmed functions,
often including functions for specific tasks.
• There are two types of packages: those that come with the base
installation of R and packages that you must manually download
and install.
• To get the list of all the packages installed, run the following R
script:
>library()



Installing and Loading R Packages
• There are two ways to add new R packages.
• One is installing directly from the CRAN (Comprehensive R
Archive Network) directory and
• Another is downloading the package to your local system and
installing it manually.
• Install directly from CRAN
• The following command gets the packages directly from the
CRAN webpage and installs the package in the R environment.
#install a new package from CRAN (replace PackageName with the name of the package)
>install.packages("PackageName")



• Loading Packages
• Whenever you want to use an R package, it must not only be
installed locally; you must also load it during the
session in which you use it.
• To load an installed package from the R GUI, select the "Load package" option
from the Packages menu.
• To load packages from the command line, use library(). For instance:
#load the package "foreign" into your session
>library(foreign)
#load the package "Hmisc" into your session
>library(Hmisc)
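
The install and load steps can be combined into one guarded snippet. The following is a minimal sketch rather than part of the original slides: it uses requireNamespace() to check whether the package is already installed and loads it only in that case. The install.packages() call is left commented out because it needs an internet connection, and the variable names pkg and is_loaded are illustrative.

```r
pkg <- "foreign"   # package name used on the slide above

if (!requireNamespace(pkg, quietly = TRUE)) {
  # install.packages(pkg)   # uncomment to install from CRAN (needs internet)
  message("Package '", pkg, "' is not installed yet.")
} else {
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
}

# TRUE if the package is now attached to the search path
is_loaded <- paste0("package:", pkg) %in% search()
is_loaded
```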



R Window
• R Editor window
• It is where you write code or program.
• R Console
• It is where you can see result or output of your code.
• You can write code here as well but it's difficult to edit
or make changes in the R Console.

Running R
• When you start R, the first thing you will see in the GUI
(graphical user interface) is the prompt (>), which is R's way of saying
"Go ahead ... do something".
• The instructions you give to R are called commands.
• Commands are separated either by a semicolon (;) or by a newline.
• Comments can be put almost anywhere, starting with a hash mark (#);
everything after # on that line is a comment.
• If you see a "+" in place of the prompt, your last command
was not complete and R is waiting for the rest of it.
• To get more information on any specific named function, e.g. sqrt(),
the command is
> help(sqrt) or
> ?sqrt
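
As a small illustration of the rules above (commands separated by a semicolon or a newline, comments starting with #), here is a sketch with made-up values; the variable names are illustrative only.

```r
x <- 3; y <- 4          # two commands on one line, separated by a semicolon
z <- sqrt(x^2 + y^2)    # everything after the # is ignored by R
z                       # typing a name on its own prints its value
```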



R - Ugly Interface?
• Many novice R users dislike the default R interface.
• It looks plain compared to SAS/SPSS.
• It is not very user friendly.
• It makes writing programs more cumbersome.
• How can it be made less boring (or more interesting)? RStudio
comes to the rescue!



RStudio Interface



RStudio…
• After installing R and RStudio, launch RStudio from your
computer's applications folder.
• RStudio is a four-panel workspace for
1. creating files containing R scripts
2. typing R commands
3. viewing command histories
4. viewing plots and more.



RStudio Screen
1. Top-left panel: Code editor allowing you to create and open a file
containing an R script.
The R script is where you keep a record of your work.
An R script can be created via: File -> New File -> R Script.
2. Bottom-left panel: R console for typing R commands
3. Top-right panel:
• Workspace tab: shows the list of R objects you created.
• History tab: shows the history of all previous commands.
4. Bottom-right panel:
• Files tab: shows the files in your working directory.
• Plots tab: shows the history of plots you created. From this tab,
you can export a plot to a PDF or an image file.
• Packages tab: shows the external R packages available on your
system. If checked, the package is loaded in R.

R and RStudio

• R and RStudio are not two different versions of the same thing.
• In fact, they work together.
• R is a programming language for statistical computation.
• RStudio is an Integrated Development Environment (IDE)
that helps you develop programs in R.
• You can use R without using RStudio, but you cannot use
RStudio without R, so R comes first.
• RStudio is not R or a type of R; rather, it is a program that runs R
and provides extra tools that are helpful when writing R code.
• Everything that is done in RStudio can also be done
directly in R.



Some Useful RStudio Shortcuts
1. Press CTRL + Enter to submit/run code
2. Press CTRL + SHIFT + C to comment/ uncomment code
3. Press CTRL + SHIFT + N to create a new R script



Fundamentals of R

1. R is a case-sensitive language.
• For instance: AGE, Age, and age are different objects!
2. R as a calculator
• You can use R as a powerful calculator for a wide range of
numerical computations.
Example:
> log2(32)
[1] 5
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
> plot(sin(seq(0, 2*pi, length=100)))
The [1] at the start of an output line is the index of the first value printed on that line.



3. The # character starts a comment; R ignores everything after it on that line.
4. The operator "<-" (without quotes) is equivalent to the "=" sign for assignment.
You can use either of the operators.
5. The getwd() function shows the working directory.



6. R uses forward slashes instead of backslashes in
file paths (as shown in the image above).
7. The setwd() function changes the working directory, i.e. it tells R where
to look for and save your files.
> setwd("C:/Users/Deepanshu/Downloads")
8. The c() function is widely used to combine values to form a
vector.
9. In RStudio, press CTRL + SHIFT + A to reformat your code.
10. R uses NA to represent Not Available, or missing, values.



11. To calculate a sum excluding NA values, use na.rm = TRUE (by default, it is
FALSE).
12. The form 1:10 generates the integers from 1 to 10.
13. R is case-sensitive, so you have to use the exact case
that the program requires.
14. To get help for a certain function such as sum, use the form:
> help(sum)



15. Object names in R can be of any length and consist of letters,
numbers, underscores "_" or periods ".".
16. Object names in R should begin with a letter.
17. Unlike SAS and SPSS, R has several different data structures
including vectors, factors, data frames, matrices, arrays, and lists.
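
Several of the points in this list (c(), NA, na.rm, the colon form, case sensitivity and getwd()) can be tried together. The snippet below is a minimal sketch with made-up values; none of the object names come from the original slides.

```r
scores <- c(12, 7, NA, 9)                # c() combines values; NA marks a missing value
total_all <- sum(scores)                 # NA, because one value is missing
total_ok  <- sum(scores, na.rm = TRUE)   # 28, the NA is dropped before summing

ints <- 1:10                             # the integers 1, 2, ..., 10

Age <- 30
age <- 25                                # Age and age are two different objects

wd <- getwd()                            # current working directory as a character string
```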



Variable Declaration and Assignment
• To assign a value to a variable, use the assignment
operator "<-".
• A variable name can be made up of letters,
numbers, and . or _ provided it starts with a letter, or with . followed by a
letter.
• Note that names are case sensitive.
• Do not start a variable name with a number.
Example:
> x <- 2 #Assigning 2 to the variable x
> y <- "Hello!!" #Assigning a string to the variable y



Working with Variables

• Finding Variables
• To know all the variables currently available in the workspace
we use the ls() function.
• Deleting Variables
• Variables can be deleted by using the rm() function.
• To remove object x, for instance, use:
>rm(x) #to delete specific object x
>rm(list = ls())# To remove all currently defined objects
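
A short sketch of the ls() and rm() workflow described above. The names demo_x and demo_y are hypothetical, chosen so that the example does not remove any of your own objects.

```r
demo_x <- 2
demo_y <- "Hello!!"

ls()                                  # lists the objects in the workspace, including demo_x and demo_y
rm(demo_x)                            # delete one specific object
still_there <- "demo_x" %in% ls()     # FALSE: demo_x is gone, demo_y is kept
still_there
```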



Saving your work in R
• To save your work in R, use one of the following:

> save(x, y, file="file-name.RData") # saves only the named objects (here x and y)

> save.image("file-name.RData") # saves the whole workspace

• Or, from the File menu:

File -> Save Workspace -> browse to the folder where
you want to save -> supply the file name.
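
The following sketch shows a full save and restore round trip. It writes to a temporary file created with tempfile() so that nothing on disk is overwritten; the object name heights and its values are made up for illustration.

```r
heights <- c(180, 155, 160)                  # example object (made-up values)
rdata_file <- tempfile(fileext = ".RData")   # temporary file path

save(heights, file = rdata_file)             # save() needs the names of the objects to store
rm(heights)                                  # remove the object from the workspace
load(rdata_file)                             # restores it under its original name
heights
```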



Save R output
• The output from an R session can be saved to a file
using the sink command:

> sink("Rintro") #send the output to a file named Rintro in the working directory

> sink("output.txt") #send the output to a text file

• To turn off the sink command and return output to the console:

> sink()
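
A minimal sketch of the sink() workflow, assuming a temporary output file (out_file is an illustrative name): output printed between the two sink() calls goes to the file instead of the console and can be read back with readLines().

```r
out_file <- tempfile(fileext = ".txt")   # temporary file, so nothing is overwritten

sink(out_file)                           # start diverting console output to the file
print(summary(c(1, 2, 3, 4)))            # this output is written to out_file
sink()                                   # stop diverting; output returns to the console

captured <- readLines(out_file)          # the lines that were written to the file
captured
```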



Getting online help in R
• Help is also available in HTML format by running
> help.start()
which will launch a web browser that allows the help
pages to be browsed with hyperlinks.
• The help.search command (alternatively ??)
allows searching for help in various ways.
Example:
> ??solve #Or
> help.search("solve")
• Try ?help.search for details and more examples.



Getting online help in R…
• The examples on a help topic can normally be run by
> example("topic")
• For example, to see a demonstration of what lm does,
type the following command:
> example(lm)
• For further information about online help, use
> ?help



Assigning values to variables
• Assignment can be made using any of the following operations:
• Using "<-"
  Example: > x <- 5
• Using "="
  Example: > x = 5
• Using the function assign()
  Example: > assign("x", 5+10)
• Using "->"
  Example: > 5 -> x



Objects and Simple Manipulation
• The entities that R creates and manipulates are known as
objects.
• These may be variables, arrays of numbers, character strings,
functions, or more general structures built from such
components.
• To list the objects you have created in a session, use
either of the following commands:
> objects()
> ls()



Data types in R
• R has several different data types (data structures):
  • Vectors
  • Lists
  • Data frames
  • Matrices
  • Arrays
  • Factors



Vectors
• Vectors are the simplest type of object in R.
• They can easily be created with c(), the combine function.
• There are 3 main types of vectors:
  • Numeric vectors
  • Character vectors
  • Logical vectors



Numeric Vectors
• A numeric vector is a single entity consisting of an ordered collection of
numbers.
• Example: to set up a numeric vector x consisting of the 5
numbers 10, 6, 3, 6, 22, we use any one of the following
commands:
> x<-c(10, 6, 3, 6, 22) #OR
> x= c(10, 6, 3, 6, 22) #OR
> assign("x", c(10, 6, 3, 6, 22)) #OR
> c(10, 6, 3, 6, 22)->x



Numeric Vectors…
• The further assignment
> y<-c(x,0,x)
• would create a vector y with 11 entries consisting of
two copies of x with a zero in the middle place.
> y<-c(x,0,x)
> y
[1] 10 6 3 6 22 0 10 6 3 6 22
Note: The [1] in front of the result is the index of the first
element printed on that line of the vector y.



Numeric Vectors…
Functions that return a single value
> length(x) # the number of elements in x
> sum(x) # the sum of the values of x
> mean(x) # the mean of the values of x
> var(x) # the variance of the values of x
> sd(x) # the standard deviation of the values of x
> min(x) # the minimum value from the values of x
> max(x) # the maximum value from the values of x
> prod(x) # the product of the values of x
> range(x) # the minimum and maximum of x (two values, unlike the functions above)
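
Applied to the vector x = c(10, 6, 3, 6, 22) defined earlier, these functions give the values shown in the comments below. This is a worked sketch; the comments were computed by hand from that vector.

```r
x <- c(10, 6, 3, 6, 22)

length(x)   # 5
sum(x)      # 47
mean(x)     # 9.4
var(x)      # 55.8  (sample variance, divisor n - 1)
sd(x)       # about 7.47, the square root of the variance
min(x)      # 3
max(x)      # 22
prod(x)     # 23760
range(x)    # 3 22  (minimum and maximum)
```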



Numeric Vectors…
Functions that return vectors with the same length
• To print the ordering of x, i.e. the positions of the elements that
would sort x in increasing order (the ranks themselves are given by rank(x)):
> order(x)
> sort.list(x)
• To print the values of x in increasing order:
> sort(x)
> x[order(x)]
> x[sort.list(x)]
• To print the reciprocals of the values of x:
> 1/x
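
order() and rank() are easy to confuse, so here is a small worked sketch on the vector x = c(10, 6, 3, 6, 22) used earlier. order() returns positions, rank() returns ranks (ties share the average rank by default).

```r
x <- c(10, 6, 3, 6, 22)

order(x)      # 3 2 4 1 5 : positions of the smallest, second smallest, ... element
rank(x)       # 4.0 2.5 1.0 2.5 5.0 : rank of each element; the tied 6s share rank 2.5
sort(x)       # 3 6 6 10 22
x[order(x)]   # same result as sort(x)
1 / x         # element-wise reciprocals
```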



Numeric Vectors…
• To print the sin, cos, tan, asin, acos, atan, log, exp, … of
the values of x:
> sin(x)
> cos(x)
> exp(x)
• The parallel maximum and minimum functions pmax and
pmin return a vector (of length equal to their longest argument)
that contains in each element the largest (smallest) element in
that position in any of the input vectors.
Let x be:
> x=c(1:10)
> x
[1] 1 2 3 4 5 6 7 8 9 10
> pmax(x,6)
[1] 6 6 6 6 6 6 7 8 9 10
• Returns the values of x, but values that are less than 6
are replaced by 6.
> pmin(x,6)
[1] 1 2 3 4 5 6 6 6 6 6
• Returns the values of x, but values that are greater than
6 are replaced by 6.



Character vectors
• Character strings are another common data type, used to
represent text.
• To set up a character/string vector z consisting of 3 place
names, use c() with the names given as quoted strings.
• Character strings are entered using either matching double (" ")
or single (' ') quotes, but are printed using double quotes
(or sometimes without quotes).
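
The original example for this slide did not survive extraction, so the following is only a sketch of what such a vector could look like. The three place names are illustrative, not taken from the slides.

```r
z <- c("Ambo", 'Addis Ababa', "Adama")   # single and double quotes both work

z            # printed with double quotes regardless of how they were entered
length(z)    # 3 elements
nchar(z)     # number of characters in each string: 4 11 5
```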
Escape Character
• Most characters can be used in a string, with a couple of
exceptions, one being the backslash character, "\".
• This character is called the escape character and is used to
insert characters that would otherwise be difficult to add.
• The table below shows some of the other characters that can
be "escaped" in this way.

Example
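
The table and example from this slide were lost in extraction. As a stand-in, here is a hedged sketch using a few escape sequences that R is known to support (\" for a double quote, \t for a tab, \n for a newline and \\ for a literal backslash); the string itself is made up.

```r
s <- "He said \"hello\"\tand left.\nSecond line with a backslash: \\"

cat(s, "\n")     # cat() interprets the escapes: quotes, a tab, a line break, one backslash
print(s)         # print() shows the string with its escape sequences still visible

nchar("a\tb")    # 3: an escape sequence such as \t counts as a single character
```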



Concatenation

• Character vectors can be concatenated using c()

• Example:
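
The slide's own example is missing, so this is an illustrative sketch with made-up place names. It also contrasts c(), which builds a longer vector, with paste(), which joins strings into a single string.

```r
first  <- c("Ambo", "Adama")
second <- c("Hawassa")

all_places <- c(first, second, "Jimma")   # one character vector with 4 elements
all_places

joined <- paste("Ambo", "University")     # a single string: "Ambo University"
joined
```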



Logical Vectors
• A logical vector is a vector whose elements are TRUE,
FALSE or NA.
• Logical vectors are generated by conditions.
• Example: temp <- x>13
• This sets temp as a vector of the same length as x, with values
FALSE corresponding to elements of x where the
condition is not met and TRUE where it is met.
• The comparison operators are <, <=, >, >=, == for exact
equality and != for inequality.
• In addition, if c1 and c2 are logical expressions, then c1&c2 is
their intersection ("and"), c1|c2 is their union ("or"), and !c1 is
the negation of c1.
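
A worked sketch of these operators on the vector x = c(10, 6, 3, 6, 22) from the numeric vector slides. The names in_range and extreme are illustrative.

```r
x <- c(10, 6, 3, 6, 22)

temp <- x > 13                # FALSE FALSE FALSE FALSE TRUE
in_range <- x > 5 & x < 20    # TRUE TRUE FALSE TRUE FALSE  ("and")
extreme  <- x < 5 | x > 20    # FALSE FALSE TRUE FALSE TRUE ("or")
!temp                         # TRUE TRUE TRUE TRUE FALSE   (negation)

sum(temp)                     # 1: TRUE counts as 1, so this is the number of values above 13
```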
Missing Values in Vectors
• The function is.na(x) gives a logical vector of the same size as x
with value TRUE if and only if the corresponding element in x is
NA (not available, i.e. a missing value).
• Note that there is a second kind of "missing" value,
produced by numerical computation: the so-called Not a Number,
NaN, values.

Note: for a vector xx, is.na(xx) is TRUE both for NA and NaN, whereas is.nan(xx) is TRUE only for NaN.
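
The examples for this slide were lost, so here is a small sketch with a made-up vector xx that contains one NA and one NaN (produced by 0/0).

```r
xx <- c(5, NA, 0/0, 8)     # 0/0 evaluates to NaN

is.na(xx)                  # FALSE TRUE TRUE FALSE : TRUE for both NA and NaN
is.nan(xx)                 # FALSE FALSE TRUE FALSE : TRUE only for NaN
mean(xx, na.rm = TRUE)     # 6.5 : na.rm = TRUE drops both NA and NaN
```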



Indexing Vectors
• Vector indices are placed within square brackets: []
• Vectors can be indexed in any of the following ways:
  • Vector of positive integers
  • Vector of negative integers
  • Vector of names (character strings)
  • Logical vector

Example:



• Printing all elements of const except the 1st and the 2nd.
> const[c(-1,-2)]
 sqrt2 golden
1.4142  1.618
• Printing TRUE or FALSE according to whether the respective value is greater than 2.
> const>2
    pi  euler  sqrt2 golden
  TRUE   TRUE  FALSE  FALSE
• Printing only the values for which the condition is TRUE.
> const[const>2]
    pi  euler
3.1416 2.7183
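
The definition of const was on a slide image that did not survive. The sketch below reconstructs it from the names and values visible in the printed output above (an assumption, not the author's original code) and shows all four indexing styles.

```r
# Reconstructed from the output shown on the slide (assumed definition)
const <- c(pi = 3.1416, euler = 2.7183, sqrt2 = 1.4142, golden = 1.618)

const[c(1, 3)]       # positive integers: elements 1 and 3 (pi and sqrt2)
const[c(-1, -2)]     # negative integers: everything except elements 1 and 2
const["golden"]      # by name
const[const > 2]     # logical vector: only the values greater than 2
```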



Modifying Vectors
• To alter the contents of a vector, similar methods can
be used.
Example: Create a variable x with 5 elements: 10 5 3 6 21
> x=c(10,5,3,6,21)
• Now, to modify the first element of x and assign it the value 7,
use
> x[1]<-7
> x
[1] 7 5 3 6 21
• The following command replaces any NA (missing)
values in the vector w with the value 0:
> w[is.na(w)]<-0
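
The vector w is not defined on the slide, so here is a self-contained sketch with a made-up w that shows the replacement in action.

```r
w <- c(4, NA, 7, NA)     # made-up vector with two missing values

w[is.na(w)] <- 0         # assign 0 to every position where w is NA
w                        # 4 0 7 0
```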



Generating sequences
• R has a number of ways to generate sequences of
numbers.
• One of these is the colon operator (:)
• Example:
• To generate numbers from 1 to 10 in increasing order
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
• To generate numbers from 10 to 1 in decreasing order
> 10:1
[1] 10 9 8 7 6 5 4 3 2 1
Generating sequences…
• The colon operator has high priority within an expression.
• Example: 2*1:10 is equivalent to 2*(1:10)
> 2*1:10
[1] 2 4 6 8 10 12 14 16 18 20
> 2*(1:10)
[1] 2 4 6 8 10 12 14 16 18 20
• The other option to generate numbers is the seq() function
Example:
> seq(1, 10)
> seq(from=1, to=10)
> seq(to=10, from=1)
are all equivalent to 1:10
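
Because the colon operator binds more tightly than arithmetic operators, leaving out parentheses can change the result. A short sketch, with n as an illustrative variable:

```r
n <- 5

a <- 1:n - 1     # evaluated as (1:n) - 1, giving 0 1 2 3 4
b <- 1:(n - 1)   # parentheses force the subtraction first, giving 1 2 3 4
a
b
```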


Generating sequences…
• The parameters by=value and length=value specify a step
size and a length for the sequence, respectively.
• If neither of these is given, the default by=1 is assumed.

Example:
> seq(1,5, by=2)
[1] 1 3 5
> seq(1,10, length=5)
[1] 1.00 3.25 5.50 7.75 10.00
> seq(from=1, by=2.25, length=5)
[1] 1.00 3.25 5.50 7.75 10.00



Vector Replication
• The function rep() can be used for replicating an object in
various ways.
• The following command will print 5 copies of x end to end:
> rep(x, times=5) #or
> rep(x, 5)
• While the command
> rep(x, each=5)
will print each element of x five times before moving on to the next.
• Furthermore, the command
> rep(c(1,4), c(2,3))
will print 1 two times and then 4 three times (1 1 4 4 4).
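
To make the difference between times and each concrete, here is a sketch with a short made-up vector x (the slides do not say which x is meant at this point) and a repeat count of 2 to keep the output small.

```r
x <- c(1, 2, 3)

rep(x, times = 2)        # 1 2 3 1 2 3 : the whole vector repeated end to end
rep(x, each = 2)         # 1 1 2 2 3 3 : each element repeated before moving on
rep(c(1, 4), c(2, 3))    # 1 1 4 4 4   : per-element repeat counts
```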



Matrices
• A matrix can be regarded as a generalization of a vector.
• As with vectors, all the elements of a matrix must be of the
same data type.
• A matrix can be generated in two ways.
• Method 1: Using the function dim:
• Example
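
The example for Method 1 was on a slide image that is not available. The sketch below shows the general idea under that caveat: assigning a dim attribute to an ordinary vector turns it into a matrix, filled column by column.

```r
x <- 1:8              # an ordinary vector with 8 elements
dim(x) <- c(2, 4)     # give it 2 rows and 4 columns; values fill column by column

x                     # row 1 is 1 3 5 7, row 2 is 2 4 6 8
is.matrix(x)          # TRUE
x[2, 3]               # 6 : row 2, column 3
```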



Matrices…
Method 2: Using the function matrix:
> x <- matrix(c(1:8),2,4,byrow=F)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8

An equivalent expression:
> x<-matrix(c(1:8),nrow=2,ncol=4)



Matrices…
• By default the matrix is filled by column.
• To fill the matrix by row, specify byrow = T as an
argument in the matrix function.
• Use the function cbind to create a matrix by binding
two or more vectors as column vectors.
• The function rbind is used to create a matrix
by binding two or more vectors as row vectors.
• Example:
> cbind(c(1,2,3),c(4,5,6))
> rbind(c(1,2,3),c(4,5,6))



Matrix operations
• Matrix operations (multiplication, transpose, etc.) can easily
be performed in R using a few simple functions like:

Name          Operation
dim()         Dimension of the matrix (number of rows and columns)
as.matrix()   Used to convert an argument into a matrix object
%*%           Matrix multiplication
t()           Matrix transpose
det()         Determinant of a square matrix
solve()       Matrix inverse; also solves a system of linear equations
eigen()       Computes eigenvalues and eigenvectors
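
The functions in the table can be tried on a small example. The 2 x 2 matrix A and the vector b below are made up for illustration; the values in the comments were worked out by hand.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled by column: rows are (2, 1) and (1, 3)
b <- c(3, 5)

dim(A)               # 2 2
t(A)                 # transpose (A is symmetric, so t(A) equals A)
det(A)               # 5
A_inv <- solve(A)    # inverse of A
A_inv %*% A          # identity matrix, up to rounding error
sol <- solve(A, b)   # solves A %*% sol = b, giving 0.8 and 1.4
ev <- eigen(A)       # list with components $values and $vectors
```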



Matrix Functions



Arrays
• An array can be considered as a multiply subscripted collection of
data entries, for example numeric.
• Arrays are generalizations of vectors and matrices.
• That is, vectors in the mathematical sense are one-dimensional
arrays, whereas matrices are two-dimensional
arrays; higher dimensions are also possible.
• There are two methods of creating arrays in R.
• Method 1: Using vectors
• A vector can be used by R as an array only if it has a
dimension vector as its dim attribute.
• A dimension vector is a vector of non-negative integers.
• If its length is k, then the array is k-dimensional.


Arrays…
• An array can be created by giving a data vector a dim attribute,
which has the form
> z <- data_vector
> dim(z) <- dimension_vector
• Example: The following is a 3 x 5 x 100 (3-dimensional) array
with dimension vector c(3,5,100), built from a vector of 1500 elements.
> z=c(1:1500)
> dim(z) <- c(3,5,100)
• If the dimension vector for an array, say A, is c(3,4,2), then
there are 3 x 4 x 2 = 24 entries in A and the data vector holds
them in the order A[1,1,1], A[2,1,1], ..., A[2,4,2], A[3,4,2].
> A=c(5:28)
> dim(A)=c(3,4,2)
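
The storage order can be checked by indexing a few entries of A. In this sketch the first subscript varies fastest, then the second, then the third; the values in the comments follow from A holding 5, 6, ..., 28 in that order.

```r
A <- 5:28
dim(A) <- c(3, 4, 2)

A[1, 1, 1]   # 5  : first entry of the data vector
A[2, 1, 1]   # 6  : the first subscript moves fastest
A[1, 2, 1]   # 8  : the 4th entry, after one full column of 3
A[1, 1, 2]   # 17 : the 13th entry, after one full 3 x 4 slice
A[3, 4, 2]   # 28 : the last entry
```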



Arrays…
Method 2: Using the function array()
• As well as giving a vector structure a dim attribute, arrays can be
constructed from vectors by the array function, which has the form
> Z <- array(data_vector, dim_vector)
• Example: if the vector h contains 24 or fewer numbers, then
the command
> Z <- array(h, dim=c(3,4,2))
would use h to set up a 3 by 4 by 2 array in Z, recycling the values of h
if there are fewer than 24. If the size of h is
exactly 24, the result is the same as
> Z <- h ; dim(Z) <- c(3,4,2)
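
When h has fewer than 24 elements, array() reuses (recycles) its values until the array is full. A minimal sketch with a made-up h of length 6:

```r
h <- 1:6
Z <- array(h, dim = c(3, 4, 2))   # 24 cells, filled with 1..6 repeated four times

dim(Z)          # 3 4 2
Z[1, 3, 1]      # 1 : the 7th cell, where the values of h start over
sum(Z)          # 84 : four copies of 1 + 2 + ... + 6 = 21
```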



Lists
• Lists are collections of arbitrary objects.
• That is, the elements of a list can be objects of any type and
structure.
• Consequently, a list can contain another list, and therefore it can
be used to construct arbitrary data structures.
• A list could consist of a numeric vector, a logical value, a matrix,
a complex vector, a character array, a function, and so on.
• Lists are created with the list() command:
> L <- list(object-1, object-2, …, object-m)



Lists example



Lists…
• The elements of the list are accessed with the [[ ]] operator.
• Examples: Consider the following list L
> L <- list( c(1,5,3), matrix(1:6, nrow=3), c("Hello", "world"))
> L[[1]] # To obtain the 1st element of L
[1] 1 5 3
> L[[2]][2,1] # Element [2,1] of the second element of L
[1] 2
> L[[c(3,2)]] # Recursively: element 3 of L, then its 2nd element
[1] "world"
> #OR
> L[[3]][2]
[1] "world"



Data Frames
• Data frames can be regarded as an extension of matrices.
• Data frames can have columns of different data types and are
the most convenient data structure for data analysis in R.
• Data frames are lists with the constraint that all elements are
vectors of the same length.
• The command data.frame() creates a data frame:
> dat <- data.frame(object-1, object-2, …, object-m)



Example of Data Frames
• The data frame below contains the results of an experiment to
determine the effect of removing the tip of petunia plants grown
at 3 levels of nitrogen on various measures of growth.
• The data frame has 8 variables (columns) and each row
represents an individual plant.
• The variables treat and nitrogen are factors
(categorical variables).
• The treat variable has 2 levels (tip and notip) and the nitrogen
variable has 3 levels (low, medium and high).
• Variables height, weight, leafarea and shootarea are numeric
and the variable flowers is an integer representing the number of
flowers.



Example…
treat  nitrogen  block  height  weight  leafarea  shootarea  flowers
tip    medium    1      7.5     7.62    11.7      31.9       1
tip    medium    1      10.7    12.14   14.1      46         10
tip    medium    1      11.2    12.76   7.1       66.7       10
tip    medium    1      10.4    8.78    11.9      20.3       1
tip    medium    1      10.4    13.58   14.5      26.9       4
tip    medium    1      9.8     10.08   12.2      72.7       9
notip  low       2      3.7     8.1     10.5      60.5       6
notip  low       2      3.2     7.45    14.1      38.1       4
notip  low       2      3.9     9.19    12.4      52.6       9
notip  low       2      3.3     8.92    11.6      55.2       6
notip  low       2      5.5     8.44    13.5      77.6       9
notip  low       2      4.4     10.6    16.2      63.3       6



Data Frame Construction
• We can construct a data frame from existing data objects
such as vectors using the data.frame() function.
• As an example, let's create three
vectors p.height, p.weight and p.names and include all of
these vectors in a data frame object called dataf.
> p.height <- c(180, 155, 160, 167, 181)
> p.weight <- c(65, 50, 52, 58, 70)
> p.names <- c("Joanna", "Charlotte", "Helen", "Karen", "Amy")
> dataf <- data.frame(height = p.height, weight = p.weight, names = p.names)
> dataf
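
Once the data frame exists, it can be inspected and subset. The sketch below rebuilds dataf from the same vectors so that it runs on its own. The stringsAsFactors = FALSE argument and the helper name tall are additions for illustration and are not in the original slide.

```r
p.height <- c(180, 155, 160, 167, 181)
p.weight <- c(65, 50, 52, 58, 70)
p.names  <- c("Joanna", "Charlotte", "Helen", "Karen", "Amy")
dataf <- data.frame(height = p.height, weight = p.weight, names = p.names,
                    stringsAsFactors = FALSE)   # keep names as character strings

nrow(dataf)            # 5 rows (one per person)
ncol(dataf)            # 3 columns (one per variable)
str(dataf)             # compact overview of column types
mean(dataf$height)     # 168.6 : $ extracts a single column

tall <- dataf[dataf$height > 165, "names"]   # names of people taller than 165
tall                   # "Joanna" "Karen" "Amy"
```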



Data Frames…
• A data frame is a powerful two-dimensional object made
up of rows and columns which looks superficially very
similar to a matrix.
• However, whilst matrices are restricted to containing data
all of the same type, data frames can contain a mixture of
different types of data.
• Typically, in a data frame each row corresponds to an
individual observation and each column corresponds to a
different measured or recorded variable.



Data Frames…
Example:
> name= c("Eden","Solomon","Zelalem","Kidist")
> age=c(18,22,25,27)
> sex=c("F","M","M","F")
> stud=data.frame(name,age,sex)
> stud
     name age sex
1    Eden  18   F
2 Solomon  22   M
3 Zelalem  25   M
4  Kidist  27   F

• To display the column names:
> names(stud) #OR colnames(stud)
[1] "name" "age" "sex"

• To display the row names:
> rownames(stud)
[1] "1" "2" "3" "4"



Data Frames…
• You can use the names() function to change the column names:
> names(stud)<-c("name","age","sex")
> stud
     name age sex
1    Eden  18   F
2 Solomon  22   M
3 Zelalem  25   M
4  Kidist  27   F

• Similarly, use the row.names() function to change the row names:
> row.names(stud) <- c("Wrt","Ato1","Ato2","Wro")
> stud
        name age sex
Wrt     Eden  18   F
Ato1 Solomon  22   M
Ato2 Zelalem  25   M
Wro   Kidist  27   F



Factors
• R has a special data structure to store categorical variables,
which is the factor.
• Making a variable a factor tells R that it is nominal or
ordinal.
• Simplest form of the factor function:



• Ideal form of the factor function:

The factor function has 3 parameters:
1. Vector Name
2. Values (Optional)
3. Value labels (Optional)
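
The code for both forms was on slide images that did not survive extraction. The following is a hedged sketch of what the two calls typically look like, assuming that "Values" corresponds to the levels argument and "Value labels" to the labels argument of factor(). The coded data are made up.

```r
sex_codes <- c(1, 2, 2, 1, 2)            # made-up coded data: 1 = male, 2 = female

# Simplest form: only the vector; the levels are taken from the data
f_simple <- factor(sex_codes)
levels(f_simple)                         # "1" "2"

# Fuller form: vector, values (levels) and value labels (labels)
f_ideal <- factor(sex_codes, levels = c(1, 2), labels = c("Male", "Female"))
levels(f_ideal)                          # "Male" "Female"
table(f_ideal)                           # counts per category: Male 2, Female 3
```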
