MIS Week 4 (Introduction to R)

Management Information Systems

Lesson 3
Introduction to R (2)
Packages
• Packages are collections of R
functions, data, and compiled
code in a well-defined format.
• The directory where packages are
stored is called the library.
• R comes with a standard set of
packages. Others are available for
download and installation. Once
installed, they have to be loaded
into the session to be used.
• CRAN (Comprehensive R Archive
Network)

2
Packages
• Packages increase the power of
R by improving existing base R
functionalities, or by adding
new ones.
• For example, if you often work
with data frames, you have
probably heard of dplyr or
data.table, two of the most
popular R packages.

3
How To Install and Load Packages
• Almost all packages are
available through the CRAN site.
available.packages()
• This command returns a list of
the packages that are currently
available for installation from
the configured repositories.

5
What Are Repositories?
• A repository is a place where packages are stored so that you can install
them from it.
• Three of the most popular repositories for R packages are:
• CRAN: the official repository, a network of FTP and web servers maintained by the R
community around the world. It is coordinated by the R Foundation, and for a package
to be published here it needs to pass several tests that ensure the package follows
CRAN policies.
• Bioconductor: a topic-specific repository, intended for open source software for
bioinformatics. Like CRAN, it has its own submission and review processes, and its
community is very active, holding several conferences and meetings per year.
• GitHub: although it is not R-specific, GitHub is probably the most popular repository
for open source projects. Its popularity comes from the unlimited space for open
source projects, the integration with git (a version control system), and the ease of
sharing and collaborating with others.
6
How To Install and Load Packages
• The most common way is to use the
CRAN repository; you only need the
name of the package and the
command
• install.packages("package")
• library() is the command used to load
a package. A library is the place
where packages are stored, usually a
folder on your computer, while a
package is a collection of functions
bundled conveniently.
• library(package)
• Example: "ggplot2"
• install.packages("ggplot2")
• library(ggplot2)
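
A common pattern combines the two commands: install a package only when it is missing, then load it. The following is a minimal sketch. The helper name load_or_install is hypothetical (it is not from the slides), and the call to install.packages() assumes an internet connection and a reachable CRAN mirror.

```r
# Hypothetical helper (not part of base R): install a package only if it
# is missing, then attach it to the current session.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # assumes network access and a configured CRAN mirror
  }
  library(pkg, character.only = TRUE)  # character.only: pkg holds the name as a string
}

# "stats" ships with R, so this call never triggers an installation
load_or_install("stats")
```

Calling load_or_install("ggplot2") would work the same way, downloading the package on the first call only.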

7
How To Install and Load Packages
• Tools → Install Packages

8
How To Install and Load Packages
• To use a package that has been
installed, call library(package_name).
• Quotation marks (") are not required.
• The "Packages" tab can also be used:
tick the box next to the desired
package.

9
Getting Information About Packages
• packageDescription("dplyr")

• help(package = "dplyr")

10
Update R with the "installr" Package
• > install.packages("installr")
• > library(installr)
• > updateR()

11
Data Sets
• When you install R, several sample
data sets are installed with it.
• Many of these are found in R's
datasets package.
• Typing data() is enough to see the
data sets in the currently loaded packages.
• The list then appears in the
upper left pane.

12
Data Sets
• To see all the data sets contained
in all installed packages:

• > data(package = .packages(all.available = TRUE))

13
Data Sets
• Suppose that a single data set is
needed.
• For example, the "cars" data set
from the datasets package.
• To load it, write
• data(cars, package = "datasets")
then write
• cars

14
Data sets
• The head() function shows only the
first records of a data set.

• head(cars, 10)


• Shows the first 10 records (the default is 6)
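
As a small illustration of the default and of an explicit row count, the sketch below loads cars and inspects it. The variable names and the use of tail() are additions for illustration, not part of the slides.

```r
# cars ships with the built-in datasets package, so nothing needs to be installed
data(cars, package = "datasets")

first_default <- head(cars)      # no second argument: first 6 rows
first_ten     <- head(cars, 10)  # first 10 rows
last_three    <- tail(cars, 3)   # tail() is the counterpart for the last rows

nrow(first_default)  # 6
nrow(first_ten)      # 10
dim(cars)            # 50 rows, 2 columns (speed, dist)
```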

15
Logical operators in R
• > 7 == (4 + 3)
• > "C++" != "R"
• > FALSE != TRUE
• > "Ankara " != "ankara"
• > TRUE == 1
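
Each of the comparisons above evaluates to TRUE. The sketch below stores the results so they can be inspected together. The variable names r1 to r5 and the tolower() comparison are additions for illustration.

```r
r1 <- 7 == (4 + 3)            # TRUE: both sides are 7
r2 <- "C++" != "R"            # TRUE: the strings differ
r3 <- FALSE != TRUE           # TRUE
r4 <- "Ankara " != "ankara"   # TRUE: comparison is case-sensitive (and the trailing space counts)
r5 <- TRUE == 1               # TRUE: TRUE is coerced to the number 1

c(r1, r2, r3, r4, r5)
# [1] TRUE TRUE TRUE TRUE TRUE

# A case-insensitive comparison (an addition, not from the slides)
tolower("Ankara") == tolower("ankara")   # TRUE
```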

16
Numeric Vectors
• There are several ways to assign values to a variable:

> a <- 1.7 # assign a value to a vector with only one element (~ scalar)
> 1.7 -> a # assign a value to a vector with only one element (~ scalar)
> a = 1.7 # assign a value to a vector with only one element (~ scalar)
> assign("a", 1.7) # assign a value to a vector with only one element (~ scalar)

17
Numeric Vectors
• To show the values:

a # auto-print the value on the screen (works at the console, not inside functions or loops)
[1] 1.7
print(a) # print the value explicitly (works everywhere, including inside functions and loops)
[1] 1.7

18
Numeric Vectors
• To generate a vector with several numeric values:

a <- c(10, 11, 15, 19) # assign four values to a vector using the concatenate command c()

a # show the value in the screen


[1] 10 11 15 19

19
Numeric Vectors
• Operations are applied element by element to all the elements of a numeric vector:

a*a # evaluate the square value of every element in the vector


[1] 100 121 225 361

1/a # evaluate the inverse value of every element in the vector


[1] 0.10000000 0.09090909 0.06666667 0.05263158

b <- a-1 # subtract 1 from every element and assign the result to b

b
[1] 9 10 14 18
20
Numeric Vectors
• To generate a sequence:

2:10 # generate a sequence from n1=2 to n2=10 using n1:n2


[1] 2 3 4 5 6 7 8 9 10

5:1 # generate an inverse sequence if n2 < n1


[1] 5 4 3 2 1

seq(from=n1, to=n2, by=n3) # generate a sequence from n1 to n2 in steps of n3
# (parameter names can be omitted if the order is kept)
21
Numeric Vectors
• To generate a sequence:

seq(from=1, to=10, by=3)


[1] 1 4 7 10

seq(1, 10, 3)
[1] 1 4 7 10

seq(length=10, from=1, by=3) # generate a fixed length sequence


[1] 1 4 7 10 13 16 19 22 25 28

> help(seq) # for help about this command
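
The fixed-length call above works because R matches the abbreviated argument name length to the full name length.out. The sketch below shows that both spellings give the same result, and adds two related helpers, seq_len() and seq_along(), which are not covered in the slides.

```r
s1 <- seq(from = 1, by = 3, length.out = 10)  # full argument name
s2 <- seq(length = 10, from = 1, by = 3)      # "length" partially matches "length.out"
identical(s1, s2)             # TRUE

seq_len(5)                    # 1 2 3 4 5
seq_along(c("a", "b", "c"))   # 1 2 3: one index per element
```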


22
Numeric Vectors
• To generate repetitions:

a <- 1:3; b <- rep(a, times=3); c <- rep(a, each=3) # command rep()

In the previous example we have run three commands in the same line. They have been separated by a ‘;’.

The content of the three variables is now:


a
[1] 1 2 3
b
[1] 1 2 3 1 2 3 1 2 3
c
[1] 1 1 1 2 2 2 3 3 3
23
Numeric Vectors
• The recycling rule: vectors of different lengths can be combined; the shorter
vector is repeated until it matches the length of the longer one. If the longer
length is not a multiple of the shorter length, a warning is issued, although the
operation is still carried out:

a+c # compatible lengths (9 is a multiple of 3)

[1] 2 3 4 3 4 5 4 5 6 # (operation equivalent to b+c)

d <- c(10,100)
b+d # incompatible lengths (9 is not a multiple of 2)
[1] 11 102 13 101 12 103 11 102 13
Warning message:
In b + d : longer object length is not a multiple of shorter object length
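
To make the recycling explicit, the sketch below reproduces b + d by repeating d by hand with rep(). The names ok, recycled and by_hand are illustrative, and suppressWarnings() is used only to keep the output clean.

```r
a <- 1:3
b <- rep(a, times = 3)    # 1 2 3 1 2 3 1 2 3
d <- c(10, 100)

# Length 9 is a multiple of 3: a is silently repeated three times
ok <- b + a               # 2 4 6 2 4 6 2 4 6

# Length 9 is not a multiple of 2: R still computes the result but warns
recycled <- suppressWarnings(b + d)

# The same result written out by hand: d repeated up to length 9
by_hand <- b + rep(d, length.out = length(b))
identical(recycled, by_hand)   # TRUE
```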
24
Numeric Vectors
To see which objects are currently defined, we can list them:
ls()
[1] "a" "b" "c" "d"
Unwanted objects can be deleted with the rm() function:
rm(a,c) # remove objects 'a' and 'c'
ls() # list current objects
[1] "b" "d"
In order to remove everything in the working environment:
rm(list=ls()) # Use this with caution
ls() # (you'll receive no warning!)
character(0)
25
Changing Data Type
x <- 7.97
class(x)
[1] "numeric"
x <- as.integer(x) # as.integer() drops the fractional part
x
[1] 7
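
Note that as.integer() truncates rather than rounds. The sketch below contrasts it with round() and shows two other conversion functions, as.character() and as.numeric(), as additional illustrations not taken from the slides.

```r
x <- 7.97
class(x)              # "numeric"

as.integer(x)         # 7: truncates toward zero, does not round
as.integer(-7.97)     # -7
round(x)              # 8: use round() first if rounding is wanted
as.character(x)       # "7.97"
as.numeric("7.97")    # 7.97
is.integer(as.integer(x))   # TRUE
```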

26
Categorical Variables (Factors)
• Categorical variables or factors can be used to tag other data.
• For example, to classify participants in a survey study according to their
gender, instead of codes such as "1" and "2" or "a" and "b" we can use a
categorical variable with labels such as "man" or "woman".
• Categorical variables are especially important in data analysis work.
• A data set can be classified and summarized by categorical variables.
• Factors are created with the factor() function.

> x <- factor(c("male", "male", "female", "male", "female"))
> x
[1] male   male   female male   female
Levels: female male
> table(x)
x
female   male
     2      3
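
A factor stores its values as integer codes that point into its set of levels. The sketch below inspects both, and shows how the level order can be chosen with the levels argument. The variables counts and y are illustrative additions.

```r
x <- factor(c("male", "male", "female", "male", "female"))

levels(x)        # "female" "male"  (sorted alphabetically by default)
as.integer(x)    # 2 2 1 2 1: the internal codes index into levels(x)

counts <- table(x)
counts[["male"]]     # 3
counts[["female"]]   # 2

# The level order can be set explicitly with the levels argument
y <- factor(c("male", "female"), levels = c("male", "female"))
levels(y)        # "male" "female"
```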
27
Seq()

seq(from = 8, to = 12, by= 2)


[1] 8 10 12
seq(from = 4, to= 7, length = 6)
[1] 4.0 4.6 5.2 5.8 6.4 7.0

28
Rep()
> rep (14, 5)
[1] 14 14 14 14 14
> rep(c(3,5,7), 3)
[1] 3 5 7 3 5 7 3 5 7
> rep(c(3,5,7), each = 3)
[1] 3 3 3 5 5 5 7 7 7
29
Basic Calculations of Vectors
x<- 2:5; y <- 10:13
x
[1] 2 3 4 5
y
[1] 10 11 12 13
x+y
[1] 12 14 16 18

30
Data Frames
• A data frame is a two-dimensional data structure in R.

• It is a special case of a list in which every component has the same length.

• Each component forms a column, and the contents of the components form the rows.
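
The list nature of a data frame can be checked directly. The following sketch uses illustrative weight and height values. The try() call is there only to show that components of unequal length are rejected.

```r
x <- data.frame(weight = c(75, 79, 65, 92), height = c(176, 192, 165, 189))

is.list(x)     # TRUE: a data frame is a list of columns
names(x)       # "weight" "height"
lengths(x)     # every component has length 4
x[[1]]         # the first component, i.e. the weight column

# Components of unequal length are rejected (length 3 versus length 2)
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```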

31
Data Frames
The characteristics of a data frame are as follows.

1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor or
character type.
4. Each column should contain the same number of data items.

32
Create Data Frame
x <- data.frame(weight = c(75, 79, 65, 92), height= c(176, 192, 165,
189))

x
weight height
1 75 176
2 79 192
3 65 165
4 92 189

33
Create Data Frame
nrow(x)
[1] 4
ncol(x)
[1] 2

nrow() and ncol() return the number of rows and columns in x.
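
A few related functions give a quick overview of a data frame. The sketch below rebuilds x so that it runs on its own. dim(), names(), summary() and mean() are base R functions added here for illustration.

```r
x <- data.frame(weight = c(75, 79, 65, 92), height = c(176, 192, 165, 189))

dim(x)              # 4 2  (rows, then columns)
names(x)            # "weight" "height"
summary(x$weight)   # minimum, quartiles, mean and maximum of one column
mean(x$height)      # 180.5
```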

34
Create Data Frame
child<- c("Eren", "Ege", "Efe", "Ali")
age<- c(8, 7, 9, 13)
weight<- c(35, 30, 29, 45)
height<- c(135, 120, 115, 150)
df<-data.frame(child,age,weight,height,stringsAsFactors = FALSE)

R tip: use stringsAsFactors = FALSE.

R often uses the concept of factors to re-encode strings.
To avoid problems, delay the re-encoding of strings by using
stringsAsFactors = FALSE when creating data frames.
Since R 4.0.0, FALSE is the default.

35
Str()
str(df)
'data.frame':	4 obs. of  4 variables:
 $ child : chr  "Eren" "Ege" "Efe" "Ali"
 $ age   : num  8 7 9 13
 $ weight: num  35 30 29 45
 $ height: num  135 120 115 150
str() compactly displays the internal structure of an R object. It is a
diagnostic function and an alternative to summary().

36
Select Data From Data Frame
Vector
df[["child"]]
[1] "Eren" "Ege" "Efe" "Ali"

Data frame
df["child"]
child
1 Eren
2 Ege
3 Efe
4 Ali
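
The difference between the two forms is the type of the result. The sketch below rebuilds df so that it runs on its own and checks the classes. The names v and d, and the drop = FALSE variant, are illustrative additions.

```r
df <- data.frame(child = c("Eren", "Ege", "Efe", "Ali"),
                 age = c(8, 7, 9, 13),
                 weight = c(35, 30, 29, 45),
                 height = c(135, 120, 115, 150),
                 stringsAsFactors = FALSE)

v <- df[["child"]]   # double brackets extract the column itself
d <- df["child"]     # single brackets keep a one-column data frame

class(v)    # "character"
class(d)    # "data.frame"
length(v)   # 4
dim(d)      # 4 1

identical(df$child, v)         # TRUE: $ extracts the same vector as [[ ]]
df[, "child", drop = FALSE]    # matrix-style indexing that keeps the data frame form
```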
37
Select Data From Data Frame
Data Frame
> df[c("child","age","height")]
child age height
1 Eren 8 135
2 Ege 7 120
3 Efe 9 115
4 Ali 13 150

38
Select Data From Data Frame
df[[1]]
[1] "Eren" "Ege" "Efe" "Ali"
df[[3]]
[1] 35 30 29 45

39
Select Data From Data Frame
df[1, ]
child age weight height
1 Eren 8 35 135

df[ ,1]
[1] "Eren" "Ege" "Efe" "Ali"

df[2,3]
[1] 30
40
Select Data From Data Frame
df$child
[1] "Eren" "Ege" "Efe" "Ali"

df$age
[1] 8 7 9 13

df$weight
[1] 35 30 29 45
41
Subset()
Return subsets of vectors, matrices or data frames which
meet conditions.
subset(df, select = child)
child
1 Eren
2 Ege
3 Efe
4 Ali
42
Subset()
Return subsets of vectors, matrices or data frames which
meet conditions.
subset(df, select = c(height, weight))
height weight
1 135 35
2 120 30
3 115 29
4 150 45
43
Subset()
Return subsets of vectors, matrices or data frames which
meet conditions.
subset(df, subset = (height> 120))
child age weight height
1 Eren 8 35 135
4 Ali 13 45 150

44
Subset()
Return subsets of vectors, matrices or data frames which
meet conditions.
> subset(df, select = c(child, height), subset = (weight< 40))
child height
1 Eren 135
2 Ege 120
3 Efe 115
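
The same selection can be written with logical indexing, which is what subset() does internally for data frames. The sketch below rebuilds df so that it runs on its own. The names s1, s2 and both are illustrative.

```r
df <- data.frame(child = c("Eren", "Ege", "Efe", "Ali"),
                 age = c(8, 7, 9, 13),
                 weight = c(35, 30, 29, 45),
                 height = c(135, 120, 115, 150),
                 stringsAsFactors = FALSE)

# subset() version: column names are written without quotes
s1 <- subset(df, subset = weight < 40, select = c(child, height))

# Equivalent bracket version: rows by logical condition, columns by name
s2 <- df[df$weight < 40, c("child", "height")]

all.equal(s1, s2)   # the two results match

# Conditions can be combined with & (and) or | (or)
both <- df[df$height > 120 & df$age > 8, ]
both$child          # "Ali"
```

Inside functions the bracket form is usually the safer choice, because subset() evaluates column names in a non-standard way.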

45
Adding New Data to Data Frames
The rbind() function combines vectors, matrices or data frames by rows.
The columns of the two data sets must be the same (same number and same
names); otherwise rbind() stops with an error for data frames.
df <- rbind(df, data.frame(child ="Ahmet", age=11, weight=40, height=138))
df
child age weight height
1 Eren 8 35 135
2 Ege 7 30 120
3 Efe 9 29 115
4 Ali 13 45 150
5 Ahmet 11 40 138
46
Adding New Data to Data Frames
new.record<- data.frame(child = "Ahmet", age=11, weight=40, height=138)
df <- rbind(df, new.record)
df
child age weight height
1 Eren 8 35 135
2 Ege 7 30 120
3 Efe 9 29 115
4 Ali 13 45 150
5 Ahmet 11 40 138
48
Adding New Data to Data Frames
The cbind() function combines vectors, matrices or data frames by
columns. Data frames that are combined this way must have the same
number of rows.
pb <- data.frame(place.of.birth = c("izmir", "istanbul", "ankara",
"izmir","denizli"))
df<-cbind(df,pb)
df

49
Adding New Data to Data Frames
child age weight height place.of.birth
1 Eren 8 35 135 izmir
2 Ege 7 30 120 istanbul
3 Efe 9 29 115 ankara
4 Ali 13 45 150 izmir
5 Ahmet 11 40 138 denizli
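
The sketch below runs both operations on a smaller data frame so that the shape can be checked after each step. The values are illustrative, and the direct assignment with $ is an alternative not shown in the slides.

```r
df <- data.frame(child = c("Eren", "Ege"), age = c(8, 7), stringsAsFactors = FALSE)

# rbind(): the new row must have the same column names
df <- rbind(df, data.frame(child = "Efe", age = 9))
dim(df)    # 3 2

# cbind(): the new column must have one value per existing row
df <- cbind(df, data.frame(place.of.birth = c("izmir", "istanbul", "ankara")))
dim(df)    # 3 3

# An alternative to cbind() for a single column is direct assignment
df$height <- c(135, 120, 115)
names(df)  # "child" "age" "place.of.birth" "height"

# A row with a different set of columns is rejected
bad <- try(rbind(df, data.frame(name = "Ali", years = 13)), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```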

50
Data Editor Edit()
• While working with data frames,
you may want to view them in an
Excel-like format.
• R has a data frame editor,
although it is very simple and
still at an early stage of development.
• The features it offers are
currently very limited.
• For example, it does not
support copy and paste.
• On the other hand, the R data
editor is expected to evolve over
time and become richer.
51
by()
• The by() function applies a function to each level of a factor or factors.

install.packages("datasets.load")

df<-mtcars
df$gear <- as.factor(df$gear)
by(df$mpg, df$gear, mean)

df$gear: 3
[1] 16.10667
---------------------------------------------------------------
df$gear: 4
[1] 24.53333
---------------------------------------------------------------
df$gear: 5
[1] 21.38
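
The same group means can be obtained with tapply() or aggregate(), which return the result in a more compact form than by(). This is a sketch of alternatives, not part of the slides. mtcars ships with the built-in datasets package.

```r
df <- mtcars
df$gear <- as.factor(df$gear)

# tapply(): apply mean() to mpg within each level of gear
means <- tapply(df$mpg, df$gear, mean)
round(means, 2)
#     3     4     5
# 16.11 24.53 21.38

# aggregate(): the same summary returned as a data frame
agg <- aggregate(mpg ~ gear, data = df, FUN = mean)
agg
```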
52