MIS Week 3 (Introduction to R)
Lesson 1
R
2020
INTRODUCTION TO R
– Software for Statistical Data Analysis
– Based on S
– Programming Environment
– Interpreted Language
– Data Storage, Analysis, Graphing
– Free and Open Source Software
Data Science
Data science is an interdisciplinary field that uses scientific methods,
processes, algorithms and systems to extract knowledge and insights from
data in various forms, both structured and unstructured, similar to
data mining…
INTRODUCTION TO R
– Comprehensive R Archive Network:
• http://cran.r-project.org
– Free and Open Source
– Strong User Community
– Highly extensible, flexible
– Implementation of high end statistical methods
– Flexible graphics and intelligent defaults
• Highly Functional
– Everything done through functions
– Arguments can be passed by position or by name
– Argument names may be abbreviated when the match is unique;
T and F are predefined shorthands for TRUE and FALSE
• Object Oriented
– Everything is an object
– “<-” is an assignment operator
– “X <- 5”: X GETS the value 5
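As a small illustration of the points above, the sketch below calls the base function seq() with positional, named, and abbreviated arguments, and then assigns with <-. The choice of seq() and the variable names are only illustrative assumptions, not part of the original slides.

```r
# Positional and named arguments give the same result
by_position <- seq(1, 10, 3)
by_name     <- seq(from = 1, to = 10, by = 3)

# Argument names may be abbreviated when the match is unique:
# "len" is matched to the full argument name "length.out"
abbreviated <- seq(1, 10, len = 4)

# T is a predefined shorthand for TRUE (an ordinary variable, so it can
# be overwritten; spelling out TRUE is the safer habit)
shorthand_ok <- identical(T, TRUE)

# "<-" assigns: X gets the value 5
X <- 5

print(by_position)
print(abbreviated)
print(X)
```

Spelling out argument names and TRUE/FALSE in full usually makes scripts easier to read, even though the abbreviations are accepted.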
FUNCTIONS
• Functions: objects that package a set of operations so that they can
be reused; R provides many built-in functions, and users can create
their own.
• Functions form the core of R; everything you do in R uses a function
in one way or another.
• More importantly, the way functions work in R allows you to carry out
multiple complex operations in one step or a few simple steps.
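A minimal sketch of both ideas: a user-defined function that is reused, and several built-in functions nested into one step. The helper name standardize and the sample values are hypothetical and chosen only for illustration.

```r
# A user-defined function (hypothetical helper): center a numeric vector
# on its mean and scale it by its standard deviation
standardize <- function(values) {
  (values - mean(values)) / sd(values)
}

scores <- c(10, 20, 30)
z <- standardize(scores)
print(z)

# Built-in functions can be nested to carry out several operations in one step:
# square roots -> mean -> rounding
rounded_mean <- round(mean(sqrt(c(4, 9, 16))), digits = 1)
print(rounded_mean)
```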
Data Structures
• Supports virtually any type of data
• Numbers, characters, logicals (TRUE/FALSE)
• Arrays of virtually unlimited sizes
• Simplest: Vectors and Matrices
• Lists: Can Contain mixed type variables
• Data Frame: Rectangular Data Set
• R is an object-oriented language: an object in R is
anything (constants, data structures, functions, graphs)
that can be assigned to a variable:
• Data Objects: used to store real or complex numerical
values, logical values or characters. These objects are
always vectors: there are no scalars in R.
• Language Objects: functions, expressions
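The statement that there are no scalars can be checked directly. The short sketch below is illustrative only; the variable names are assumptions.

```r
# A single number is stored as a vector of length 1
single_value <- 5
value_is_vector <- is.vector(single_value)
value_length <- length(single_value)
print(value_is_vector)
print(value_length)

# Functions are objects too, so they can be assigned to a variable
my_sqrt <- sqrt
root <- my_sqrt(16)
print(root)
```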
Data structure types
• Vectors: one-dimensional arrays used to store collections of data
of the same mode
– Numeric Vectors (mode: numeric)
– Complex Vectors (mode: complex)
– Logical Vectors (mode: logical)
– Character Vectors, or text strings (mode: character)
• Matrices: two-dimensional arrays to store collections of data
of the same mode. They are accessed by two integer indices.
• Arrays: similar to matrices but they can be multi-dimensional
(more than two dimensions)
• Factors: vectors of categorical variables designed to group the
components of another vector with the same size
• Lists: ordered collection of objects, where the elements can be
of different types
• Data Frames: generalization of matrices where different
columns can store different mode data.
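The data structure types listed above can be sketched with base R constructors as follows. All object names and values are made up for illustration.

```r
# Vectors: every element of a vector has the same mode
num_vec  <- c(1.5, 2, 3)
cplx_vec <- c(1+2i, 3-1i)
lgl_vec  <- c(TRUE, FALSE)
chr_vec  <- c("a", "b")
modes <- c(mode(num_vec), mode(cplx_vec), mode(lgl_vec), mode(chr_vec))
print(modes)

# Matrix: two dimensions, accessed as [row, column]; filled column by column
m <- matrix(1:6, nrow = 2)
cell <- m[2, 3]
print(cell)

# Array: more than two dimensions
arr <- array(1:24, dim = c(2, 3, 4))
print(dim(arr))

# Factor: categorical values with a fixed set of levels
groups <- factor(c("low", "high", "low"))
print(levels(groups))

# List: ordered collection whose elements can be of different types
person <- list(name = "Ann", scores = c(90, 85), passed = TRUE)
print(person$scores)

# Data frame: rectangular, and columns may hold different modes
df <- data.frame(id = 1:3, label = c("x", "y", "z"), stringsAsFactors = FALSE)
column_classes <- sapply(df, class)
print(column_classes)
```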
Data Structures in R
                 Linear     Rectangular
All Same Type    VECTORS    MATRIX
R and R Studio
• R runs on all major operating systems (Mac, Linux, and Windows).
• When you download R, you automatically download a console
application that’s suitable for your operating system.
• R Studio is a cross‐platform Integrated Development Environment
(IDE) with some very neat features to support R.
• R Studio provides a common user interface across the major
operating systems.
• For this reason, we use R Studio to demonstrate some of the
concepts rather than any specific operating‐system version of R.
Performing multiple calculations with vectors
• R is a vector‐based language.
• You can think of a vector as a row or column of numbers or text.
• The list of numbers {1,2,3,4,5}, for example, could be a vector.
• Unlike most other programming languages, R allows you to apply
functions to the whole vector in a single operation without the
need for an explicit loop.
• First, assign the values 1:5 to a vector called x:
> x <- 1:5
> x
[1] 1 2 3 4 5
• Next, add the value 2 to each element in the vector x:
> x + 2
[1] 3 4 5 6 7
• You can also add one vector to another. To add the values 6:10
element‐wise to x, you do the following:
> x + 6:10
[1] 7 9 11 13 15
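To make the contrast with an explicit loop concrete, the sketch below computes x + 2 both ways and then repeats the element-wise addition. The variable names are illustrative assumptions.

```r
x <- 1:5

# Explicit loop, as many other languages would require
looped <- numeric(length(x))
for (i in seq_along(x)) {
  looped[i] <- x[i] + 2
}

# Vectorized form: one operation on the whole vector
vectorized <- x + 2
print(vectorized)

# Element-wise addition of two vectors of the same length
pairwise <- x + 6:10
print(pairwise)

# Functions such as sum() also take the whole vector at once
total <- sum(x)
print(total)
```

The vectorized form is usually both shorter and easier to read than the loop.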
Exploring RGui
R Studio
• RStudio is a code editor and development environment with some
very nice features that make code development in R easy and fun:
– Code highlighting that gives different colors to keywords and
variables, making it easier to read
– Automatic bracket and parenthesis matching
– Code completion, so you don’t have to type out all commands
in full
– Easy access to R Help, with some nice features for exploring
functions and parameters of functions
– Easy exploration of variables and values
• Source: The top‐left corner of the screen contains a text editor that
lets you work with source script files. Here, you can enter multiple
lines of code, save your script file to disk, and perform other tasks
on your script.
• Console: In the bottom‐left corner, you find the console. This is
where you do all the interactive work with R.
• Environment and History: The top‐right corner is a handy overview of
your environment, where you can inspect the variables you created in
your session, as well as their values.
• This is also the area where you can see a history of the commands
you’ve issued in R.
• Files, Plots, Packages, Help, and Viewer: In the bottom‐right corner,
you have access to several tools:
– Files: This is where you can browse the folders and files on your
computer.
– Plots: This is where R displays your plots (charts or graphs).
– Packages: You can view a list of all installed packages. A package is
a self‐contained set of code that adds functionality to R, similar to
the way that add‐ins add functionality to Microsoft Excel.
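The same package information is available from the console. The sketch below uses the stats package, which ships with R, and shows the install command only as a comment; the package name in that comment is just an example.

```r
# Names of all installed packages (the list shown in the Packages tab)
installed <- rownames(installed.packages())
has_stats <- "stats" %in% installed
print(has_stats)

# Attach an installed package so its functions can be used
library(stats)

# Installing a contributed package needs internet access, so it is only
# shown here as a comment (the package name is an example):
# install.packages("ggplot2")
```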
Get or Set Working Directory
• getwd() returns an absolute file path representing the current
working directory of the R process; setwd() is used to set the working
directory.
Usage
• getwd()
• setwd()
> setwd("C:/Users/user1/Desktop/R Proje")
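A portable way to try both functions is to switch to R's temporary directory and then switch back. This is only a sketch; using tempdir() as the target is an assumption made so the example runs on any machine.

```r
# Remember the current working directory
old_dir <- getwd()
print(old_dir)

# Switch to the session's temporary directory (exists on every system)
setwd(tempdir())
print(getwd())

# Restore the original working directory
setwd(old_dir)
restored_dir <- getwd()
print(restored_dir)
```

On Windows, paths are written with forward slashes or doubled backslashes, as in the setwd() call shown above.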
Help
• The help() function and the ? help operator in R provide access to
the documentation pages for R functions, data sets, and other objects,
both for packages in the standard R distribution and for contributed
packages.
• Example:
– ?median
– help(median)
First Codes, Variables and Data Entry
• Data Entry
• Scalar (a vector of length 1) or vector
> x <- 10
> a <- c(10, 20, 30)
• Vector: created with the c() function; "c" stands for "combine"
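• With ( ): wrapping an assignment in parentheses prints the value
• Without ( ): an assignment produces no output
• "#" starts a comment
x <- 10
a <- c(10, 20, 30) # No output is produced
b <- 2 * a # No output is produced
(b <- 2 * a) # The variable is shown in the console
[1] 20 40 60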
• The data type of a variable is dynamic (changeable): R determines it
from the type of the value we assign.
• The class() function is used to see a variable’s type.
• The ls() function is used to see a list of all variables.
x <- 4.15
class(x)
x <- "xyzt"
class(x)
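The effect of reassigning a variable can be followed step by step. In this sketch the variable name value and the assigned values are illustrative assumptions.

```r
# The same variable takes on the type of whatever is assigned to it
value <- 4.15
first_class <- class(value)
print(first_class)

value <- "xyzt"
second_class <- class(value)
print(second_class)

value <- TRUE
third_class <- class(value)
print(third_class)

# ls() lists the names of the variables defined so far
print(ls())
```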
• rm() can be used to remove objects.
• The value of the most recently evaluated top-level R expression is
always assigned to .Last.value
> 45/5+sqrt(81)+exp(pi*log(10))
> t <- .Last.value
> t
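Removing an object can be verified with exists(). The sketch below uses a hypothetical variable name, scratch_value, chosen so that it does not clash with anything built in.

```r
# Create a variable and confirm that it exists
scratch_value <- 42
existed_before <- exists("scratch_value")
print(existed_before)

# Remove it and check again
rm(scratch_value)
exists_after <- exists("scratch_value")
print(exists_after)
```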
Basic Arithmetic Operations
> 25 + 32 - 17 + 3
[1] 43
• print(): prints its argument; digits sets the minimal number of
significant digits
print(exp(2), digits = 2)
[1] 7.4
print(exp(2), digits = 3)
[1] 7.39
• round(): rounds the values in its first argument to the specified
number of decimal places (default 0)
round(pi, digits = 2)
[1] 3.14
round(pi, digits = 3)
[1] 3.142
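For reference, the sketch below runs through R's basic arithmetic operators and contrasts round() with signif(), which counts significant digits rather than decimal places. The numbers are arbitrary examples.

```r
# Basic operators
quotient  <- 7 / 2      # ordinary division
int_div   <- 7 %/% 2    # integer division
remainder <- 7 %% 2     # remainder (modulo)
power     <- 2 ^ 3      # exponentiation
grouped   <- (2 + 3) * 4  # parentheses control the order of evaluation

print(c(quotient, int_div, remainder, power, grouped))

# round() works on decimal places, signif() on significant digits
rounded     <- round(1234.567, digits = 1)
significant <- signif(1234.567, digits = 2)
print(rounded)
print(significant)
```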
References:
• Arin Basu MD MPH, DataAnalytics, "Introduction to R",
http://www.pitt.edu/~super7/17011-18001/17641.ppt
• http://venus.ifca.unican.es/Rintro/dataStruct.html#vectors