D1 2 Intro R

Uploaded by

marcelkatulumba
INTRODUCTION TO R

TUTORS AIMS-GHANA 01 & 03 NOV, 2023


OUTLINE

01 SESSION SET-UP
What to expect, session objectives, planned activities

02 WHY R
R popularity, R vs. others, R capacities, pros and cons

03 GETTING R
How to install, R & RStudio, how to run R programs

04 INTRO TO CORE R FEATURES
Key language features

05 SELF LEARNING
How to keep learning R on your own

06 END OF DAY EXERCISE

SESSION OBJECTIVES
Expected Learning Outcomes

At the end of the session, learners should be able to:

01 PERFORM BASIC COMPUTATIONS IN R

Create variables, compute with variables and more

02 WORK WITH ESSENTIAL DATA TYPES IN R

Numeric, Strings, Lists, Vectors, Data Frames

03 USE R’S CONTROL FLOWS

If statements, for and while loops, break and next

04 WRITE FUNCTIONS, CREATE MODULES AND PACKAGES

Write functions, create modules and package R code

05 HANDLE FILES

Read from files, write to files and use module to retrieve files

06 DEBUG CODE AND HANDLE ERRORS AND EXCEPTIONS


Session Setup
How We Will Do It

For each topic:

• Introductory presentation: provide an introduction to key concepts
• Follow-along coding: see how each concept works using toy examples
• Exercise and practice: tackle more realistic problems

4 hrs. 2 hrs.

WHY R?

7/18/2017 Big Data Program-Lighting Talks


About R
• R is an interpreted, object-oriented, high-level programming language with dynamic semantics

• R is a programming language and free software for statistics and data science

• R was created by Ross Ihaka and Robert Gentleman in 1993

• R is very easy to learn

Diagram labels: Object oriented, Procedure oriented, High-level language, Easy to learn
Who Uses R?
Some of the Well Known Companies Who Use R



The R advantage

1. Popular with large user community and open libraries

2. Free/open source

3. A general-purpose programming language

4. Easy to learn

Popularity of R has been increasing steadily


R vs. STATA/SPSS

Criteria compared:

• Price (STATA, SPSS: $)
• User community
• Sharing of code
• Specialized stats and econ modules
• Technical support
• Machine learning
• Large and unstructured data
Key R Features
How easy is R?

Simple, Easy and Concise
• R code is easy to read and write
• The language is easy to learn

Free and Open Source
R is an example of FLOSS (Free/Libre and Open Source Software), which means one can freely distribute copies of this software, read its source code, modify it, etc.

Portable
Supported by many platforms such as Linux, Windows, FreeBSD and Macintosh

Chart labels: R, STATA, Python
How to Get and Install R
Steps: install R for your OS, install essential libraries/packages, use R

Install R for your OS
1. Directly from: https://www.r-project.org/
2. RStudio: https://posit.co/download/rstudio-desktop/
3. Other third-party platforms

Install essential libraries/packages
R is powered by thousands of user-contributed and freely available packages, which can be installed easily using the command:
install.packages("package_name")


R 4.3.0 vs. R 4.2.3
• R 4.3 refers to the R 4.3.x series of releases.

• Why are there two versions? R 4.2 is legacy, so R 4.3 is the future.

• The two versions share the same syntax; they differ in bug fixes, new features and a few behaviour changes.

RStudio can be installed with either R 4.2 or R 4.3. We will use R 4.3 in this course.
R Development Environments
Development environment: a place where you write and/or run R programs

Interactive R interpreter on the command line
1. Terminal/Shell command line

Integrated Development Environments (IDEs)
1. R GUI: the default console installed with R on Windows and macOS
2. Spyder (a Python IDE with a layout similar to RStudio)
3. RStudio / R Notebook
4. Jupyter Notebook
HELLO WORLD!
A first R program in different coding environments.

print("Hello World")

R Program Environments
1. Basic Terminal
2. Spyder (a Python IDE with a layout similar to RStudio)
3. RStudio / R Notebook
4. Jupyter Notebook

A Quick Introduction to RStudio



Installing and Using the RStudio App
The RStudio app doesn't come with R; you have to install R separately. We will see how to install it later.

Screenshot labels (RStudio panes):
• Run the code
• Scripts/Code
• Environment for viewing objects
• Console
• Install Packages/Visualization
What is RStudio?
A graphical interface application in which you can create and share documents that contain live code, equations, visualizations and text.

Basic Features
• The RStudio application produces documents that we call "scripts", "R Notebooks" or "R Markdown documents". R Notebooks and R Markdown documents contain both code and rich text elements, such as figures, links, equations, etc.

• RStudio is a free, open-source, cross-platform development environment for R, a programming language used for data processing and statistical analysis.

• The RStudio application can be run on a PC without Internet access (RStudio Desktop), or installed on a remote server (RStudio Server), which you can access via the Internet.

• When installed on a server, RStudio allows you to edit and run your notebooks via a web browser.
What to Know about the RStudio App/Notebook
We will quickly run through the following concepts, which are useful for this course.
• Launching the app

• File management
  • Opening an existing script, creating a new one, saving/renaming/duplicating/deleting a file
  • Saving a notebook
  • Exporting a notebook

• Cells (called code chunks in R Notebooks):
  • Creating, inserting, deleting and moving cells
  • Executing a cell, stopping execution, executing many cells, etc.
  • Code cells vs. text cells
  • Knowing when a cell is executing

• Kernels (the R session that runs your code)
  • Interrupt/stop/restart/shut down the kernel
  • Change kernel

• Tips
  • Getting help
  • Tab completion
  • Shortcuts
Exercise: Familiarize Yourself with RStudio
Time: 5 minutes

For new users

1. Launch RStudio

2. Create a new file (R script) using RStudio and save it as ‘Exercice_1’.

3. Create a new code cell/line and do a math calculation: “5 + 5”. Run the cell/line.

4. Try this time-consuming computation:
   a = seq(10000000); for (i in a) { print(i) }
   What's happening? Stop the execution.
A Quick Run Through R Core Features

01 OPERATORS
Assignment, arithmetic and more

02 VARIABLES AND DATA TYPES
Strings, numerics, containers and more

03 CONTROL FLOW
Conditional statements and loops

04 FUNCTIONS, MODULES AND PACKAGES
Defining and calling functions, creating modules

05 FILE HANDLING
Reading, writing files, using the module

06 ERROR HANDLING AND DEBUGGING
Types of errors, handling errors, debugging
General R Features
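Comments
• Start comments with #: the rest of the line is ignored.
• R has no built-in "documentation string"; a new function is usually documented with comment lines placed just above its definition.

Variable Naming Rules

• Names are case sensitive and cannot start with a number or an underscore. They can contain letters, numbers, dots and underscores.

• Reserved words: if, else, for, while, repeat, function, next, break, TRUE, FALSE, NULL, etc. Names of built-in functions such as sum, paste and class are not reserved, but reusing them for variables is best avoided.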
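The comment and naming rules above can be tried out with a minimal sketch like the one below. The variable names and the helper `add_two()` are hypothetical, chosen only for illustration.

```r
# Everything after the hash sign on a line is ignored by R.
total_sales  <- 10   # valid name: letters and an underscore
total.sales2 <- 20   # valid name: dots and digits are allowed after the first character
Total_sales  <- 30   # a different variable, because names are case sensitive

# Documentation is usually written as comments above a function.
# add_two() is a hypothetical helper that adds 2 to its input.
add_two <- function(x) {
  x + 2
}

result <- add_two(total_sales)
print(result)
```

Names such as `2sales` or `_sales` would be rejected by the R parser, which is why they do not appear in the runnable sketch.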



R Operators



Arithmetic Operators
Operator: Description

ADDITION (+): Adds two operands, or unary plus

SUBTRACTION (-): Subtracts the right operand from the left

MULTIPLICATION (*): Multiplies two operands

DIVISION (/): Divides the left operand by the right; the result is numeric

EXPONENTIATION (^ or **): Left operand raised to the power of the right

REMAINDER (%%): Remainder of the division of the left operand by the right
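As a small illustration of the arithmetic operators listed above, the sketch below applies each one to two toy values. The variable names are placeholders, not part of the original slides.

```r
a <- 7
b <- 2

addition       <- a + b    # 9
subtraction    <- a - b    # 5
multiplication <- a * b    # 14
division       <- a / b    # 3.5 (the result is numeric)
power          <- a ^ b    # 49; a ** b gives the same result
remainder      <- a %% b   # 1

print(c(addition, subtraction, multiplication, division, power, remainder))
```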
Assignment Operators

• The first assignment to a variable creates it.
• Variable types don't need to be declared.
• R figures out the variable types on its own.

• Assignment uses <- (the usual R style); = also works
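A short sketch of assignment in practice, assuming nothing beyond base R. The names `x` and `y` are arbitrary.

```r
x <- 5          # the first assignment creates x
y = "five"      # = also assigns at the top level
x <- x + 1      # reassigning replaces the old value (x is now 6)

type_before <- class(x)   # "numeric": R worked out the type on its own
x <- "six"                # the same name can later hold another type
type_after <- class(x)    # "character"

print(c(type_before, type_after))
```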
Comparison Operators

Operator: Description

GREATER THAN (>): True if left operand is greater than right

LESS THAN (<): True if left operand is less than right

EQUAL TO (==): True if left operand is equal to right

NOT EQUAL TO (!=): True if left operand is not equal to right
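The comparison operators can be checked with a few toy values, as in this sketch. The variable names are illustrative only.

```r
left  <- 3
right <- 5

is_greater <- left > right    # FALSE
is_less    <- left < right    # TRUE
is_equal   <- left == right   # FALSE
not_equal  <- left != right   # TRUE

# Comparisons are vectorised: each element is compared in turn.
elementwise <- c(1, 5, 9) > 4  # FALSE TRUE TRUE

print(c(is_greater, is_less, is_equal, not_equal))
print(elementwise)
```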
Logical Operators
Operators & and | have the expected definitions

& (AND): True if both operands are true.

| (OR): True if at least one of the operands is true.
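A minimal sketch of `&` and `|` on single values and on vectors. The names `is_adult` and `has_ticket` are made up for the example.

```r
is_adult   <- TRUE
has_ticket <- FALSE

both   <- is_adult & has_ticket   # FALSE: both operands must be TRUE
either <- is_adult | has_ticket   # TRUE: one TRUE operand is enough

# Like the comparison operators, & and | work element by element on vectors.
pairwise_and <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
pairwise_or  <- c(TRUE, FALSE, TRUE) | c(TRUE, TRUE, FALSE)   # TRUE TRUE TRUE

print(c(both, either))
```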

ENOUGH TALK, LET'S CODE


R Data Types



Overview of R Data Types

Data Type

Immutable:
• Numeric, Integer, Complex
• Character/Strings
• Logical/boolean

Mutable:
• Vectors, Matrices, Arrays
• Lists
• Data Frames
What You Need to Know About Data Types
• How to create them. [Almost all]

• How to convert between types. [Almost all]

• How to modify them (if mutable). [Lists, vectors, etc.]

• How to index, slice, etc. [Strings, lists, arrays, etc.]

• How to loop through them. [Lists, strings, vectors, data frames]

• How to use them with data science libraries (e.g., readxl). [Lists, vectors, data frames, arrays, etc.]

• How to add/remove/replace items. [Lists, strings, vectors and other mutable types]

• How to check membership. [Lists, vectors, etc.]

• How to count the number of items. [Lists, vectors, strings]

A Few Notes About Types in R

The class() function can be used to check the data type of a variable/object.

Although R does some data type conversion internally (e.g., it converts numbers to strings when the two are combined in one vector), sometimes you need to coerce a value explicitly from one type to another to satisfy the requirements of an operator or function parameter.
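The sketch below illustrates both points: checking a type with `class()` and coercing explicitly with `as.numeric()`. The variable names are hypothetical.

```r
n <- 3.14
s <- "42"
class_n <- class(n)   # "numeric"
class_s <- class(s)   # "character"

# Implicit conversion: combining a number and a string gives a character vector.
mixed <- c(1, "a")    # "1" "a"

# Explicit coercion: "42" must be converted before it can be used in arithmetic.
as_number <- as.numeric(s)
total <- as_number + 1   # 43

print(class(mixed))
print(total)
```

Without the call to `as.numeric()`, the expression `s + 1` would stop with an error, because `+` does not accept character operands.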
Numbers in R
Number data types store numeric values. They are immutable data types, which means that changing the value of a number results in a newly allocated object.

Common Numerical Types in R
• Numeric: numerics are positive or negative numbers with or without a decimal point.
• Integer: this is used when you are certain that you will never create a variable that should contain decimals. To create an integer variable, you must use the letter L after the integer value.
• Complex: a complex number is written with an "i" as the imaginary part.

Number Type Conversions
• Type as.numeric(x) to convert x to a numeric
• Type as.integer(x) to convert x to an integer
• Type as.complex(x) to convert x to a complex
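A brief sketch of the three numeric types and the base R conversion functions `as.integer()`, `as.numeric()` and `as.complex()`. The values are arbitrary.

```r
x <- 10.5     # numeric
y <- 10L      # integer (note the L suffix)
z <- 3 + 2i   # complex

types <- c(class(x), class(y), class(z))   # "numeric" "integer" "complex"

x_as_integer <- as.integer(x)   # 10: the decimal part is dropped
y_as_numeric <- as.numeric(y)   # 10, now stored as a numeric
x_as_complex <- as.complex(x)   # 10.5+0i

print(types)
print(x_as_integer)
```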
Strings
Strings hold text. In R they are stored as elements of character vectors.

Creating variables:
• Use single (') or double (") quotes

Assign a String to a Variable:
• Assigning a string to a variable is done with the variable followed by the <- operator and the string

Common String Operations
• nchar(string): to find the number of characters in a string
• grepl(pattern, string): to check if a character or a sequence of characters is present in a string
• paste(string1, string2): to merge/concatenate two strings
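The string operations above might be used as follows. This is a small sketch with made-up values; note that `grepl()` takes the pattern first and the string second.

```r
greeting <- "Hello"   # double quotes
name     <- 'World'   # single quotes work too

n_chars  <- nchar(greeting)                 # 5
has_ell  <- grepl("ell", greeting)          # TRUE: "ell" occurs in "Hello"
sentence <- paste(greeting, name)           # "Hello World" (a space is the default separator)
joined   <- paste(greeting, name, sep = "") # "HelloWorld"

print(sentence)
```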
Vectors
A vector contains a collection of items that are of the same type. The vector types are logical, integer, double, complex, character (string), and raw. Even when you write just one value in R, it becomes a vector of length 1 and belongs to one of the above vector types.

How to Create Them
Common ways:
d = c("item1", "item2", "item3")
d = c(): creates an empty vector

Common Operations on Vectors
Operation: Result
d[i]: Subscripting (1-based; negative subscripts drop the items from d; TRUE/FALSE subscripts can also be used).
d = c(vec1, vec2): The concatenation of vec1 and vec2.
sum(d): Returns the sum of all the values present in d
prod(d): Returns the product of all the values present in d
length(d): Returns the length of d
sort(d): Sorts d into ascending order (or descending order with decreasing = TRUE).

In the table, d is a vector.

A vector is a very useful data structure when handling data in R, so make sure you are comfortable with it. Also, in statistics you are more likely to need vectors of numerical values.
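The vector operations in the table can be tried on a toy numeric vector, as sketched below. The values are arbitrary.

```r
d <- c(5, 2, 9)

first_item   <- d[1]                      # 5 (indexing starts at 1)
without_1st  <- d[-1]                     # 2 9 (negative index drops an item)
by_logical   <- d[c(TRUE, FALSE, TRUE)]   # 5 9

e <- c(d, c(1, 4))                        # concatenation: 5 2 9 1 4

total      <- sum(e)                      # 21
product    <- prod(d)                     # 90
n_items    <- length(e)                   # 5
ascending  <- sort(e)                     # 1 2 4 5 9
descending <- sort(e, decreasing = TRUE)  # 9 5 4 2 1

print(ascending)
```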
Mathematical Functions for Vectors
In R, there are a large number of predefined mathematical functions that are useful in practice. Here's a list of functions for vectors:

Some of them
• order()
• mean()
• var()
• min()
• max()
• which.min()
• which.max()
• sd()
• median()
• rep()
• seq()

Exercise
Try to understand each of the above functions. To do this, run RStudio, define some toy vectors, apply the above functions, and try to understand the output. If you can't understand a function, consult the help function using the command help("function_name") or ?function_name
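As a possible starting point for the exercise, here is a sketch that applies several of the listed functions to one toy vector. The vector `x` is an arbitrary choice.

```r
x <- c(4, 8, 15, 16, 23, 42)

x_mean   <- mean(x)        # 18
x_median <- median(x)      # 15.5
x_min    <- min(x)         # 4
x_max    <- max(x)         # 42
pos_min  <- which.min(x)   # 1: position of the smallest value
pos_max  <- which.max(x)   # 6: position of the largest value
x_var    <- var(x)         # sample variance
x_sd     <- sd(x)          # standard deviation, the square root of the variance

ranks    <- order(c(3, 1, 2))        # 2 3 1: indices that would sort the vector
repeated <- rep(c(1, 2), times = 2)  # 1 2 1 2
steps    <- seq(1, 10, by = 3)       # 1 4 7 10

print(c(x_mean, x_median, x_sd))
```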
Matrices
A matrix is one of the mutable sequence types. It is the R object in which the elements are arranged in a two-dimensional rectangular layout. A column is a vertical representation of data, while a row is a horizontal representation of data.

How to Create Them
The basic syntax:

d = matrix(data, nrow, ncol, byrow, dimnames)

d = matrix(): creates a 1 x 1 matrix containing NA

Remember the c() function is used to concatenate items together.

Some Examples
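A minimal sketch of `matrix()` using all five arguments from the syntax above. The row and column names are invented for the example.

```r
m <- matrix(
  data = c(1, 2, 3, 4, 5, 6),
  nrow = 2,
  ncol = 3,
  byrow = TRUE,   # fill row by row: first row is 1 2 3, second row is 4 5 6
  dimnames = list(c("r1", "r2"), c("a", "b", "c"))
)

cell_by_position <- m[2, 3]       # 6: row 2, column 3
cell_by_name     <- m["r1", "b"]  # 2
shape            <- dim(m)        # 2 3
first_column     <- m[, "a"]      # values 1 and 4, named r1 and r2

print(m)
```

With `byrow = FALSE` (the default), the same data would be filled column by column instead.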


Data Frame
A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

How to Create It
The basic syntax:

d = data.frame()

Remember that a data frame can hold different types of data. While the first column can be character, the second and third can be numeric or logical. However, all values within one column should have the same type.

Some Examples
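The following sketch builds a small data frame with one character, one numeric and one logical column. The column names and values are made up for illustration.

```r
students <- data.frame(
  name     = c("Ama", "Kofi", "Esi"),   # character column
  age      = c(23, 25, 22),             # numeric column
  enrolled = c(TRUE, FALSE, TRUE)       # logical column
)

n_rows <- nrow(students)   # 3
n_cols <- ncol(students)   # 3
ages   <- students$age     # one column, returned as a vector

# Keep only the rows where enrolled is TRUE, then take the name column.
enrolled_names <- students[students$enrolled, "name"]

print(students)
```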

ENOUGH TALK, LET'S CODE


R Control Flow



Control Flow
In programming, control flow is the order in which function calls, instructions, and statements are executed or evaluated when a program is running.

R control flow constructs: if, if…else, else if, for, while, repeat, break, next


If Statements

if
if (condition) {statement 1 …}

if…else
if (condition) {statement 1 …} else {statement 2 …}

else if
if (condition 1) {
  statement 1 …
} else if (condition 2) {
  statement 2 …
} else {statement 3 …}

Notes about if statements

• The if is used to specify a block of code to be executed if a condition is TRUE
• The else keyword catches anything which isn't caught by the preceding conditions.
• The else if is R's way of saying "if the previous conditions were not true, then try this condition"
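A compact sketch of the three forms working together. The thresholds and the `score` variable are arbitrary, chosen only to show which branch runs.

```r
score <- 72

if (score >= 80) {
  grade <- "A"          # runs only when the first condition is TRUE
} else if (score >= 60) {
  grade <- "B"          # tried only because the first condition was FALSE
} else {
  grade <- "C"          # catches everything the conditions above missed
}

print(grade)
```

In R, `else` has to appear on the same line as the closing brace of the preceding block, as written here, when the code is run line by line at the console.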
The for Loop in R
A for loop is used for iterating over a sequence (such as a vector or a list).

for loop syntax

for (var in sequence)
{do something with var …}

Notes about the for loop

• var takes the value of each element of the sequence in turn; it keeps its last value after the loop ends

• The for loop does not require an indexing variable to be set beforehand
Example
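One way to illustrate the for loop, together with `next` and `break`, is the sketch below. The vectors and variable names are placeholders.

```r
# Sum the elements of a vector one at a time.
numbers <- c(2, 4, 6)
total <- 0
for (n in numbers) {
  total <- total + n
}
# total is now 12

# next skips to the following iteration; break leaves the loop entirely.
evens <- c()
for (i in 1:10) {
  if (i %% 2 != 0) next   # skip odd numbers
  if (i > 6) break        # stop once we pass 6
  evens <- c(evens, i)
}
# evens is now 2 4 6

print(total)
print(evens)
```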
The while Loop in R
With the while loop we can execute a set of statements as long as a condition is true.

while loop syntax

while (condition)
{do something …}

Notes about the while loop

• The while loop is useful when you need to iterate over a non-sequence

• It's important to define relevant variables to enable termination of the loop

Example
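A small sketch of a while loop that halves a value until it drops to 1 or below. The variable names are illustrative; the update of `value` inside the loop is what allows it to terminate.

```r
value <- 100
count <- 0

while (value > 1) {
  value <- value / 2    # changing value here is what lets the loop end
  count <- count + 1
}

# 100 is halved 7 times before it falls to 1 or below (0.78125).
print(count)
print(value)
```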

Functions in R



Overview of Functions
A function is a block of organized, reusable instructions that is used to perform some related actions.

Why Functions?
• Re-usability of code minimizes redundancy
• Procedural decomposition makes things organized
• Reduces errors
• Makes debugging easy

Function Categories
R functions: built-in and user defined


Function
A function is a set of statements organized together to perform a specific task. As we saw previously, R has a large number of built-in functions, and users can create their own functions. An R function is created by using the keyword function.

How to Create It
The basic syntax:

- function_name <- function(item1, item2, …) {
  function body
  }

- function_name <- function() {function body}: function without an argument

Information can be passed into functions as arguments. Arguments are specified after the function keyword, inside the parentheses. You can add as many arguments as you want; just separate them with a comma.

Some Examples
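A minimal sketch of user-defined functions with and without arguments. The function names `rectangle_area()` and `say_hello()` are hypothetical.

```r
# Two arguments; height has a default value, so it can be left out.
rectangle_area <- function(width, height = 1) {
  width * height   # the value of the last expression is returned
}

# A function without arguments.
say_hello <- function() {
  "Hello from a function"
}

area_full    <- rectangle_area(3, 4)           # 12
area_default <- rectangle_area(5)              # 5: height falls back to 1
area_named   <- rectangle_area(height = 2, width = 10)  # 20: arguments matched by name
message_text <- say_hello()

print(area_full)
```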

File Handling in R



File Operations in R
What You Need to Know Some Examples

1. Open files in different modes

2. Read files

3. Write to files

4. Handle errors which often happen


when reading and writing files
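The file operations listed above could look like this in base R. This is a sketch under assumptions: the files are written to temporary paths created with `tempfile()`, and the file name `no_such_file.csv` is a deliberately missing, hypothetical file.

```r
# Write a data frame to a CSV file and read it back.
csv_path <- tempfile(fileext = ".csv")
scores <- data.frame(name = c("Ama", "Kofi"), score = c(90, 85))
write.csv(scores, csv_path, row.names = FALSE)
loaded <- read.csv(csv_path)

# Write and read plain text, one string per line.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), txt_path)
text_lines <- readLines(txt_path)

# Reading a file that does not exist fails; tryCatch() lets the script continue.
missing_data <- tryCatch(
  read.csv("no_such_file.csv"),
  warning = function(w) NULL,   # R warns that the file cannot be opened
  error   = function(e) NULL
)

print(loaded)
print(is.null(missing_data))
```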

Errors, Exceptions and Debugging



Types of Errors
Syntax Errors/Parsing Errors
• The most common kind of complaint you get while you are still learning R
• The error is caused by (or at least detected at) the unexpected token named in the error message
• When a script is loaded with source(), the line number is also provided, so use it

Exceptions
• Even if a statement or expression is syntactically correct, it may cause an error when an attempt is made to execute it. Errors detected during execution are called exceptions (R calls them conditions)
• The error message indicates what happened
• Conditions come in different types (such as errors and warnings), and the type is printed as part of the message
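The two kinds of errors can be observed safely with `tryCatch()`, as in the sketch below. The code strings and messages are invented for the example; `parse()` is used only to trigger a syntax error without stopping the script.

```r
# Syntax error: the closing parenthesis is missing, so the code cannot be parsed.
parse_result <- tryCatch(
  parse(text = "x <- (1 + 2"),
  error = function(e) "syntax error"
)

# Runtime error: the code is syntactically valid but fails when executed.
type_error <- tryCatch(
  "1" + 1,
  error = function(e) conditionMessage(e)   # keep the error message as text
)

# Warning: log(-1) does not stop the program but signals a warning.
safe_log <- tryCatch(
  log(-1),
  warning = function(w) NA
)

# stop() raises an error of your own with a custom message.
custom <- tryCatch(
  stop("something went wrong"),
  error = function(e) conditionMessage(e)
)

print(type_error)
print(custom)
```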
Remember, When You Get Stuck, You Can Always Ask Google
Resources for Learning R
Self Learning Is Crucial in Programming
1. Just do it: start using R for some of your day-to-day data tasks

2. Take structured courses: Coursera, edX, DataCamp and more

3. Use R in projects (even if they are just toy projects)

4. Read R books (e.g., R for Data Science)

5. Always use Stack Overflow

6. Attend conferences, meetups, etc.

7. Start an R community
