Introduction To R PDF

The document provides an introduction to R and discusses several key concepts: - It describes what statistics is and some common terms like data, variables, populations, and samples. - It explains how to perform basic calculations and operations in R like arithmetic, functions, and generating random numbers. - It discusses various data types in R like numeric, character, and logical vectors as well as how to access and manipulate vector elements. - It introduces more advanced topics like data frames, reading external data files, sorting/ordering data, and sampling.

Uploaded by

Harshana Supun

Introduction to R

Statistics
Statistics is a branch of mathematics dealing with
  - Data collection
  - Organization
  - Analysis
  - Interpretation
  - Decision making
• Data consists of information coming from observations, counts, measurements, or responses.
• A variable is a characteristic or condition that can change or take on different values.
• A population is the collection of all outcomes, responses, measurements, or counts that are of interest.
• A sample is a subset of a population.

Parameters & Statistics
• A parameter is a numerical description of a population characteristic.
• A statistic is a numerical description of a sample characteristic.
Example:
1. A recent survey of a sample of 450 college students reported that the average weekly income for students is $325. This is a statistic, because it describes a sample.
2. The average weekly income for all students is $405. This is a parameter, because it describes the whole population.
Branches of Statistics
Qualitative and Quantitative Data
• R is a language and environment for statistical computing and graphics.
• R is available as free software.
• Download R from the CRAN (Comprehensive R Archive Network) website.
• The site: http://cran.r-project.org
• R provides a wide variety of statistical analyses:
  - Linear and nonlinear modeling
  - Classical statistical tests
  - Time-series analysis
  - Classification
  - Clustering
  - Big data analysis
  - Data mining
Interface of RStudio
Create a new project
Working directory
To print the current working directory:
> getwd()
To change the current working directory:
> setwd("H:")

The workspace
Your R objects are stored in a workspace.
To list the objects in your workspace:
> ls()
History
• To work with your previous commands:
> history() # display the last 25 commands
> history(max.show=Inf) # display all previous commands
• To save your command history (the default file name is ".Rhistory"):
> savehistory(file="myfile")
• To recall your command history:
> loadhistory(file="myfile")
Getting Help
• If you know a particular command but not its correct syntax, use help("command").
Eg: > help("ls") or > ?ls
• If you don't know the command but know a keyword, use help.search("keyword").
Eg: > help.search("ls")
• To list all the functions whose names include a particular word, use apropos("word").
Eg: > apropos("save")
• To run the examples for a function, use example("word").
Eg: > example("save")
As a Calculator
• One of the simplest possible tasks in R is to enter an arithmetic expression and receive a result.
> 2+2
[1] 4
> 2*5
[1] 10
> sqrt(4)
[1] 2
> exp(-2)
[1] 0.1353353
> pi
[1] 3.141593
> (5+(6+7)*(pi^2))/8
[1] 16.66311
> log(exp(1))
[1] 1
> log(10000, 10)
[1] 4
> sin(pi/3)^2 + cos(pi/3)^2
[1] 1
• Function names are case-sensitive, so a wrongly capitalized name is not found:
> Sin(pi/3)^2 + cos(pi/3)^2
Error: could not find function "Sin"
> ExP(-1)
Error: could not find function "ExP"
> exp(-1)
[1] 0.3678794
Naming Variables
• Three types of variables
  - Numeric {Ex: 3, 4.098, 1234}
  - Character {Ex: "Andrew", "today", "RRR"}
  - Logical {Ex: TRUE, FALSE}
• Names can be built from letters, digits, the period (.), and the underscore (_).
• Names must not start with a digit, an underscore, or a period followed by a digit.
• Names are case-sensitive.
• Some names are already used by the system. R lets you reuse them, but doing so masks the built-in object, so avoid the following as variable names:
Eg: c, q, t, D, F, I, T, diff, df, pt
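The naming rules above can be tried out with a short sketch. The object names used here (weight, Weight, height.cm2) are made up for illustration only.

```r
# Names are case-sensitive: these are two distinct objects.
weight <- 50
Weight <- 65
same_object <- identical(weight, Weight)   # FALSE

# A name may contain letters, digits and periods.
height.cm2 <- 172

# make.names() shows how R would repair an invalid name such as "2nd".
repaired <- make.names("2nd")              # "X2nd"

# Reusing a system name is risky: after this line T no longer means TRUE.
T <- 0
t_is_true <- isTRUE(T)                     # FALSE
rm(T)                                      # remove the mask; T is TRUE again
```

Writing TRUE and FALSE in full avoids this kind of surprise, because those two words are reserved and cannot be reassigned.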
Assigning values to variables
• "<-" is the assignment operator; "=" can also be used.
> weight <- 50 or > weight = 50
• To display the value:
> weight
[1] 50
• You cannot do much statistics on single numbers. To work with more than one number, the solution is VECTORS.
Vectors
• A vector is a sequence of data elements of the
same basic type.
• One advantage of R is that it can handle entire
data vectors as single objects.
> weight<-c(60,45,76,31,53)
> weight
[1] 60 45 76 31 53
• Eg: Enter the following numbers into Y, and
perform the following operations.
2, 4, 3, 6, 5, 1, 7, 8, 9, 10

i. a = Y + Y
ii. b = Y *(1/2)* Y
iii. c = a + b
iv. d = 1/c
v. Print Y, a, b, c, d
• Suppose you want to access the 2nd element of Y.
> Y[2]
[1] 4
> Y[1:3]
[1] 2 4 3
> Y[5:8]
[1] 5 1 7 8
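One possible way to work through the exercise and the indexing examples is sketched below. The names a, b, c and d follow the exercise; in real code a name other than c would be a safer choice, since c is also the built-in function for combining values.

```r
Y <- c(2, 4, 3, 6, 5, 1, 7, 8, 9, 10)

# Arithmetic on vectors is carried out element by element.
a <- Y + Y          # every element doubled
b <- Y * (1/2) * Y  # every element squared, then halved
c <- a + b          # masks the name c, but calls such as c(1, 2) still work
d <- 1 / c

print(Y); print(a); print(b); print(c); print(d)

# Indexing starts at 1.
second <- Y[2]          # 4
first_three <- Y[1:3]   # 2 4 3
fifth_to_eighth <- Y[5:8]  # 5 1 7 8
```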
Character Vectors
• A character vector is a vector of text strings, whose elements are specified and printed in quotes.
> x = c("Wednesday", "Tuesday", "Monday")
> x
[1] "Wednesday" "Tuesday" "Monday"
> color <- c("Red", "Blue", "Green")
> color
[1] "Red" "Blue" "Green"
> color[2]
[1] "Blue"
Logical Vectors

• Logical vectors can take the value TRUE or FALSE (or NA, for missing values).
• In input, you may use the convenient abbreviations T and F, but defining vectors this way is not common:
> c(T, F, T, T, F, T, F)
Eg (with Y as defined in the Vectors section): > Y < 5
[1] TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
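Logical vectors are mostly used to count or select elements. A minimal sketch, reusing the ten numbers from the Vectors exercise (the names below5, n_below5, values and positions are illustrative):

```r
Y <- c(2, 4, 3, 6, 5, 1, 7, 8, 9, 10)

below5 <- Y < 5             # one TRUE/FALSE per element of Y
n_below5 <- sum(below5)     # TRUE counts as 1, so this counts the matches
values <- Y[below5]         # the elements of Y that are below 5
positions <- which(below5)  # the indices where the condition holds
```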
Missing Values
• Missing values are represented by the symbol NA (Not Available).
• Impossible values are represented by the symbol NaN (Not a Number).
• Infinite values are represented by Inf.
> x <- c(12, 54, NA)
> x + 3
[1] 15 57 NA
> log(0)
[1] -Inf
> 0/0
[1] NaN
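To detect these special values, R provides test functions; comparing with == does not work for NA. A short sketch using the same x as above:

```r
x <- c(12, 54, NA)

missing_flags <- is.na(x)    # FALSE FALSE TRUE
n_missing <- sum(is.na(x))   # number of missing values
# Note: x == NA returns NA for every element, so it cannot be used as a test.

nan_val <- 0/0
nan_is_nan <- is.nan(nan_val)        # TRUE
nan_is_na <- is.na(nan_val)          # TRUE: NaN is also treated as missing
log0_finite <- is.finite(log(0))     # FALSE, because log(0) is -Inf
```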
R as a Number Generator
• Generate a variable with numbers ranging from 1 to 12:
> x <- 1:12
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12
• Sequence – seq(from, to, by)
> seq(4, 6, 0.25)
[1] 4.00 4.25 4.50 4.75 5.00 5.25 5.50 5.75 6.00
> seq(from = 1, to = 30)
> seq(from = -5, to = 5, by = 0.2)
> seq(length.out = 51, by = 0.2, from = 5)
• Repetition – rep(x, times, …)
> rep(10, 3)
[1] 10 10 10
> rep(c(1:4), 3)
[1] 1 2 3 4 1 2 3 4 1 2 3 4
> rep(c(1.2, 2.7, 4.8), 5)
[1] 1.2 2.7 4.8 1.2 2.7 4.8 1.2 2.7 4.8 1.2 2.7 4.8 1.2 2.7 4.8
> s1=1:9
> s2 <- rep(s1, times = 3)
> s3 <- rep(s1, each= 2)
> s4 <- rep(s1, times = 3, each= 2)
• Generating levels – gl(n, k, length = n*k)
> gl(2, 4, 8)
[1] 1 1 1 1 2 2 2 2
Levels: 1 2
> gl(2, 10, length =20)
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
Levels: 1 2
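Several of the commands above are shown without their output. The sketch below stores each result and inspects its length, which is a quick way to check what seq(), rep() and gl() produced. The names s_a, s_b, s_c and f are illustrative.

```r
s_a <- seq(from = 1, to = 30)                    # same values as 1:30
s_b <- seq(from = -5, to = 5, by = 0.2)          # -5.0, -4.8, ..., 5.0
s_c <- seq(from = 5, by = 0.2, length.out = 51)  # 5.0, 5.2, ..., 15.0

s1 <- 1:9
s2 <- rep(s1, times = 3)            # the whole vector repeated 3 times
s3 <- rep(s1, each = 2)             # every element repeated twice: 1 1 2 2 ...
s4 <- rep(s1, times = 3, each = 2)  # 'each' is applied first, then 'times'

f <- gl(2, 4)                       # a factor: 4 ones followed by 4 twos

lengths_found <- c(length(s_b), length(s2), length(s3), length(s4))
print(lengths_found)
print(levels(f))
```

Note that gl() returns a factor rather than a numeric vector, which is why its printed output ends with a "Levels:" line.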
Accessing Data
• There are several ways to extract data from a vector. Suppose x is the data vector, for example x = 1:10.
• To find how many elements it has
> length(x)
• To print the ith element (i = 2)
> x[2]
• To print all but the ith element (i = 2)
> x[-2]
• To print all but specific elements (all except the 1st to 3rd elements)
> x[-c(1:3)]
• To print the first k elements (k = 5)
> x[1:5]
• To print specific elements (1st, 3rd and 5th)
> x[c(1,3,5)]
• To print all elements greater than some value (here, 3)
> x[x>3]
• To print the index of the largest value
> which(x == max(x))
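The access patterns above can be run together as follows. This is only a sketch on x = 1:10; the result names are illustrative, and which.max() is a base R shortcut that was not part of the original list.

```r
x <- 1:10

n <- length(x)                   # 10
without_second <- x[-2]          # drops the 2nd element
without_first3 <- x[-c(1:3)]     # drops elements 1 to 3, leaving 4..10
picked <- x[c(1, 3, 5)]          # 1 3 5
above3 <- x[x > 3]               # 4..10
idx_max <- which(x == max(x))    # all positions holding the maximum
idx_max_first <- which.max(x)    # position of the first maximum only
```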
Sorting & ordering
• Suppose y is a data vector which contains the values
12, 34, 6, 48, -3, -28
• To sort the values in y
> sort(y)
[1] -28 -3 6 12 34 48
• To get the ordering of the values in y (the indices that would sort y in increasing order)
> order(y)
[1] 6 5 3 1 2 4
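The relationship between the two functions is easiest to see by indexing y with the result of order(). A minimal sketch (idx, sorted and desc are illustrative names):

```r
y <- c(12, 34, 6, 48, -3, -28)

idx <- order(y)                      # 6 5 3 1 2 4: positions of the smallest, 2nd smallest, ...
sorted <- y[idx]                     # gives the same result as sort(y)
desc <- sort(y, decreasing = TRUE)   # 48 34 12 6 -3 -28
```

order() is the more general tool: the same index vector can be used to rearrange other vectors, or the rows of a data frame, in step with y.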
R as a sampler
• We have 60 people (1,2,3,…,60). If we randomly select 5 people from the group, who would be selected?
> sample(1:60, 5)
[1] 32 26 6 18 9
• The selection is random, so your output will differ from the one shown.
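Because sample() draws at random, repeated runs give different people. If a reproducible draw is needed, one common approach is to fix the random seed first. In this sketch the seed value 123 is arbitrary and the variable names are illustrative.

```r
set.seed(123)                  # arbitrary seed, chosen only for this sketch
chosen <- sample(1:60, 5)      # 5 distinct people, drawn without replacement

set.seed(123)
again <- sample(1:60, 5)       # the same seed reproduces the same draw

# With replace = TRUE the same person may be drawn more than once.
with_repl <- sample(1:60, 5, replace = TRUE)
```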
Data frames

• We have to handle not only variables and vectors but also data sets. When we work with a data set in R, it is stored as a data frame. There are two basic methods to create data frames.
1. Store all variables as separate vectors, and then combine.
2. Read from files.

1. Store all variables as separate vectors, and then combine
Syntax: > data.frame(variable_1,variable_2,.........)
Eg:
High  Medium  Low
28    24      20
26    22      21
25    24      NA
29    28      25
33    30      26
38    36      32
41    40      35
36    33      32
• Create three variables High, Medium and Low as three separate vectors.
> High=c(28, 26,........,36)
> Medium=c(24, 22,.........,33)
> Low=c(20,21,NA,.......,32)
• Combine all those vectors by using the command
> data.frame(High,Medium,Low)
• If you want to access the created data set again, you have to assign a name to it.
> data1=data.frame(High,Medium,Low)
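Written out in full with the eight rows of the High / Medium / Low table, the first method might look like the sketch below. The inspection calls dim(), names() and head() are additions for checking the result.

```r
# One vector per column, values taken from the table above.
High   <- c(28, 26, 25, 29, 33, 38, 41, 36)
Medium <- c(24, 22, 24, 28, 30, 36, 40, 33)
Low    <- c(20, 21, NA, 25, 26, 32, 35, 32)

# Combine the vectors; the column names are taken from the vector names.
data1 <- data.frame(High, Medium, Low)

print(dim(data1))      # number of rows and columns
print(names(data1))    # column names
print(head(data1, 3))  # first three rows
```

All vectors passed to data.frame() must have the same length (or a length that can be recycled to it), which is why the missing Low value is entered as NA rather than left out.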

2. Read from files

• For large data sets it is preferable to read data from an external file rather than entering it at the keyboard during the R session. First we have to modify the input file according to the requirements of R.
• To read an entire data frame directly, the external file will normally have a special form.
• The first line of the file should have a name for each variable in the data frame.
• All missing values should be replaced by "NA".
• By default numeric items (except row labels) are read as numeric variables. Non-numeric variables are read as factors in R versions before 4.0.0 and as character vectors from 4.0.0 onward, unless stringsAsFactors=TRUE is given.
• The most convenient way of reading data into R is via the function
> read.table()
• It requires a plain-text data file, created with Windows' Notepad or any other plain-text editor.
Sub1 Sub2
91 78
84 70
80 85
75 88
93 69
How to create the text file?
• To enter the data into a file, you could start up Windows' Notepad and simply type the data as shown.
• Columns are separated by an arbitrary number of blanks (Eg: a single blank or a tab).
• NA represents a missing value.
• Save as a text file.
How to import into R?

Syntax
> data2=read.table("H:/marks.txt", header=T)
Or
> data2=read.table(file.choose(), header=T)
• header=T indicates that the first line of the file contains the column headings.
• Note that you use forward slashes (/), not backslashes (\), in the file name.
• Inside a string the backslash itself is written \\, so we could also have used
> data.frame.name=read.table("Drive\\Directory\\FileName.extension", header=T)
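To try read.table() without depending on a particular drive, the sketch below first writes the Sub1/Sub2 marks to a temporary file and then reads them back. The temporary file is only a stand-in for a file such as marks.txt; tempfile(), writeLines() and unlink() are base R functions.

```r
# Create a small whitespace-separated text file with a header line.
marks_file <- tempfile(fileext = ".txt")   # stand-in for a real data file
writeLines(c("Sub1 Sub2",
             "91 78",
             "84 70",
             "80 85",
             "75 88",
             "93 69"), marks_file)

# header = TRUE tells R that the first line holds the column names.
data2 <- read.table(marks_file, header = TRUE)
str(data2)

unlink(marks_file)   # delete the temporary file again
```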
Variations of read.table
1. read.csv: fields are separated by commas
2. Using History Window
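A comma-separated file is read in much the same way. The sketch below assumes a tiny made-up file with one missing value; read.csv() uses header = TRUE and sep = "," by default.

```r
# Write a small comma-separated file to a temporary location.
csv_file <- tempfile(fileext = ".csv")   # hypothetical file for this sketch
writeLines(c("Sub1,Sub2",
             "91,78",
             "84,NA"), csv_file)

marks <- read.csv(csv_file)   # the header line and the commas are handled by default
print(marks)

unlink(csv_file)
```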
Naming Columns
• Columns can be named after importing the data set into R.
Syntax:
> names(dataset_name) = c("var_name1", "var_name2", ............)
Eg:
> names(data)=c("Index","Weight","Height","Sex","Sub1","Sub2","Sub3","Class")
To separate the data items into separate vectors
• Syntax: > variable_name = data_frame_name[[column_no]]
Eg: > Sub1=data2[[1]]
(Single brackets, as in data2[1], return a one-column data frame rather than a vector.)
• Syntax:
> variable_name = data_frame_name$variable_name_in_text_file
Eg: > Sub1=data2$Sub1
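The difference between the extraction forms shows up when the results are inspected. In this sketch data2 is rebuilt by hand from the Sub1/Sub2 marks so that the example is self-contained.

```r
data2 <- data.frame(Sub1 = c(91, 84, 80, 75, 93),
                    Sub2 = c(78, 70, 85, 88, 69))

one_col_df  <- data2[1]     # single brackets: still a data frame with one column
by_position <- data2[[1]]   # double brackets: a plain numeric vector
by_name     <- data2$Sub1   # the same vector, selected by column name

print(class(one_col_df))    # "data.frame"
print(class(by_position))   # "numeric"
```

Functions such as mean() expect a vector, so the double-bracket or $ form is usually the one that is wanted.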
Descriptive Statistics
• Predefined functions can be used to compute the necessary statistics one by one.
• Syntax:
> mean(variable_name)
> sd(variable_name)
> var(variable_name)
> min(variable_name)
> max(variable_name)
> median(variable_name)
• Eg:
>mean(Height)
>var(Height)
• Several of these statistics (minimum, quartiles, median, mean and maximum) can be obtained at once by using the function 'summary'.
• Syntax: > summary(variable_name)
• Eg: > summary(Height)
• If there is any missing value in the variable, R returns a missing value (NA) as the result.
• To avoid that problem, you can give the argument 'na.rm' (not available, remove) to request that missing values be removed.
• Syntax: > mean(variable_name, na.rm=T)
• Eg:
> mean(Sub1)
> mean(Sub1,na.rm=T)
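The effect of na.rm can be seen on the Low column of the earlier High / Medium / Low table, which contains one missing value. A minimal sketch:

```r
Low <- c(20, 21, NA, 25, 26, 32, 35, 32)

m_with_na <- mean(Low)               # NA, because one value is missing
m <- mean(Low, na.rm = TRUE)         # mean of the 7 observed values
s <- sd(Low, na.rm = TRUE)           # sd(), var(), min(), max() and median() accept na.rm too

print(m)
print(summary(Low))                  # summary() drops NAs itself and reports how many there were
```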
How to use a by variable?
1st method
• Consider each level of the given category.
• Syntax: > tapply(association_var, classification_var, statistic)
association_var - any continuous variable
classification_var - any categorical variable (by variable)
statistic - any statistic that you want to compute
• Eg:
> tapply(Height,Sex,mean)
2nd method
• Consider only the given level of the given category.
• Syntax:
> summary(association_var[classification_var == level])
• Instead of 'summary', any predefined function for descriptive statistics can also be used here.
Eg:
> summary(Height[Sex == 'M'])
> mean(Height[Sex == 'M'])
> var(Height[Sex == 'M'])
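Both methods can be compared on a small example. The Height and Sex values below are made up for this sketch; they are not the data set used in the slides.

```r
# Made-up data: six people with a height (cm) and a sex code.
Height <- c(170, 158, 182, 165, 176, 160)
Sex    <- c("M", "F", "M", "F", "M", "F")

# 1st method: one statistic for every level of the by variable.
by_sex <- tapply(Height, Sex, mean)
print(by_sex)                     # a named vector with entries F and M

# 2nd method: the statistic for a single level only.
male_mean <- mean(Height[Sex == "M"])
print(male_mean)
```

The value for "M" returned by tapply() is the same number that the second method computes for that level.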
Tally and Contingency Tables
• table() uses the cross-classifying factors to build a contingency table of the counts at each combination of factor levels.
Tally table
• Syntax: > table(var_name)
• var_name should be a categorical variable.
Eg: > table(Sex)
Contingency Tables
• Syntax:
> table(var_name1, var_name2)
Eg:
> table(Sex,Class)
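A short sketch of both kinds of table, again on made-up Sex and Class values chosen only for illustration:

```r
# Made-up categorical data for six people.
Sex   <- c("M", "F", "M", "F", "M", "F")
Class <- c("A", "A", "B", "B", "B", "A")

tally <- table(Sex)          # counts per level of one variable
cross <- table(Sex, Class)   # counts per combination of two variables

print(tally)
print(cross)

# Individual cells can be read by level name.
f_in_a <- cross["F", "A"]
```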
