String R

Uploaded by

RAHUL SHARMA

Faculty of: FCE    Program: B.Tech    Class/Section: Sem V, Sec. A, B, C (AIDS)    Date:

Name of Faculty: Seema Kaloria Name of Course: R Programming Code: BADCCE5104

R Strings

• A string is a sequence of characters; R stores it as an element of a character vector.

• Zero or more characters enclosed in a pair of matching single or double quotes form a string in R.

Creation of String in R

Strings are created by assigning character values to a variable. They can then be concatenated with functions such as paste() to form a longer string.
Example

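# creating a string with double quotes
str1 <- "OK1"
cat("String 1 is : ", str1, "\n")

# creating a string with single quotes
str2 <- 'OK2'
cat("String 2 is : ", str2, "\n")

# single quotes may appear inside a double-quoted string
str3 <- "This is 'acceptable and 'allowed' in R"
cat("String 3 is : ", str3, "\n")

# double quotes may appear inside a single-quoted string
str4 <- 'Hi, Wondering "if this "works"'
cat("String 4 is : ", str4, "\n")

# NOT allowed: an unescaped single quote inside a single-quoted string
# ends the string early and causes a syntax error, so it is commented out
# str5 <- 'hi, ' this is not allowed'
# cat("String 5 is : ", str5, "\n")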
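A quote of the same kind as the delimiter can still be placed inside a string if it is escaped with a backslash. The following is a minimal sketch of that idea; the variable names s1, s2 and n2 are illustrative and do not come from the notes above.

```r
# Escape a quote that matches the delimiter with a backslash.
s1 <- 'hi, \' this is allowed once escaped'
s2 <- "She said \"hello\""

# cat() writes the string as stored, without the escaping backslashes.
cat("s1 is :", s1, "\n")
cat("s2 is :", s2, "\n")

# The escaping backslash is not part of the string, so it is not counted.
n2 <- nchar(s2)
```

Here n2 counts only the characters that are actually stored, including the two double quotes.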

An Overview of String-Manipulation Functions

nchar(): The call nchar(x) finds the length of a string x.

> nchar("South Pole") # Output: 10

The string "South Pole" has 10 characters (the space counts). C programmers, take note: there is no NUL character terminating R strings. Note also that nchar() coerces a non-character x to character before counting, so the result may be surprising if x is not in character mode, and passing a factor is an error.
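As a small sketch of how nchar() behaves on inputs other than a single string (the variable names places, lens, digits and res are illustrative assumptions):

```r
# nchar() is vectorized: it returns one length per element.
places <- c("Equator", "North Pole", "South Pole")
lens <- nchar(places)

# A number is coerced to character first, so this counts its digits.
digits <- nchar(12345)

# A factor is not coerced; nchar() stops with an error instead.
res <- tryCatch(nchar(factor("abc")), error = function(e) "error")
```

Converting explicitly with as.character() before calling nchar() avoids relying on the coercion rules.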

grep(): The call grep(pattern,x) searches for a specified substring pattern in a vector x of strings.

If x has n elements (that is, it contains n strings), then grep(pattern,x) returns a vector of length up to n. Each element of this vector is an index i in x at which pattern was found as a substring of x[i].

> grep("Pole", c("Equator", "North Pole", "South Pole")) # Output: 2 3


> grep("pole", c("Equator", "North Pole", "South Pole")) # Output: integer(0)

Explanation: In the first case, the string "Pole" was found in elements 2 and 3 of the second argument, hence the output (2,3). In the second case, the string "pole" was not found anywhere because matching is case-sensitive, so an empty vector was returned.
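The same search can be varied with a few standard arguments of grep() and with its companion grepl(). A brief sketch, with illustrative variable names:

```r
places <- c("Equator", "North Pole", "South Pole")

# ignore.case = TRUE makes "pole" match "Pole".
idx_ci <- grep("pole", places, ignore.case = TRUE)

# value = TRUE returns the matching strings instead of their indices.
hits <- grep("Pole", places, value = TRUE)

# grepl() returns one TRUE/FALSE per element of the input.
flags <- grepl("Pole", places)
```

grepl() is often the more convenient form when the result is used to subset a vector or filter rows.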

paste(): The call paste(...) concatenates several strings, returning the result in one long string.

> paste("North", "Pole") # Output: "North Pole"


> paste("North", "Pole", sep="") # Output: "NorthPole"
> paste("North", "Pole", sep=".") # Output: "North.Pole"
> paste("North", "and", "South", "Poles") # Output: "North and South Poles"

Note: the optional argument sep can be used to put something other than a space between the
pieces being spliced together. If you specify sep as an empty string, the pieces won’t have any
character between them.
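Two related features of base R are worth a short sketch here: paste0(), which is shorthand for sep="", and the collapse argument, which joins the elements of a vector into one string. The variable names are illustrative.

```r
# paste0() is equivalent to paste(..., sep = "").
joined <- paste0("North", "Pole")

# paste() is vectorized: shorter arguments are recycled.
labels <- paste("x", 1:3, sep = "_")

# collapse joins the elements of a single vector into one string.
one <- paste(c("North", "South"), collapse = " and ")
```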

sprintf(): The call sprintf(...) assembles a string from parts in a formatted manner.

> i <- 8
> s <- sprintf("the square of %d is %d", i, i^2)
> s # Output: "the square of 8 is 64"
Explanation: The name of the function is intended to evoke string print, for “printing” to a string rather than to the screen. Here, we are printing to the string s.
What are we printing? The format string says to first print “the square of ”, then the decimal value of i (the first %d), then “ is ”, and finally the decimal value of i^2 (the second %d).
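Besides %d, sprintf() supports other common format codes, and it is vectorized over its arguments. The sketch below uses illustrative variable names:

```r
# %s inserts a string, %d an integer value.
a <- sprintf("%s has %d characters", "Equator", nchar("Equator"))

# %.2f formats a number with two digits after the decimal point.
b <- sprintf("pi is about %.2f", pi)

# %05d pads an integer with leading zeros to width 5.
d <- sprintf("%05d", 42L)

# Vector arguments produce one formatted string per element.
sq <- sprintf("the square of %d is %d", 1:3, (1:3)^2)
```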

substr(): The call substr(x,start,stop) returns the substring in the given character position range
start:stop in the given string x.

> substr("Equator",3,5) # Output: "uat"
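The closely related substring() function and the replacement form of substr() can be sketched as follows; the variable names are illustrative.

```r
x <- "Equator"

# substr() needs both start and stop.
mid <- substr(x, 3, 5)

# substring() may omit the end and then runs to the end of the string.
tail_part <- substring(x, 5)

# substring() is vectorized over its start and end positions.
windows <- substring(x, 1:3, 3:5)

# substr() can also be used on the left of an assignment to overwrite characters.
y <- x
substr(y, 1, 1) <- "e"
```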

strsplit(): The call strsplit(x,split) splits a string x into an R list of substrings, breaking x wherever the string split occurs.
> strsplit("6-16-2011",split="-") # Output: a list whose one element is the vector "6" "16" "2011"
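Because strsplit() returns a list, the pieces usually have to be extracted with [[1]] before further use. A minimal sketch, with illustrative variable names:

```r
parts <- strsplit("6-16-2011", split = "-")
kind  <- class(parts)            # the result is a list, not a vector

# Take the first (and only) list element to get the character vector.
first <- parts[[1]]
nums  <- as.integer(first)

# With several input strings, the list has one element per string.
dates <- strsplit(c("6-16-2011", "7-4-2012"), "-")

# split is treated as a regular expression; fixed = TRUE splits on a literal ".".
dots <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
```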

regexpr(): The call regexpr(pattern, text) finds the character position of the first instance of
pattern within text, as in this example:
> regexpr("uat", "Equator") # Output: 3

gregexpr(): The call gregexpr(pattern, text) is the same as regexpr(), but it finds all instances of
pattern.
> gregexpr("iss","Mississippi") # Output: 2 5

Explanation: This finds that “iss” appears twice in “Mississippi,” starting at character positions
2 and 5.
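The printed result of gregexpr() is a list that carries extra attributes, so a little unpacking is needed to get at the plain positions. The sketch below shows one way to do that in base R; the variable names are illustrative.

```r
m <- gregexpr("iss", "Mississippi")

# The match positions are in the first list element.
pos <- as.integer(m[[1]])

# The length of each match is stored in the "match.length" attribute.
len <- attr(m[[1]], "match.length")

# regmatches() extracts the matched text itself.
found <- regmatches("Mississippi", m)[[1]]

# When nothing matches, regexpr() reports position -1.
none <- as.integer(regexpr("xyz", "Equator"))
```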

String Replacement:
sub(): Replaces the first occurrence of a pattern in a string.
gsub(): Replaces all occurrences of a pattern in a string.

x <- "apple banana banana"
sub("banana", "orange", x) # Returns "apple orange banana"
gsub("banana", "orange", x) # Returns "apple orange orange"
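The pattern given to sub() and gsub() is a regular expression unless fixed = TRUE is supplied. A short sketch of the difference, with illustrative variable names:

```r
x <- "apple banana banana"

# A character class as the pattern: remove every vowel.
no_vowels <- gsub("[aeiou]", "", x)

# Without fixed = TRUE, "." matches any character, so everything is replaced.
all_dashes <- gsub(".", "-", "a.b.c")

# With fixed = TRUE, "." is treated as a literal period.
only_dots <- gsub(".", "-", "a.b.c", fixed = TRUE)
```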

String Trimming:
trimws(): Removes leading and trailing whitespace.

x <- " Hello "


trimws(x) # Returns "Hello"
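trimws() also takes a which argument to trim only one side. A minimal sketch, using a string with two spaces on each side as an assumed input:

```r
x <- "  Hello  "

left_trimmed  <- trimws(x, which = "left")    # leading spaces removed
right_trimmed <- trimws(x, which = "right")   # trailing spaces removed
both_trimmed  <- trimws(x)                    # default: which = "both"
```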

These are just a few examples of the string functions in R. Base R provides a rich set of functions for working with strings, and additional packages such as stringr (part of the tidyverse) offer further string-manipulation functions.

String Slicing
str <- "Learn Code"
len <- nchar(str)               # number of characters in str: 10
print(substr(str, 1, 4))        # first four characters: "Lear"
print(substr(str, len-2, len))  # last three characters: "ode"
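The "last few characters" pattern above can be wrapped in a small helper. The function last_chars() below is hypothetical (it is not part of base R) and is only a sketch of the idea for a single string:

```r
# Hypothetical helper: return the last n characters of the string s.
last_chars <- function(s, n) {
  len <- nchar(s)
  substr(s, len - n + 1, len)
}

word <- last_chars("Learn Code", 4)
tail3 <- last_chars("Learn Code", 3)
```

The start position is len - n + 1 because substr() includes both end points.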

Case Conversion
• toupper(), which converts all the characters to upper case,
• tolower(), which converts all the characters to lower case, and
• casefold(…, upper=TRUE/FALSE), which converts to upper or lower case according to the value of the upper argument.
All these functions also accept a character vector, so several strings can be converted in one call.
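The three case-conversion functions can be tried on a small character vector, as in this sketch (variable names are illustrative):

```r
x <- c("Learn Code", "r strings")

up   <- toupper(x)
low  <- tolower(x)

# casefold() converts to lower case by default and to upper case with upper = TRUE.
cf_low <- casefold(x)
cf_up  <- casefold(x, upper = TRUE)
```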

Regular Expressions
When dealing with string-manipulation functions in programming languages, the
notion of regular expressions sometimes arises. In R, we must pay attention to this
point when using the string functions grep(), grepl(), regexpr(), gregexpr(), sub(),
gsub(), and strsplit().
A regular expression is a kind of wild card. It’s shorthand to specify broad classes of
strings. For example, the expression "[au]" refers to any string that contains either of
the letters a or u.
> grep("[au]", c("Equator", "North Pole", "South Pole")) # Output: 1 3
This reports that elements 1 and 3 of ("Equator", "North Pole", "South Pole")—that
is, “Equator” and “South Pole”—contain either an a or a u.
A period (.) represents any single character. Here’s an example of using it:
> grep("o.e", c("Equator", "North Pole", "South Pole")) # Output: 2 3

Explanation: This searches for three-character strings in which an o is followed by any single character, which is in turn followed by an e. Here is an example of the use of two periods to represent any pair of characters:

> grep("N..t", c("Equator", "North Pole", "South Pole")) # Output: 2

Explanation: Here, we searched for four-character strings consisting of an N, followed by any pair of characters, followed by a t.
A period is an example of a metacharacter, which is a character that is not to be taken
literally. For example, if a period appears in the first argument of grep(), it doesn’t
actually mean a period; it means any character.
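To see exactly which characters each of the patterns above matched, the first match in each string can be pulled out with regexpr() and regmatches(). This is only a sketch; the variable names are illustrative, and strings without a match are simply dropped from the result.

```r
places <- c("Equator", "North Pole", "South Pole")

# "[au]": the first a or u in each string that has one.
m_class <- regmatches(places, regexpr("[au]", places))

# "o.e": an o, any single character, then an e.
m_dot <- regmatches(places, regexpr("o.e", places))

# "N..t": an N, any two characters, then a t.
m_two <- regmatches(places, regexpr("N..t", places))
```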

But what if you want to search for a period using grep()? Here’s the naive approach:

> grep(".", c("abc", "de", "f.g")) # Output: 1 2 3

The result should have been 3, not (1, 2, 3). This call failed because periods are metacharacters: "." matches any character, so every non-empty string matches. We need to escape the metacharacter nature of the period, which is done via a backslash:

> grep("\\.", c("abc", "de", "f.g")) # Output: 3

Now, didn’t I say a backslash? Then why are there two? Well, the sad truth is that the backslash itself must be escaped, which is accomplished by its own backslash! R reads the string "\\." as the two characters \. before grep() ever sees the pattern.
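The effect of the double backslash, and two alternatives that avoid it, can be sketched as follows (variable names are illustrative):

```r
x <- c("abc", "de", "f.g")

# The string "\\." holds only two characters: a backslash and a period.
n <- nchar("\\.")

# Three ways to search for a literal period.
by_escape <- grep("\\.", x)               # escaped metacharacter
by_class  <- grep("[.]", x)               # period inside a character class
by_fixed  <- grep(".", x, fixed = TRUE)   # no regular expression at all
```

Using fixed = TRUE is often the simplest choice when the pattern is plain text rather than a regular expression.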
