
Text Data and Analysis (Slides)

The document provides an overview of various spreadsheet functions for manipulating and analyzing text data, including functions like SUBSTITUTE, TRIM, CLEAN, SEARCH, FIND, SPLIT, CONCATENATE, UPPER, LOWER, and PROPER. It highlights the advantages of using these functions, such as time-saving, consistency, flexibility, and increased productivity. Additionally, the document includes examples of how to apply these functions using a dataset of Tweets on climate change.

Uploaded by Ssaed
Copyright © All Rights Reserved

Spreadsheet functions

Text data and analysis


Please do not copy without permission. © ALX 2024.

Overview
01. Introduction
02. Data overview
03. The SUBSTITUTE function
04. The TRIM and CLEAN functions
05. The SEARCH and FIND functions
06. The SPLIT function
07. The CONCATENATE function
08. The UPPER, LOWER, and PROPER functions



Introduction

Text data and analysis functions in spreadsheets are a set of built-in functions that allow users to manipulate and analyze text data within cells. Advantages of using these functions include:

01. Time-saving
They allow quick and easy manipulation of large datasets, which saves time compared to manually editing each cell.

02. Consistency
They ensure consistency in formatting and cleaning up text data, which reduces errors and increases accuracy and reliability.

03. Scalability
They can be used on large datasets, making it easier to analyze and visualize the data.

04. Flexibility
They allow users to extract specific information or manipulate text data in various ways, depending on their needs. This flexibility allows users to create customized solutions for different types of data.

05. Increased productivity
The ability to manipulate and analyze text data within a spreadsheet increases productivity, as users can quickly extract relevant information without having to switch between multiple programs.


Data overview

To investigate how spreadsheet functions can be used to analyze text data, we will use a dataset of Tweets on climate change that has 100 rows and the following columns:

1. ID – A numeric string that is associated with and uniquely identifies a single Tweet within the dataset. It makes it possible to access and interact with a specific Tweet.
2. Text – An aggregated Tweet pertaining to climate change.

In the examples that follow, ID is in column A, Text is in column B, and each formula is entered in column C.


The SUBSTITUTE function

The SUBSTITUTE function is used to replace a specific character or string of characters in a cell with a different character or string. SUBSTITUTE and REGEXREPLACE are similar, but SUBSTITUTE looks for an exact string, whereas REGEXREPLACE looks for text that matches a regular expression.

=SUBSTITUTE(text_to_search, search_for, replace_with, [occurrence_number])

● text_to_search – The text within which to search and replace.
● search_for – The string to search for within text_to_search. search_for will match parts of words as well as whole words; therefore, a search for "vent" will also replace text within "eventual".
● replace_with – The string that will replace search_for.
● occurrence_number – [OPTIONAL] The instance of search_for within text_to_search to replace with replace_with. By default, every instance is replaced; if occurrence_number is specified, only the indicated instance of search_for is replaced.
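To make the four arguments concrete, here is a minimal Python sketch that imitates SUBSTITUTE. The function name `substitute` is hypothetical, and its handling of edge cases (an empty search_for, or an occurrence_number larger than the number of matches) is an assumption of this sketch rather than documented spreadsheet behavior.

```python
from typing import Optional


def substitute(text_to_search: str, search_for: str, replace_with: str,
               occurrence_number: Optional[int] = None) -> str:
    """Rough Python analogue of the spreadsheet SUBSTITUTE function."""
    if search_for == "":
        # Nothing to look for, so the text comes back unchanged (assumption).
        return text_to_search
    if occurrence_number is None:
        # No occurrence given: replace every match, including matches inside words.
        return text_to_search.replace(search_for, replace_with)
    if occurrence_number < 1:
        raise ValueError("occurrence_number must be 1 or greater")
    # Step through the matches to locate the requested one (1-based).
    index, start = -1, 0
    for _ in range(occurrence_number):
        index = text_to_search.find(search_for, start)
        if index == -1:
            # Fewer matches than requested: leave the text unchanged (assumption).
            return text_to_search
        start = index + len(search_for)
    return text_to_search[:index] + replace_with + text_to_search[start:]


print(substitute("an eventual event", "vent", "X"))        # an eXual eX
print(substitute("a-b-c", "-", "+", occurrence_number=2))  # a-b+c
```

The first call shows the partial-word matching described above: "vent" is replaced inside "eventual" as well as in "event".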


The SUBSTITUTE function


Example use:
Remove URLs and mentions from the Tweets.

● We will use OR logic since a Tweet can contain both, either of the two, or neither of the two.
● The REGEXEXTRACT function will be used to identify and extract the URLs and mentions, and then the SUBSTITUTE function will be used to replace them with a blank string.
● Recall that the pipe symbol (|) represents the OR operator in regular expressions.

On Twitter, a mention is a way to tag or reference another user in a Tweet by including their username in the Tweet. Mentions are commonly used to start a conversation with someone, to acknowledge someone in a Tweet, or to give credit to someone for their work. Mentions are prepended with the "@" symbol.


The SUBSTITUTE function


Example use:
● The regular expression to match URLs is https?:\/\/[^\s/$.?#].[^\s]* while that of mentions is \B@\w{1,15}.
● To remove the URL or mention extracted by the REGEXEXTRACT function, we will set the replace_with argument of SUBSTITUTE to an empty string ("").
● REGEXEXTRACT returns only the first match, so this formula removes only the first URL or mention it finds in each Tweet, and it returns an error for Tweets that contain neither.

01. Enter =SUBSTITUTE(B2, REGEXEXTRACT(B2, "\B@\w{1,15}|https?:\/\/[^\s/$.?#].[^\s]*"), "") in cell C2.
02. Replicate the formula to the other rows by dragging the fill handle down.

ID | Text | URLs removed
1028954403129184256 | Gotta love the facts. https://t.co/bZ2G8AZuo9 | Gotta love the facts.
1028954810781814784 | You send me crap It's 5 minutes to midnight for a mute https://t.co/FFhYHCitKb | You send me crap It's 5 minutes to midnight for a mute
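The formula can be approximated with Python's standard `re` module, which accepts this particular pattern (Google Sheets uses the RE2 engine, so other patterns may behave differently between the two). The helper names and the sample string below are hypothetical. Because REGEXEXTRACT returns only the first match, `substitute_first_match` removes a single mention or URL; `remove_all_matches` is shown for contrast and removes every match in one pass, roughly what REGEXREPLACE would do with the same pattern.

```python
import re

# Mention OR URL, as in the slide's formula (| is the regex OR operator).
PATTERN = r"\B@\w{1,15}|https?:\/\/[^\s/$.?#].[^\s]*"


def substitute_first_match(text: str) -> str:
    """Mimic =SUBSTITUTE(B2, REGEXEXTRACT(B2, PATTERN), "")."""
    match = re.search(PATTERN, text)  # like REGEXEXTRACT: first match only
    if match is None:
        # The spreadsheet formula returns an error here; this sketch keeps the text.
        return text
    # SUBSTITUTE without occurrence_number removes every copy of that one string.
    return text.replace(match.group(0), "")


def remove_all_matches(text: str) -> str:
    """Remove every mention and URL in a single pass."""
    return re.sub(PATTERN, "", text)


sample = "RT @example_user: Gotta love the facts. https://t.co/bZ2G8AZuo9"  # hypothetical
print(repr(substitute_first_match(sample)))  # 'RT : Gotta love the facts. https://t.co/bZ2G8AZuo9'
print(repr(remove_all_matches(sample)))      # 'RT : Gotta love the facts. '
```

Note the leftover spaces: removing a match deletes only the matched characters, so a follow-up TRIM is often useful.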


The TRIM and CLEAN functions

The TRIM function removes leading and trailing spaces from text and reduces repeated spaces between words to a single space, while the CLEAN function returns the text with the non-printable ASCII characters removed.

=TRIM(text)
=CLEAN(text)

● text – The string, or a reference to a cell containing a string, to be trimmed or whose non-printable characters are to be removed.

For example:
● =TRIM("  Hello, World!  ") -> "Hello, World!"
  ○ Removing leading and trailing spaces.
● =CLEAN("Hello" & CHAR(9) & "World!") -> "HelloWorld!"
  ○ Removing the tab character (CHAR(9)) between "Hello" and "World!".

Spreadsheets do not show non-printable characters in the user interface, so using the CLEAN function will typically not result in any visible changes.
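As a rough model of what the two functions do, here is a Python sketch. The names `trim` and `clean` are hypothetical; treating only the ordinary space character as TRIM's target, and codes 0 to 31 as CLEAN's non-printable characters, are assumptions of this sketch.

```python
def trim(text: str) -> str:
    """Approximate TRIM: drop leading/trailing spaces, collapse inner runs to one."""
    # Only the ordinary space character is handled; tabs and line breaks are kept.
    return " ".join(piece for piece in text.split(" ") if piece)


def clean(text: str) -> str:
    """Approximate CLEAN: drop non-printable ASCII control characters (codes 0-31)."""
    return "".join(ch for ch in text if ord(ch) >= 32)


print(repr(trim("   Hello,   World!  ")))  # 'Hello, World!'
print(repr(clean("Hello\tWorld!")))        # 'HelloWorld!'
```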


The TRIM and CLEAN functions


Example use:
● Remove leading, trailing, and repeated spaces as well as non-printable characters from all Tweets.

01. Enter =CLEAN(TRIM(B2)) in cell C2.
02. Replicate the formula to the other rows by dragging the fill handle down.

ID | Text | Cleaned and trimmed text
1028954810781814784 | You send me crap [line break] It's 5 minutes to midnight for a mute https://t.co/FFhYHCitKb | You send me crapIt's 5 minutes to midnight for a mute https://t.co/FFhYHCitKb
1028954474805710849 | RT @Peters_Glen: It is always a good reminder to see how the global average temperature has changed over the last 150 years... [line break] https://t.c… | RT @Peters_Glen: It is always a good reminder to see how the global average temperature has changed over the last 150 years...https://t.c…
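In the table, words run together where the Tweet had a line break ("crapIt's", "years...https"), which is consistent with CLEAN deleting the break rather than replacing it with a space. The Python sketch below assumes the break is a single newline character and uses simplified, hypothetical regex stand-ins for TRIM and CLEAN; it also shows that the nesting order can matter when spaces surround the removed character.

```python
import re


def trim(text: str) -> str:
    # Collapse runs of spaces, then strip spaces at both ends (TRIM approximation).
    return re.sub(" {2,}", " ", text).strip(" ")


def clean(text: str) -> str:
    # Delete ASCII control characters 0-31, such as "\n" (CLEAN approximation).
    return re.sub(r"[\x00-\x1f]", "", text)


# Assumption: the line break in the Tweet is a single "\n" character.
tweet = "You send me crap\nIt's 5 minutes to midnight for a mute"
print(clean(trim(tweet)))  # You send me crapIt's 5 minutes to midnight for a mute

# If spaces surround the line break, the nesting order matters.
padded = "crap \n It's"
print(repr(clean(trim(padded))))  # "crap  It's" (two spaces remain)
print(repr(trim(clean(padded))))  # "crap It's"
```

With CLEAN applied first, TRIM gets a chance to collapse the spaces that the removed character leaves behind.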


The SEARCH and FIND functions

The SEARCH and FIND functions both return the position at which a string is first found within text. SEARCH, however, ignores case, while FIND is case-sensitive.

=SEARCH(search_for, text_to_search, [starting_at])
=FIND(search_for, text_to_search, [starting_at])

● search_for – The string to look for within text_to_search.
● text_to_search – The text to search for the first occurrence of search_for.
● starting_at – [OPTIONAL] The character within text_to_search at which to start the search. 1 by default.

For example:
● =SEARCH("World", "Hello, World!") -> 8
  ○ 8 is the position of the letter W in the word World, counting from 1.
● =FIND("World", "Hello, world!") -> #VALUE!
  ○ FIND is case-sensitive and will therefore not find a match.
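A minimal Python sketch of the two lookups, assuming positions are counted from 1 as in the example above and using `None` in place of #VALUE!. The names `search`, `find`, and `_position` are hypothetical, and lowercasing both strings is a simplification of SEARCH's case-insensitivity that suits plain ASCII text.

```python
from typing import Optional


def _position(needle: str, haystack: str, starting_at: int) -> Optional[int]:
    """Return the 1-based position of needle in haystack, or None if absent."""
    if starting_at < 1:
        raise ValueError("starting_at must be 1 or greater")
    index = haystack.find(needle, starting_at - 1)  # str.find is 0-based
    return index + 1 if index != -1 else None


def search(search_for: str, text_to_search: str, starting_at: int = 1) -> Optional[int]:
    """Approximate SEARCH: compare lowercased copies, so case is ignored."""
    return _position(search_for.lower(), text_to_search.lower(), starting_at)


def find(search_for: str, text_to_search: str, starting_at: int = 1) -> Optional[int]:
    """Approximate FIND: compare the strings as they are, so case matters."""
    return _position(search_for, text_to_search, starting_at)


print(search("world", "Hello, World!"))  # 8
print(find("World", "Hello, world!"))    # None
print(find("l", "Hello, World!", 5))     # 11
```

The last call shows starting_at: the search begins at the 5th character, so the "l"s in "Hello" are skipped and the one in "World" is found.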


The SEARCH and FIND functions


Example use:
Identify all Tweets that mention the hashtag #climatechange.

● We will use SEARCH so that we can identify all relevant hashtags irrespective of letter case (for example, #ClimateChange).
● Since SEARCH returns an error if a Tweet does not contain the hashtag, we will wrap it in the IFERROR function to replace the error value with 0. The result is the position of the hashtag in the Tweet, or 0 if the hashtag is absent.

01. Enter =IFERROR(SEARCH("#climatechange", B2), 0) in cell C2.
02. Replicate the formula to the other rows by dragging the fill handle down.

ID | Text | #climatechange
1028954652832882688 | RT @6esm: Halfway to boiling: the city at 50C https://t.co/jccTA8tDCS - #climatechange | 73
1028954995469811713 | #climatechange #spaceweather https://t.co/hB3tQgPeys | 1
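The IFERROR-plus-SEARCH pattern maps neatly onto Python, where `str.find` returns -1 for a missing string, so adding 1 gives either a 1-based position or 0. This is a sketch with a hypothetical helper name; the first two Tweets come from the table above (with URLs in their original https://t.co/ form) and reproduce its values, while the third is another Tweet from this dataset.

```python
def hashtag_position(text: str, hashtag: str = "#climatechange") -> int:
    """Mimic =IFERROR(SEARCH("#climatechange", B2), 0)."""
    # str.find returns -1 when absent, so adding 1 yields 0, like IFERROR's fallback.
    return text.lower().find(hashtag.lower()) + 1


tweets = [
    "RT @6esm: Halfway to boiling: the city at 50C https://t.co/jccTA8tDCS - #climatechange",
    "#climatechange #spaceweather https://t.co/hB3tQgPeys",
    "Gotta love the facts. https://t.co/bZ2G8AZuo9",
]
print([hashtag_position(t) for t in tweets])  # [73, 1, 0]
```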


The SPLIT function

The SPLIT function divides text around a specified character or string and puts each fragment into a separate cell in the row.

=SPLIT(text, delimiter, [split_by_each], [remove_empty_text])

● text – The text to divide.
● delimiter – The character or characters to use to split text. By default, each character in the delimiter is considered individually, e.g. if the delimiter is "the", then text is divided around the characters "t", "h", and "e". Set split_by_each to FALSE to turn off this behavior.
● split_by_each – [OPTIONAL] Whether or not to divide text around each character contained in the delimiter. TRUE by default.
● remove_empty_text – [OPTIONAL] Whether or not to remove empty text from the split results. TRUE by default, which treats consecutive delimiters as one. If FALSE, empty cells are added between consecutive delimiters.
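The interaction of split_by_each and remove_empty_text can be sketched in Python as follows. The function `split` is hypothetical and returns a list of strings where the spreadsheet fills cells; it does not model any other spreadsheet behavior, such as automatic conversion of numeric fragments.

```python
from typing import List


def split(text: str, delimiter: str, split_by_each: bool = True,
          remove_empty_text: bool = True) -> List[str]:
    """Approximate SPLIT, returning the fragments as a list instead of cells."""
    if delimiter == "":
        raise ValueError("delimiter must not be empty")
    if split_by_each:
        # Every character of the delimiter acts as a separator on its own:
        # map them all to the first character, then split on that one.
        for ch in delimiter[1:]:
            text = text.replace(ch, delimiter[0])
        parts = text.split(delimiter[0])
    else:
        # The whole delimiter string must appear to cause a split.
        parts = text.split(delimiter)
    if remove_empty_text:
        # Drop empty fragments, so consecutive delimiters behave like one.
        parts = [part for part in parts if part != ""]
    return parts


print(split("1-2+3", "-+"))                           # ['1', '2', '3']
print(split("1-2+3", "-+", split_by_each=False))      # ['1-2+3']
print(split("a,,b", ","))                             # ['a', 'b']
print(split("a,,b", ",", remove_empty_text=False))    # ['a', '', 'b']
```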

The SPLIT function


Example use:
Split the text of each Tweet into individual words.

● We will use the space character as the delimiter since words are separated by spaces.
● It is advisable to place the SPLIT formula after the last column of data, since its results fill the cells to its right.

01. Enter =SPLIT(B2, " ") in cell C2.
02. Replicate the formula to the other rows by dragging the fill handle down.

ID | Text | Split text (one word per cell, from column C onward)
1028954403129184256 | Gotta love the facts. https://t.co/bZ2G8AZuo9 | Gotta | love | the | facts. | https://t.co/bZ2G8AZuo9
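The same split can be reproduced in Python to preview how many cells the formula will fill. This sketch uses the Tweet from the table; punctuation stays attached ("facts.") and the URL is treated as a word, just as in the spreadsheet output.

```python
tweet = "Gotta love the facts. https://t.co/bZ2G8AZuo9"

# Like =SPLIT(B2, " "): split on spaces and drop empty fragments (the default).
words = [word for word in tweet.split(" ") if word]

print(words)       # ['Gotta', 'love', 'the', 'facts.', 'https://t.co/bZ2G8AZuo9']
print(len(words))  # 5, i.e. the formula fills five cells, C2 through G2
```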


The CONCATENATE function

The CONCATENATE function appends strings to one another.

=CONCATENATE(string1, [string2, ...])

● string1 – The initial string.
● string2 – [OPTIONAL] Additional strings to append in sequence.

For example:
● =CONCATENATE("Hello", " ", "World!") -> "Hello World!"
  ○ CONCATENATE adds no separators of its own, so any space must be passed as an argument, as with " " here.
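In Python, CONCATENATE corresponds to joining strings with an empty separator. The function name below is hypothetical, and this sketch handles plain strings only.

```python
def concatenate(string1: str, *more_strings: str) -> str:
    """Approximate CONCATENATE: join the arguments in order, with no separator."""
    return "".join((string1, *more_strings))


print(concatenate("Hello", " ", "World!"))  # Hello World!
print(concatenate("Hello", "World!"))       # HelloWorld!
```

The second call shows what happens when the space argument is left out.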


The CONCATENATE function


Example use:
Combine the text of each Tweet with its corresponding ID.

● We will use a colon followed by a space as the separator between the ID and the text.

01. Enter =CONCATENATE(A2, ": ", B2) in cell C2.
02. Replicate the formula to the other rows by dragging the fill handle down.

ID | Text | Text combined with ID
1028954403129184256 | Gotta love the facts. https://t.co/bZ2G8AZuo9 | 1028954403129184256: Gotta love the facts. https://t.co/bZ2G8AZuo9
1028954995469811713 | #climatechange #spaceweather https://t.co/hB3tQgPeys | 1028954995469811713: #climatechange #spaceweather https://t.co/hB3tQgPeys


The UPPER, LOWER, and PROPER functions

The UPPER function converts a specified string to uppercase, LOWER converts a specified string to lowercase, and PROPER capitalizes each word in a specified string.

=UPPER(text)
=LOWER(text)
=PROPER(text_to_capitalize)

● text – The string to convert to uppercase or lowercase.
● text_to_capitalize – The text that will be returned with the first letter of each word in uppercase and all other letters in lowercase.

Some applications include:
● Standardizing inconsistent capitalization for uniformity and consistency.
● Ensuring that all data entered into a cell is in a consistent format.
● Manipulating text, e.g. converting text to lowercase and then using other functions to extract specific characters from the string.
● Creating titles (UPPER and PROPER).
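Python's built-in string methods offer close counterparts: `str.upper()` and `str.lower()` behave like UPPER and LOWER for ordinary text, and `str.title()` approximates PROPER. The sample string is hypothetical. One caveat: `title()` starts a new word after any non-letter character, so an apostrophe produces "It'S"; whether PROPER does the same is worth checking in your spreadsheet.

```python
text = "cLIMATE change IS real"

print(text.upper())  # like UPPER: CLIMATE CHANGE IS REAL
print(text.lower())  # like LOWER: climate change is real
print(text.title())  # close to PROPER: Climate Change Is Real

# title() capitalizes the letter after any non-letter character.
print("it's".title())  # It'S
```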


The UPPER and LOWER functions


Example use:
Find all Tweets containing the word climate, regardless of case.

● Since the FIND function is case-sensitive, we can start by converting the text to uppercase or lowercase before applying the FIND function.
● It is common practice to convert text to lowercase during analysis, so we will use the LOWER function. The search term must then also be written in lowercase ("climate").

01. Enter =IFERROR(FIND("climate", LOWER(B2)), 0) in cell C2.
02. Replicate the formula to the other rows by dragging the fill handle down.

ID | Text | Find climate
1028954635443171328 | Climate change and wildfires – how do we know if there is a link? https://t.co/2SqyvW7asF via @ConversationUS | 1
1028954995469811713 | #climatechange #spaceweather https://t.co/hB3tQgPeys | 2
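A Python sketch of the same idea, using the two Tweets from the table, shows why the LOWER step matters: without it, the capitalized "Climate" in the first Tweet is not found.

```python
tweets = [
    "Climate change and wildfires – how do we know if there is a link? "
    "https://t.co/2SqyvW7asF via @ConversationUS",
    "#climatechange #spaceweather https://t.co/hB3tQgPeys",
]

for tweet in tweets:
    # =IFERROR(FIND("climate", LOWER(B2)), 0): lowercase first, then a
    # case-sensitive find; -1 (not found) + 1 gives the fallback 0.
    with_lower = tweet.lower().find("climate") + 1
    without_lower = tweet.find("climate") + 1  # FIND on the raw text
    print(with_lower, without_lower)
# 1 0
# 2 2
```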
