String Basics - JetBrains Academy - Learn Programming by Building Your Own Apps

Strings are ordered arrays of characters that can represent text, numbers, or other data. Common string operations include concatenation, comparison, accessing individual characters by index, and determining the length. Substrings are contiguous portions of a larger string, and include prefixes (beginning substrings), suffixes (ending substrings), and borders (substrings that are both prefixes and suffixes). Strings have many applications, such as text searching, DNA analysis, and language translation.

Uploaded by

JorgeAntunes
String basics
What do you imagine when you think about data? You might think of a text, your
friend of a sequence of numbers, and someone else of both at once. All of these
answers are correct. Is there a data type that can represent such a wide variety of
data? Of course there is! This data type is the string.

Strings are one of the most used data types, and for a reason. Strings play an
essential role in many areas of our life: we translate text between languages,
work with documents in text editors, follow the scrolling text on TV programs, and
search the Internet for the lyrics of our favorite songs. If it weren't for strings,
we wouldn't be able to do any of this!

§1. What is a string?


A string is an ordered array of characters. A character can be anything: a letter
of the Greek alphabet, a digit, or a strange Unicode symbol ֍. However, here we
face the first complexity of strings: a string in programming is not always a
meaningful word or sentence, as in natural language. In general, a string is a
sequence of arbitrary characters that can be printed on a computer.

For instance, here is a string:

Hello! My name is Jon Snow and I am from Winterfell!

And this is a string too:

q1#%0)n⍟

Notice that a string can consist of several words, not just one. If a string
contains a space character, it is still one string, not two.

As mentioned earlier, a string is an ordered array. This means that every character
in a string corresponds to an index. Traditionally, the characters of a string are
counted from zero.

Accessing the needed symbol by its index usually looks like s[i]. Here, s is the
string and i is the index.

For instance, for the string s = "string basics" and i = 5, s[i] is g. In programming
languages, strings are usually delimited by single or double quotes to show that
everything between the quotes is one whole, one string.
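
As a small illustration, zero-based indexing can be sketched in Python like this (the variable names sixth and indexed are only illustrative):

```python
s = "string basics"  # the quotes mark where the string starts and ends
i = 5

# Indexing starts at zero, so index 5 refers to the sixth character.
sixth = s[i]
print(sixth)  # g

# enumerate pairs every character with its index.
indexed = list(enumerate(s))
print(indexed[:3])  # [(0, 's'), (1, 't'), (2, 'r')]
```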

The length of a string s is the total number of characters it contains. We denote it
by |s|. Some examples are the following:

100101 is a string of length 6;
GATTACA is a string of length 7;
string basics is a string of length 13.

A string can have zero length. In this case, we call it an empty string.
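
A quick way to check these lengths is Python's built-in len function. The list name examples below is just an assumption made for this sketch:

```python
examples = ["100101", "GATTACA", "string basics", ""]

# len counts every character, including the space in "string basics".
lengths = [len(s) for s in examples]
print(lengths)  # [6, 7, 13, 0]

# The empty string is the one whose length is zero.
empty = ""
print(len(empty) == 0)  # True
```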
§2. Operations on strings
We can perform numerous operations on strings. Let's take a brief look at a few of
them:

reverse(s) returns the reversed string, i.e. the string written backward. For
example, reverse(LIVE) = EVIL;
concat(s,t) concatenates the given strings s and t. For instance,
concat(STR, INGS) = STRINGS;
compare(s, t) compares the given strings s and t. For example,
compare(JON, JOHN) = false;
get_symbol(s, i) returns the character of the given string s at the index i.
For example, get_symbol(ANIMALS, 1) = N. Note that if you call this
function with an index that is not smaller than the length of the string,
the function fails with an error;
length(s) returns the length of a given string s. For instance,
length(HELLO) = 5.
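
The operations above could be sketched in Python as thin wrappers around built-in features. The function names mirror the ones in the list; they are not a real library, and rejecting negative indices in get_symbol is an assumption made here to match the text (plain Python indexing would count negative indices from the end).

```python
def reverse(s: str) -> str:
    """Return s written backward."""
    return s[::-1]


def concat(s: str, t: str) -> str:
    """Return s followed by t."""
    return s + t


def compare(s: str, t: str) -> bool:
    """Return True if s and t are the same string."""
    return s == t


def get_symbol(s: str, i: int) -> str:
    """Return the character of s at index i, or raise IndexError."""
    if not 0 <= i < len(s):
        raise IndexError(f"index {i} is out of range for length {len(s)}")
    return s[i]


def length(s: str) -> int:
    """Return the number of characters in s."""
    return len(s)


print(reverse("LIVE"))           # EVIL
print(concat("STR", "INGS"))     # STRINGS
print(compare("JON", "JOHN"))    # False
print(get_symbol("ANIMALS", 1))  # N
print(length("HELLO"))           # 5
```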

Quite often, we need to compare two given strings. We can do this by scanning the
first string and comparing each of its characters with the character at the same
index in the other string. More precisely, given the strings s and t of equal
length, we check for every i whether s[i] = t[i]. If they all match, we conclude
that s = t. Strings of different lengths are never equal.

Here is pseudocode that illustrates the comparison:

    // if the lengths do not match, the strings are different
    if length(s) != length(t) then
        return False
    // length(t) would work here as well, since both lengths are equal
    for i in [0, length(s) - 1]:
        // if two characters do not match, the strings are not the same
        if s[i] != t[i] then
            return False
    return True
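
The same character-by-character comparison, written as a runnable Python sketch. The name strings_equal is chosen for this example; in practice Python's == operator already does this job.

```python
def strings_equal(s: str, t: str) -> bool:
    """Compare two strings character by character."""
    # Strings of different lengths cannot be equal.
    if len(s) != len(t):
        return False
    # Both lengths are the same here, so either one bounds the loop.
    for i in range(len(s)):
        if s[i] != t[i]:
            return False  # the first mismatch settles the answer
    return True


print(strings_equal("JON", "JOHN"))        # False
print(strings_equal("GATTACA", "GATTACA")) # True
print(strings_equal("", ""))               # True
```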

You rarely need to write your own functions for string operations, because
most programming languages provide built-in methods or standard libraries
for them. For example, in Python you can work with strings much as with
numbers, using + for concatenation and == for comparison; in C++, the
standard <string> header provides the std::string class with a lot of
useful string functions.

§3. Substrings
A substring is a contiguous subsequence of symbols of an original string.
Naturally, a substring is called proper if it doesn't coincide with the whole string.

For instance, ATTA is a substring of the string GATTACA because the string
GATTACA contains the sequence ATTA. Note that GATTA, TT, TAC, and CA are also
substrings of the given string. Actually, there are many more substrings in our
string: try to find by yourself some substrings of GATTACA, other than the ones
mentioned above.

In terms of notation, the substring of the string s that starts at the i-th symbol
and ends at the j-th symbol, both included, is denoted by s[i, j]. For s = GATTACA,
s[1, 4] = ATTA. The only non-proper substring of a string is the string itself. The
empty string is a substring of any string.
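
To make the s[i, j] notation concrete, here is a minimal Python sketch. Note that Python slices exclude the end index, so the inclusive s[i, j] corresponds to s[i:j + 1]. The helper names substring and all_substrings are assumptions made for this example.

```python
def substring(s: str, i: int, j: int) -> str:
    """Return s[i, j] in the notation above: both ends are included."""
    return s[i:j + 1]  # the slice end is exclusive, hence j + 1


def all_substrings(s: str) -> set:
    """Collect every distinct substring of s, including the empty one."""
    found = {""}
    for i in range(len(s)):
        for j in range(i, len(s)):
            found.add(substring(s, i, j))
    return found


s = "GATTACA"
print(substring(s, 1, 4))  # ATTA

subs = all_substrings(s)
print(all(x in subs for x in ["GATTA", "TT", "TAC", "CA"]))  # True

# Proper substrings are all substrings except the string itself.
proper = subs - {s}
print(s in proper)  # False
```

For a simple membership check, Python's in operator is enough: "ATTA" in "GATTACA" evaluates to True.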

Now it's time to introduce some peculiar types of substrings. A substring starting
from the beginning of a string is a prefix, and a substring ending with the last index
is a suffix. For example, s[0, 2] = GAT is a prefix of s, and s[4, 6] = ACA is a suffix
of s.

A prefix can end anywhere in the string, and a suffix can begin anywhere.
Hence, GATTACA has more prefixes: G, GA, GATT, GATTA, GATTAC, and GATTACA.
Its suffixes include, for example, GATTACA, ATTACA, TTACA, TACA, CA, and A.

Searching for prefixes and suffixes is a common task. For example, it helps to find
the root of a word by dropping its prefixes and suffixes, or to count the
occurrences of words that start with a particular beginning.

The whole word is a suffix and a prefix for itself as well. Also, empty prefixes
and suffixes can exist.
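
As a rough sketch, all prefixes and suffixes of a string can be listed with Python slices. The function names prefixes and suffixes are made up for this example; the empty string and the whole string are included, as noted above.

```python
def prefixes(s: str) -> list:
    """Return all prefixes of s, from the empty one to s itself."""
    return [s[:k] for k in range(len(s) + 1)]


def suffixes(s: str) -> list:
    """Return all suffixes of s, from s itself to the empty one."""
    return [s[k:] for k in range(len(s) + 1)]


s = "GATTACA"
print(prefixes(s))
# ['', 'G', 'GA', 'GAT', 'GATT', 'GATTA', 'GATTAC', 'GATTACA']
print(suffixes(s))
# ['GATTACA', 'ATTACA', 'TTACA', 'TACA', 'ACA', 'CA', 'A', '']

# Built-in checks for a single prefix or suffix:
print(s.startswith("GAT"))  # True
print(s.endswith("ACA"))    # True
```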

What if a certain prefix and suffix coincide? We end up with a new term: a border.
Formally, a border of a string is a non-empty substring that is both a proper
prefix and a proper suffix of the string. Proper, as we remember, means that the
whole string does not count. For example, the longest border of ATTA is A.

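As confusing as it might be, a prefix and a suffix can overlap. Hence, the longest
border of s = ABABAB is ABAB: the prefix s[0, 3] and the suffix s[2, 5] are both
equal to ABAB, and they share the two middle characters of the string.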

Besides, ABABAB has another border, AB. In general, a string can have more than
one border. However, we are mostly interested in the longest one, because we can
use it in calculating the prefix function, a magical tool to work with strings,
as we will discuss later.

Of course, a string may have no border at all, because a proper prefix is not
always equal to a proper suffix. A string is called unbordered if the only proper
prefix that is also a proper suffix is the empty string.
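
The definition of a border translates directly into a brute-force Python sketch. This is only an illustration of the definition, not the prefix function mentioned above, and the names borders and longest_border are assumptions made for this example.

```python
def borders(s: str) -> list:
    """Return all borders of s, shortest first.

    A border is a non-empty proper prefix that equals the proper
    suffix of the same length k, for 1 <= k < len(s).
    """
    n = len(s)
    return [s[:k] for k in range(1, n) if s[:k] == s[n - k:]]


def longest_border(s: str) -> str:
    """Return the longest border of s, or '' if s is unbordered."""
    found = borders(s)
    return found[-1] if found else ""


print(borders("ABABAB"))          # ['AB', 'ABAB']
print(longest_border("ABABAB"))   # ABAB
print(longest_border("ATTA"))     # A
print(borders("GATTACA"))         # []
```

The empty result for GATTACA means that it is unbordered. This sketch compares up to n - 1 prefix and suffix pairs, so it can take quadratic time on long strings.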
§4. Strings applications
The fact that strings can represent any textual information makes them widely
used in a variety of fields:

String-searching algorithms. If you are developing a text editor and want to
add a function that searches for a pattern in a text, you need to implement
an algorithm that finds the positions of all occurrences of a particular
string in the text. It is classically done with the help of the prefix
function, which depends on finding the longest borders.
DNA similarity measure. Calculating how strings differ from each other is
used to measure the variation between two strands of DNA. There are a lot of
ways to quantify how dissimilar two strings are, for example, by counting
the minimum number of operations required to transform one string into the
other.
Natural language translation. Determining the language of the text based on
the probabilities of characters and syllables is a complex task that requires a
lot of string functions, such as searching for the context of words or
searching for a substring by prefix while looking through a set of words in the
dictionary.
Checking for plagiarism. Have you ever wondered how modern plagiarism
detection programs work? Nowadays they perform online verification using
substring search algorithms among a large number of documents stored in
their own database. Who knows, maybe you will be the one who will improve
the plagiarism verification system so that no student will be able to cheat
while preparing an essay anymore. But this will only happen if you study
string basics well!
Analysis of people's requests. Almost everything we look for on the Internet
is saved and analyzed by special programs, which then offer us personalized
advertising. Our requests are saved as strings and then transmitted to the
advertising selection programs. Programs that generate a relevant logo
based on the invented name of the company work the same way.
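
To give a feeling for the first application, here is a deliberately simple Python sketch that finds the positions of all occurrences of a pattern in a text. It uses brute force, not the prefix function approach mentioned above, and the name find_all is an assumption made for this example.

```python
def find_all(text: str, pattern: str) -> list:
    """Return the start indices of all occurrences of pattern in text.

    Brute force: try every start position and compare the slice.
    An empty pattern is treated as having no occurrences here.
    """
    if not pattern:
        return []
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


print(find_all("GATTACA", "A"))    # [1, 4, 6]
print(find_all("ABABAB", "ABAB"))  # [0, 2]
print(find_all("GATTACA", "CAT"))  # []
```

Overlapping occurrences are reported too, as the ABABAB example shows.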

§5. Conclusion
A string is a handy data type that is represented as an ordered sequence of
characters. The characters of a string are indexed starting from zero, and strings
are usually enclosed in quotes. There are also empty strings, which have zero length.

There are a lot of string functions, such as reversing a string, getting its length, or
returning a character by index. There are also some methods for comparing and
concatenating two strings.

A large part of working with strings is processing substrings. Plagiarism
detection systems and dictionary lookups both rely on searching for substrings.
Two special kinds of substrings are prefixes and suffixes, which begin and end a
string, respectively. A non-empty substring that is both a proper prefix and a
proper suffix is called a border.

Since strings are widely used in many areas, most programming languages have
tools to work with them. With the help of these built-in methods and your own
functions, you can implement translation functions, measure the distance between
strings, and determine the language of a text!

For these reasons, it's essential to be familiar with some standard and well-known
string processing algorithms and their applications. We will examine some of them
in the following topics.
