An Interviewer's Favorite Question - "How Are Python Strings Stored in Internal Memory" - by Shubh Patni - Better Programming
Strings! One of interviewers' favorite topics, and a favorite of everyone who starts programming, whatever language they choose. Playing with strings is extremely interesting, but do you know how Python stores strings internally?
What if I asked you, "Are duplicates allowed in strings?" Most of you would say yes, and give an example like "Mommy," where the character 'm' repeats. But is that really the case?
In this article, I will give you a clear picture of how strings are stored in memory, and I promise your perspective on strings will change completely.
One important piece of advice for readers: understanding a programming language from a memory perspective is one of the most effective ways to learn it! I bet you'll hardly forget the core concepts once you try this approach.
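If you dig deeper, it turns out that CPython (the standard Python interpreter) keeps an internal table of interned strings, often called the interned dictionary. Conceptually, it maps a string's value to the single object that holds that value, so equal interned strings can share one object in memory instead of each having its own copy. Let's understand this with the help of an example: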
s = "Hello world"
In the line above, I created the string Hello world and bound it to a variable called s. Abstractly, we can picture s as a reference to one string object in memory, and that object stores all 11 characters together in a single internal buffer.
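One way to see that the characters live together inside a single object is to compare sizes. This sketch assumes CPython, where `sys.getsizeof` reflects the compact layout (one byte per character for pure-ASCII text); other implementations may report different numbers.

```python
import sys

s = "Hello world"
other = "abcdefghijk"  # also 11 ASCII characters, but none repeated

# Both strings are single objects whose characters sit in one internal buffer.
# The repeated letters (l, o) do not make "Hello world" any smaller.
print(len(s), len(other))                        # 11 11
print(sys.getsizeof(s) == sys.getsizeof(other))  # True in CPython

# Each extra ASCII character adds one byte to the object in CPython.
print(sys.getsizeof(s + "!") - sys.getsizeof(s))  # 1 in CPython
```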
Now let's see what happens internally and how the interned dictionary works. As an example, let's create a single-character string s1 and assign it to a new variable s2.
Let's break this down. When we create the first string s1, a string object holding 'A' is created in memory. Interning then works like this: Python checks the interned dictionary for an equal string. In our simplified example the dictionary has no entry for 'A' yet, so the new object is registered in it, and from then on the entry for 'A' leads to that object, which sits at an illustrative address of 123. (In CPython, one-character strings such as 'A' are additionally served from a built-in cache, and identifier-like string literals are interned automatically.)
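The lookup-then-register steps above can be modeled with an ordinary dict. This is a minimal toy sketch: the names `_toy_table` and `toy_intern` are hypothetical and only illustrate the idea, while CPython's real table lives inside the interpreter and is reached through `sys.intern()`.

```python
# Toy model of an interned dictionary. The names _toy_table and toy_intern
# are hypothetical; CPython's real table is internal to the interpreter.
_toy_table = {}

def toy_intern(s):
    """Return the canonical object for s, registering s if it is new."""
    # setdefault returns the stored object if an equal key already exists;
    # otherwise it stores s itself and returns it.
    return _toy_table.setdefault(s, s)

# Build two equal strings at runtime so that they start as separate objects.
a = "".join(["Hello", " world"])
b = "".join(["Hello", " world"])

print(a == b)                          # True: same value
print(a is b)                          # False: two distinct objects
print(toy_intern(a) is toy_intern(b))  # True: both map to one canonical object
```

Using the string itself as both key and value mirrors the point that the table leads to an existing object; the "address" is simply the object reference.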
In the next step, when we assign s1 to s2, nothing is copied: the reference held by s1 is copied into s2, so s2 points to the same object. We call this a reference-type assignment, and in Python it works this way for every object, not just strings.
Figure: the reference-type assignment and string interning (image created by author Muhammad Abutahir).
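A quick way to check the reference-type assignment is the `is` operator, which compares object identity. This is a small sketch; the actual id values differ from run to run.

```python
s1 = "A"
s2 = s1  # no copy: s2 is bound to the object s1 refers to

print(s2 is s1)          # True: one object, two names
print(id(s1) == id(s2))  # True: same identity

# Rebinding s1 later does not affect s2; strings are immutable,
# so s1 simply starts pointing at a different object.
s1 = "B"
print(s2)                # A
print(s2 is s1)          # False
```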
Okay, that part is clear! But how does all this make strings memory-efficient? Here's how.
What happens if we print the ids of a character that appears in both strings?
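# Printing the ids of the character 'o'
s1 = "Hello"  # assumed definitions; the original figure defining s1 and s2 is not reproduced
s2 = "world"
print(id(s1[4]))  # e.g. 1004 (actual addresses vary per run)
print(id(s2[1]))  # prints the same number as the line above in CPython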
Wow! That's unusual, right? How can characters taken from two different string objects, each with its own address, have the same id? Let's find out.
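Let's start with the creation of the string s1. A multi-character string is a single object, and its characters are not separate objects: CPython stores them side by side in a compact internal array. When you index the string, as in s1[4], Python builds a one-character string object for the result. For characters with code points below 256, CPython does not build a fresh object each time; it hands back a cached single-character string instead. Since s1[4] and s2[1] are both 'o', they return the same cached object, which is why their ids match.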
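The sketch below checks the single-character cache directly. It assumes CPython, because the cache is an implementation detail rather than a language guarantee, and it compares every letter the two example strings have in common.

```python
s1 = "Hello"
s2 = "world"

# Letters that occur in both strings: 'l' and 'o'.
shared = sorted(set(s1) & set(s2))

for ch in shared:
    # Indexing builds a one-character string; for code points below 256,
    # CPython returns the same cached object every time.
    first = s1[s1.index(ch)]
    second = s2[s2.index(ch)]
    print(ch, first is second)
# prints:
# l True
# o True
```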
String interning, on the other hand, works on whole strings rather than individual characters. When a string is interned, either automatically (CPython does this for identifier-like literals in your source code) or explicitly with sys.intern(), Python looks for an equal string in the interned dictionary. If one is present, the existing object is reused; if not, the new object is added to the dictionary and becomes the shared copy from then on.
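Here is interning at the level of whole strings, using the standard library's `sys.intern()`. The strings are assembled at runtime, because equal literals written directly in source code may already be shared by the compiler, which would hide the effect.

```python
import sys

# Two equal strings assembled at runtime start out as separate objects.
x = " ".join(["interned", "dictionary"])
y = " ".join(["interned", "dictionary"])
print(x == y, x is y)  # True False

# sys.intern() looks the value up in the interned dictionary and returns the
# single shared object, adding it first if it is not there yet.
x = sys.intern(x)
y = sys.intern(y)
print(x is y)          # True
```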
Let's revisit s1 = "Hello" with this in mind. Its five characters, H, e, l, l, and o, are stored next to each other in the string's buffer, so the second l does take up space of its own (one byte, since CPython stores pure-ASCII text with one byte per character). Sharing appears when you pull characters out: s1[2] and s1[3] both return the same cached 'l' object rather than two separate objects.
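The two l's in "Hello" make a handy check, again assuming CPython: both index positions hand back the same cached object, even though the string's buffer stores the letter twice. The same holds for the two lowercase m's in "Mommy" from the opening question.

```python
s1 = "Hello"
print(s1[2], s1[3])        # l l
print(s1[2] is s1[3])      # True in CPython: one cached 'l' object

word = "Mommy"
print(word.count("m"))     # 2: the buffer stores 'm' twice
print(word[2] is word[3])  # True in CPython: both reads return the cached 'm'
print(word[0] is word[2])  # False: 'M' and 'm' are different characters
```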
The most interesting part is that this sharing is not limited to a single variable. The interned dictionary and the single-character cache are shared by the whole interpreter, so even when equal strings live in different variables, every interned copy, and every cached character you index out, refers to one shared object. This saves memory when the same strings appear many times, and it lets dictionary lookups on interned keys compare pointers instead of characters. So, about duplicates: inside a single string, repeated characters are stored each time they occur, but the one-character objects you get back from indexing, and equal interned strings, are not duplicated in memory.
Summary
In this article, I discussed how Python stores strings internally and how string interning works. As I mentioned earlier, understanding a programming language from its memory perspective is the secret to mastering its fundamental concepts.