Strings in Python – Full Theoretical Explanation
🔍 What Is a String in Python?
A string in Python is a sequence of Unicode characters, implemented as an immutable
object. You can think of it as a read-only array of characters that supports various
operations.
Example:
s = "hello"
Each character is accessible using its index, starting at 0.
🧬 Under the Hood: How Strings Work in Python
🔹 Memory Model:
When you create a string like "hello":
Python allocates memory for a sequence of characters.
It stores the string as a contiguous block of memory, just like an array.
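It also stores metadata, such as:
o String length
o Character width (CPython stores 1, 2, or 4 bytes per character, depending on the widest character in the string)
o A hash value (computed on first use and cached, for fast dictionary lookups)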
💡 Unlike C, Python strings do not use null terminators (\0). Python internally tracks length.
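The per-character storage cost can be observed with a small experiment. This is a minimal sketch that assumes CPython; bytes_per_char is a hypothetical helper written for this illustration, and the exact numbers are an implementation detail that other Python implementations may not share.

```python
import sys

def bytes_per_char(ch: str, n: int = 1000) -> float:
    """Estimate how many bytes one extra copy of `ch` adds to a string."""
    # Comparing two lengths cancels out the fixed object header.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

ascii_cost = bytes_per_char("a")             # ASCII character
bmp_cost = bytes_per_char("\u0928")          # Devanagari letter
astral_cost = bytes_per_char("\U0001F60A")   # emoji outside the 16-bit range

print(ascii_cost, bmp_cost, astral_cost)

# Length is read from the stored metadata, and the hash is stable
# for the lifetime of the process.
print(len("hello"), hash("hello") == hash("hello"))
```

On CPython the three costs come out in increasing order, which matches the idea that the widest character in a string decides how much space every character gets.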
🔹 String Interning
Python optimizes memory using a technique called string interning.
What is String Interning?
It means Python will reuse immutable strings (especially short strings and identifiers) rather
than creating new copies.
a = "hello"
b = "hello"
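print(a is b) # True in CPython: both names refer to the same object
Python keeps a global pool of interned strings to save memory and speed up comparison. Interning is an implementation detail, so use == (not is) when you want to compare string values.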
🚫 Strings Are Immutable
Once a string is created, you cannot change it. Any modification results in a new string
object.
s = "cat"
s[0] = "b" # ❌ This raises a TypeError
Why Immutability?
1. Thread safety – Multiple threads can share the same string.
2. Hashability – Strings can be used as keys in dictionaries.
3. Performance – Enables interning and caching.
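The points above can be checked with a short sketch. The variable names (error_message, ages) are only for illustration.

```python
s = "cat"
error_message = None

try:
    s[0] = "b"                 # item assignment is not supported on str
except TypeError as exc:
    error_message = str(exc)

before = id(s)
s = "b" + s[1:]                # builds a new string and rebinds the name s
after = id(s)

ages = {"cat": 3}              # strings are hashable, so they work as dict keys

print(error_message)
print(s, before != after, ages["cat"])
```

The name s ends up pointing at a different object; the original "cat" was never modified.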
🧪 How Strings Are Stored Internally
Let’s say you define:
s = "Chat"
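Python stores it something like this (simplified; one byte per character here, because every character is ASCII):
Index   Value (Char)   Memory Address
0       'C'            1000
1       'h'            1001
2       'a'            1002
3       't'            1003
And Python keeps metadata:
Length = 4
Character width = 1 byte (strings with wider characters use 2 or 4 bytes per character)
Hash = (computed on first use, then cached for fast lookup)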
📦 Data Type and Class
Strings in Python are instances of the str class.
type("hello") # <class 'str'>
They support a huge set of built-in methods, like:
.lower(), .upper()
.find(), .replace()
.split(), .join()
.strip(), .isalpha(), .isdigit(), etc.
These methods do not mutate the original string — they return new strings.
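A quick way to convince yourself is to chain a few methods and then look at the original again. A small sketch with made-up sample text:

```python
raw = "  Hello, World  "

cleaned = raw.strip().lower()              # new string, raw is untouched
words = cleaned.split(", ")                # list of new strings
rejoined = "-".join(words)                 # another new string
swapped = cleaned.replace("world", "python")

print(repr(raw))        # still has its spaces and capitals
print(cleaned, words, rejoined, swapped)
print("hello".isalpha(), cleaned.isalpha())
```

Every call produced a new value; raw is exactly what it was at the start.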
🔁 String Memory Reuse (Example)
s1 = "openai"
s2 = "openai"
print(s1 is s2) # True
Why?
CPython interns string constants that look like identifiers (only letters, digits, and underscores). This is an implementation detail, not a language guarantee.
But:
s1 = "hello world!"
s2 = "hello world!"
print(s1 is s2) # May be False
Longer or dynamic strings may not be interned unless explicitly done using sys.intern().
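Here is a minimal sketch of explicit interning with sys.intern(). The build helper is hypothetical; it only exists to create the strings at runtime so they are not compile-time constants. The result of the plain is check is CPython behavior and should not be relied on.

```python
import sys

def build(parts):
    """Assemble a string at runtime from smaller pieces."""
    return " ".join(parts)

s1 = build(["hello", "world!"])
s2 = build(["hello", "world!"])

print(s1 == s2)    # equal values
print(s1 is s2)    # typically False in CPython: two separate objects

i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)    # both names now refer to the single interned copy
```

For everyday code, == is the right comparison; interning mostly pays off when the same long strings are compared or used as dictionary keys very often.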
🧠 Summary: How Python Handles Strings
Feature             Python Behavior
Mutable?            ❌ No – immutable
Stored as           Array of Unicode characters
Indexed?            ✅ Yes (0-based)
Null terminator?    ❌ No
Hashable?           ✅ Yes
Supports slicing?   ✅ Yes
Dynamic sizing?     ✅ Yes (new object on change)
📌 Real Memory Management Behavior
Every new string is stored as a heap object.
CPython reclaims this memory mainly through reference counting. Strings hold no references to other objects, so the cyclic garbage collector does not need to track them.
String objects are freed when nothing references them anymore.
🔍 Bonus: Unicode Support
Python strings are Unicode by default. That means you can store:
s = "नमस्ते"
t = "你好"
u = "😊"
All are valid Python strings, and the internal encoding ensures safe handling of multilingual
data.
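One thing worth seeing in code: len() counts Unicode code points, not bytes. The sketch below compares the two; describe is a hypothetical helper for this illustration.

```python
def describe(text: str) -> tuple:
    """Return (number of code points, number of bytes when encoded as UTF-8)."""
    return len(text), len(text.encode("utf-8"))

for sample in ["नमस्ते", "你好", "😊"]:
    print(sample, describe(sample))
```

The byte count only appears when you encode the string, for example to write it to a file or send it over a network.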
🔖 Recap Mental Model:
“A Python string is an immutable, memory-efficient sequence of Unicode characters stored
with metadata like length, character width, and a cached hash.”
Basic String Operations in Python (with Theoretical Explanation)
These are the building blocks for working with strings efficiently and cleanly.
🔹 1. Indexing
🧠 Theory:
Each character in a string has a position (index).
Indexing allows direct access to any character.
Python supports positive and negative indexing.
Syntax:
s = "python"
print(s[0]) # 'p'
print(s[-1]) # 'n' (last character)
🔍 Memory View:
Think of s = "python" like:
Index Value
0 'p'
1 'y'
2 't'
3 'h'
4 'o'
5 'n'
-1 'n'
-2 'o'
... ...
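The relationship between positive and negative indexes in the table can be written down directly: index -k refers to the same character as len(s) - k. A small sketch:

```python
s = "python"
n = len(s)

# Each tuple is (negative index, equivalent positive index, character).
pairs = [(-k, n - k, s[-k]) for k in range(1, n + 1)]
print(pairs[0])     # (-1, 5, 'n')
print(pairs[-1])    # (-6, 0, 'p')

out_of_range = False
try:
    s[n]            # valid indexes stop at n - 1
except IndexError:
    out_of_range = True
print(out_of_range)
```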
🔹 2. Slicing
🧠 Theory:
Slicing is like cutting a substring from the original string.
It creates a new string (doesn’t modify the original).
Syntax:
s = "python"
print(s[1:4]) # 'yth' → index 1 to 3
print(s[:3]) # 'pyt' → from 0 to 2
print(s[3:]) # 'hon' → from 3 to end
Structure:
s[start:stop:step]
Examples:
s = "openai"
print(s[::2]) # 'oea' → every second character (indexes 0, 2, 4)
print(s[::-1]) # 'ianepo' → reversed string
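A few more slices on a single string show how start, stop, and step interact, and that out-of-range slice bounds are clipped instead of raising an error:

```python
s = "python"

evens = s[::2]        # indexes 0, 2, 4
odds = s[1::2]        # indexes 1, 3, 5
backwards = s[::-1]   # step of -1 walks from the end to the start
middle = s[2:-1]      # from index 2 up to (not including) the last character
beyond = s[4:100]     # stop is clipped to the end of the string
empty = s[10:]        # start past the end gives an empty string

print(evens, odds, backwards, middle, beyond, repr(empty))
print(s)              # the original is unchanged
```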
🔹 3. Concatenation and Repetition
🧠 Theory:
Since strings are immutable, concatenation creates a new string.
Internally, Python copies characters to a new memory location.
Examples:
s1 = "data"
s2 = "science"
combined = s1 + " " + s2 # 'data science'
print(combined)
repeat = "ha" * 3 # 'hahaha'
❗ Repeated concatenation in a loop can copy the growing string again and again, which is O(n²) in the worst case. Collect the pieces in a list and use .join() instead.
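The two approaches give the same result; the difference is how many intermediate strings may get built along the way. A sketch with two hypothetical helper functions:

```python
def build_with_plus(words):
    """Concatenate in a loop: each step may create a new, longer string."""
    result = ""
    for i, word in enumerate(words):
        if i > 0:
            result += ","
        result += word
    return result

def build_with_join(words):
    """Let join() work out the final size and build the result in one go."""
    return ",".join(words)

words = ["data", "science", "rocks"]
print(build_with_plus(words))
print(build_with_join(words))
```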
🔹 4. Membership Testing
🧠 Theory:
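Python scans the string from left to right to check whether the substring occurs anywhere in it, so the cost grows with the length of the string being searched.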
s = "machine learning"
print("learn" in s) # True
print("data" not in s) # True
🔹 5. String Length
s = "algorithm"
print(len(s)) # 9
Internally, Python does not count each time — it stores the length in metadata.
🔹 6. String Iteration
for ch in "DSA":
    print(ch)
Strings are iterable, so you can loop over them the same way you loop over a list.
🔹 7. Immutability Reminder
s = "code"
s[0] = "m" # ❌ Error: strings can't be changed in-place
To "change" a string, you create a new one:
s = "code"
s = "m" + s[1:] # 'mode'
🧵 Summary Table
Operation     Description                  Output Example
s[0]          First char                   'p'
s[-1]         Last char                    'n'
s[1:4]        Slice from 1 to 3            'yth'
s[::-1]       Reverse string               'nohtyp'
s+t           Concatenate                  'helloworld'
'in'          Check if substring exists    True
len(s)        Length                       6
for ch in s   Loop through string          One char per line
✅ Your Mental Checklist:
Can you explain how slicing works with memory in mind?
Can you avoid creating many intermediate strings?
Do you understand that slicing and concatenation return new strings instead of modifying the original?
String Searching in Python
Imagine you are reading a long book. You're looking for a specific phrase, say:
"The secret door was hidden behind the library."
Now, this book has millions of characters — how would you find that phrase manually?
You'd likely start from the beginning, reading line by line, comparing what you see with the
phrase in your mind. When a few matching words begin to show up, you'd lean in and
compare more carefully.
This is exactly how a naive string search works.
🧠 What Is String Searching?
String searching refers to the process of locating one string (called the pattern) inside
another string (called the text). The goal is to find whether it exists, and if so, at what
position.
In computer terms:
Text: The main data you're scanning (a sentence, a book, a file).
Pattern: The smaller string you're looking for.
If the pattern is found, the algorithm returns its position; if not, it says it doesn't exist.
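Python's built-in tools report "found" and "not found" in slightly different ways. A short sketch using a sample sentence:

```python
text = "I bought an apple pie"

print("apple" in text)        # membership test gives True or False
print(text.find("apple"))     # position of the first match: 12
print(text.find("mango"))     # find() returns -1 when there is no match

missing = False
try:
    text.index("mango")       # index() raises instead of returning -1
except ValueError:
    missing = True
print(missing)
```

Use in when you only need yes or no, find() when a missing pattern is a normal case, and index() when a missing pattern should be treated as an error.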
⚙️ Python’s Built-in Search Behavior (Behind the Scenes)
In Python, you often do:
if "apple" in "I bought an apple pie":
    print("Found!")
Behind this simple syntax, Python conceptually does something similar to the manual search: it
works from left to right, looking for a position where "apple" lines up with the text.
So even if it looks simple on the outside, the fundamental operation is the same:
finding a position where the pattern matches. (CPython's real implementation adds shortcuts that skip positions that cannot match.)
🧠 How Does This Actually Work?
Let's break it into steps.
Suppose:
Text = "hello there, general kenobi"
Pattern = "general"
The algorithm starts with index 0 in the text and compares:
"hello t" ≠ "general"
"ello th" ≠ "general"
…
Eventually at index 13, we get "general" == "general"
✅ Match found at position 13
This process is called Naive Pattern Matching, because it's the most straightforward (and
least optimized) way to do it.
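The steps above can be sketched as a short function. naive_search is a name chosen for this illustration; it is a teaching sketch, not how str.find() is implemented.

```python
def naive_search(text: str, pattern: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every position where the pattern could still fit.
    for start in range(n - m + 1):
        matched = 0
        # Compare character by character until a mismatch or a full match.
        while matched < m and text[start + matched] == pattern[matched]:
            matched += 1
        if matched == m:
            return start
    return -1

text = "hello there, general kenobi"
print(naive_search(text, "general"))   # 13
print(naive_search(text, "jedi"))      # -1
```

The return values follow the same convention as str.find(), so the two can be compared directly when experimenting.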
Theoretical Efficiency: Why This Matters
Imagine doing this in a search bar inside a massive database or file.
If:
The text has 1 million characters
The pattern is 10 characters
The naive algorithm might have to compare each of those 1 million - 10 + 1 = 999,991
positions. That's almost a million checks!
Each check itself takes time (comparing 10 letters), so total time is roughly:
O(n × m) where:
n is length of text
m is length of pattern
This becomes very slow if repeated thousands of times (like in real search engines or spell
checkers).
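To make the O(n × m) estimate concrete, the sketch below counts character comparisons. count_comparisons is a hypothetical helper; unlike a real search, it deliberately keeps going through every position instead of stopping at the first match, so the count reflects a full scan.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by a naive scan over all positions."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        for offset in range(m):
            comparisons += 1
            if text[start + offset] != pattern[offset]:
                break   # mismatch: move on to the next position
    return comparisons

# Worst case: every position matches until the very last character.
worst = count_comparisons("a" * 20, "aaaab")
# Friendly case: the first character already mismatches everywhere.
friendly = count_comparisons("abcdefghijklmnopqrst", "zzzzz")

print(worst, friendly)   # 80 16
```

With n = 20 and m = 5 there are 16 positions to try. The worst case spends 5 comparisons on each (16 × 5 = 80), while the friendly case spends only 1.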
🧠 Real-Life Analogy
You’re checking whether someone is in a long attendance list printed on paper:
Naive search is like reading every name line-by-line and matching letters one by one.
Efficient search (we’ll learn later) is like having the list indexed or alphabetically
sorted — or like having a highlighted pattern in your glasses.
That’s how modern algorithms work — they preprocess data or patterns to skip
unnecessary comparisons.
📘 What Happens in Python’s find() Method?
When you do:
s = "the sun rises in the east"
s.find("sun")
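Conceptually, Python starts from index 0 and compares 3-character windows (s[i:i+len(pattern)]) until it
finds a match, then returns that index (4 here). If there is no match, find() returns -1.
In practice, CPython's find() is not purely naive: it uses an optimized search that borrows ideas from
Boyer-Moore and Horspool, and since Python 3.10 it can switch to a two-way algorithm for long inputs. The naive model is still a good way to reason about what the result will be.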
🧬 Why Not Just Use Regex?
Regex is powerful, but it's not a substitute for understanding.
Think of regex as pattern search on steroids — you define complex rules (like: "must start
with a number, followed by 3 letters").
But regex also needs a search engine underneath — it just adds a more expressive search
language.
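As an illustration of the rule mentioned above ("must start with a number, followed by 3 letters"), here is a minimal sketch using the standard re module. The pattern, the helper name starts_with_code, and the sample strings are assumptions made for this example.

```python
import re

# One digit followed by exactly three ASCII letters.
rule = re.compile(r"\d[A-Za-z]{3}")

def starts_with_code(s: str) -> bool:
    """True if s begins with a digit followed by three letters."""
    return rule.match(s) is not None   # match() only tries the start of s

print(starts_with_code("7abc-order"))  # True
print(starts_with_code("abc7"))        # False
print(starts_with_code("7ab1"))        # False

# search() scans the whole string and reports where the rule first matches.
found = rule.search("id: 4xyz")
print(found.start(), found.group())    # 4 4xyz
```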
💡 What Should You Take Away?
Searching in strings is fundamental — it's everywhere: from Ctrl+F in browsers to
DNA analysis tools.
The naive approach (manual matching from left to right) is the basis of all string
search algorithms.
Python’s built-in tools like in, find(), and index() all rely on pattern matching logic
behind the scenes.
Real-world search systems need better speed — that’s where efficient algorithms
like KMP and Boyer-Moore come in.
🧠 Final Mental Model:
Think of a string as a road, and your pattern as a car. The search is the act of driving the car
from start to finish, checking every parking spot (index) to see if it matches your destination
(the pattern). The naive way checks each spot. Smarter cars skip ahead when the road signs
look familiar.
Lexicographical Order and String Comparison (Theoretical Deep Dive)
💡 What Is Lexicographical Order?
Lexicographical order is dictionary order — the order in which words appear in a dictionary.
It’s how we expect words to be sorted in:
Dictionaries 📘
Contact lists 📇
File explorers 🗂
Leaderboards 🏆
So when you see "apple" < "banana", you’re doing a lexicographical comparison.
🧠 Theoretical Definition
Lexicographical order is a way to compare sequences (like strings) based on the order of
their characters from left to right.
Imagine comparing "cat" and "car":
First letter: c == c → go to next
Second letter: a == a → go to next
Third letter: t > r → 'cat' > 'car'
So:
"cat" > "car" # True
🧬 Why Does This Work in Python?
In Python, strings are compared character by character using the Unicode code point of each
character (for English letters and digits, these are the same as the ASCII values).
Each character has a numeric code internally:
Character   Code point
'a'         97
'b'         98
'c'         99
...         ...
'A'         65
'B'         66
So:
print("apple" < "banana") # True, because 'a' < 'b'
print("Apple" < "apple") # True, because 'A' < 'a'
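You can look up these numeric codes yourself with the built-in ord() and chr() functions. A short sketch:

```python
print(ord("a"), ord("b"), ord("A"))    # 97 98 65
print(chr(99))                         # c

# The codes that decide "cat" vs "car":
cat_codes = [ord(ch) for ch in "cat"]
car_codes = [ord(ch) for ch in "car"]
print(cat_codes)   # [99, 97, 116]
print(car_codes)   # [99, 97, 114]

# Lists of integers compare element by element too, so they give the same answer.
print(cat_codes > car_codes, "cat" > "car")
```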
📦 How String Comparison Actually Works in Python
Let’s break this down:
Step-by-Step:
To compare "cat" and "car":
1. Compare first character: 'c' vs 'c' → Equal
2. Move to second: 'a' vs 'a' → Equal
3. Move to third: 't' vs 'r' → 't' > 'r' → Result: "cat" > "car"
If one string runs out while all characters so far are equal (that is, it is a prefix of the other), the shorter string comes first:
"cat" < "catalog" # True
"data" < "database" # True
Because "cat" ends while "catalog" continues.
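The whole procedure, including the prefix rule, fits in a few lines. lex_compare is a hypothetical function written to mirror the steps; in real code you would simply use <, ==, and > on the strings.

```python
def lex_compare(a: str, b: str) -> int:
    """Return -1 if a < b, 0 if a == b, 1 if a > b, comparing left to right."""
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            # The first differing character decides the result.
            return -1 if ord(ch_a) < ord(ch_b) else 1
    # No difference found in the shared part: the shorter string comes first.
    if len(a) == len(b):
        return 0
    return -1 if len(a) < len(b) else 1

print(lex_compare("cat", "car"))       # 1
print(lex_compare("cat", "catalog"))   # -1
print(lex_compare("data", "data"))     # 0
```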
🔄 Sorting Strings Lexicographically
Python’s sorted() function and the list.sort() method use this logic:
words = ["banana", "apple", "carrot"]
print(sorted(words)) # ['apple', 'banana', 'carrot']
Behind the scenes, it compares each string by its character codes.
🧠 Real-World Analogy
Imagine working in a library, sorting books. You look at the book titles:
If the first letters differ, sort based on that.
If they’re the same, move to the second letter.
Continue until you find a difference.
If no difference and one title ends first, the shorter one comes first.
This is how dictionaries, contact lists, and file names are sorted.
🔍 Important Notes
Comparisons are case-sensitive by default:
"Apple" < "banana" # True because 'A' (65) < 'b' (98)
For case-insensitive sorting, you can convert everything to lowercase first:
sorted(words, key=lambda w: w.lower())
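The difference shows up as soon as the list mixes upper and lower case. A small sketch with a sample list chosen for this illustration:

```python
words = ["banana", "apple", "Carrot"]

default_order = sorted(words)
ignore_case = sorted(words, key=lambda w: w.lower())

print(default_order)   # ['Carrot', 'apple', 'banana']
print(ignore_case)     # ['apple', 'banana', 'Carrot']
print(words)           # sorted() returns a new list; the original is unchanged
```

In the default order 'Carrot' jumps to the front because 'C' (67) is smaller than every lowercase letter. The key function only affects how items are compared, so the original capitalization is kept in the output.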
🛠 Real-Time Use Cases
System/Tool     Lexicographical Use
File Managers   Sorting files alphabetically
Spreadsheets    Sorting columns of text
Databases       ORDER BY name ASC logic
Auto-complete   Suggesting entries in dictionary order
Online forms    Dropdowns sorted alphabetically
🧠 Mental Model
“Comparing strings is like two kids running a race. They start together. The first one who
takes a different path (i.e., different character) determines who wins. If they run neck and
neck, the shorter one wins because they cross the finish line earlier.”