Data Types and Data Structures
A data type, in programming, is a classification that specifies what kind of value a variable
holds. When working with data, we need ways to store it in variables so we can manipulate it.
Three of the most common data types used in programming are numbers, strings and
Booleans. We assign values of these types to variables one by one, like so:
x = 3 # number
a = "Python" # string
t = True # Boolean
Data Structures are efficient ways to store data, so that we can access and
manipulate data accordingly.
Everything in Python is an object, and every object is either mutable or
immutable.
Mutable: An object whose content can be changed after it is created, at run time, without
changing its identity. The built-in mutable types in Python are list,
dictionary and set.
Immutable: An object that can't be changed after it is created is called an immutable object.
The built-in immutable types in Python include int, float, complex, string, tuple and frozenset.
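The distinction can be observed with the built-in id() function. The following is a minimal sketch; the variable names are illustrative only.

```python
# Mutable: a list keeps the same identity when its content changes.
nums = [1, 2, 3]
list_id_before = id(nums)
nums.append(4)
list_id_after = id(nums)
print(list_id_before == list_id_after)  # True

# Immutable: "changing" a string builds a new object and rebinds the name.
word = "data"
original = word
word = word + "!"
print(word is original)  # False
print(original)          # data
```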
Integers and floating point numbers are distinguished by the presence or absence of a decimal point. 5
is an integer whereas 5.0 is a floating point number.
Complex numbers are written in the form x + yj, where x is the real part and y is the
imaginary part.
a = 1 # int
b = 10.5 # float
c = 3+4j # complex
While integers can be of any length, a floating point number is accurate only to about 15 to 17
significant digits (digits beyond that are not reliable).
The numbers we deal with every day are in the decimal (base 10) number system. But computer
programmers (embedded programmers in particular) often need to work with the binary (base 2),
hexadecimal (base 16) and octal (base 8) number systems.
In Python, we can write numbers in these systems by placing a prefix before the
number: 0b (or 0B) for binary, 0o (or 0O) for octal and 0x (or 0X) for hexadecimal.
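As a short illustration of these prefixes (the values are arbitrary examples), the same integers can be written in any base and converted back to prefixed strings with the built-ins bin(), oct() and hex():

```python
binary_value = 0b1010   # binary literal for decimal 10
octal_value = 0o17      # octal literal for decimal 15
hex_value = 0xFF        # hexadecimal literal for decimal 255
print(binary_value, octal_value, hex_value)  # 10 15 255

# Converting integers back to prefixed strings.
print(bin(10), oct(15), hex(255))  # 0b1010 0o17 0xff
```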
List (Mutable):
A list is one of the most frequently used and most versatile data types in Python.
It's an ordered collection of heterogeneous elements enclosed in square brackets
[ ].
It is a mutable type, which means we can add or remove elements from the list.
It maintains insertion order and is similar to an array.
List elements can be accessed using indexing, which starts from 0.
Creating List:
Creating a list is as simple as putting different comma-separated values in square brackets.
a = [1, 2, 3, 4]
b = ['a', 'b', 'c', 'd']
c = ['one', 'two', 'three', 'four']
l = [1, 2, 3, 4]
l[0] # 1
l[3] # 4
l[5] # raises IndexError: list index out of range
d = [1, 2, 'three', 'four']
d[1] # 2
d[2] # 'three'
Indexing:
a = [9, 14, 12, 19, 16, 18, 24, 15]
Positive index   0   1   2   3   4   5   6   7
Values           9  14  12  19  16  18  24  15
Negative index  -8  -7  -6  -5  -4  -3  -2  -1
Slicing:
Slicing retrieves a portion of a given sequence or list. A sub-list is created by
specifying start and end indexes. The list is traversed in the forward direction when the
step is positive and in the backward direction when the step is negative.
Syntax:
list[start:stop:step]
Examples:
a[:5] # [9, 14, 12, 19, 16], from the beginning of the list up to index 5 (excluding index 5)
a[2:] # [12, 19, 16, 18, 24, 15], from index 2 to the end
a[2:6] # [12, 19, 16, 18], indexes 2 to 6 (excluding 6), default step 1
a[2:8:2] # [12, 16, 24], indexes 2 to 8 (excluding 8) with step 2
a[-1:-5] # [ ], an empty list, because the default step is positive
a[-5:-2] # [19, 16, 18]
a[3] = 10 # a becomes [9, 14, 12, 10, 16, 18, 24, 15]; the value at index 3 changes from 19 to 10
a[3:6] = [20, 30] # a becomes [9, 14, 12, 20, 30, 24, 15]
a[::-1] # [15, 24, 30, 20, 12, 14, 9]; step -1 reads the list in the backward direction
In fact, lists respond to all of the general sequence operations we used on strings in the prior
chapter.
Python Expression Results Description
List Methods:
dir(list) will display all the attributes and methods available on a list object.
The methods or attributes which start with __ (double underscore) and end with __ (double
underscore) are called magic methods.
Method Description
clear() Removes all the elements from the list keeping the structure.
append:
append() adds its argument as a single element to the end of a list.
a = [ ]
a.append(5) # [5]
a.append('hello') # [5, 'hello']
a.append([1,2,3]) # [5, 'hello', [1, 2, 3]]
clear:
clear() removes all items from the list. The list object itself remains, now empty.
a = ['one', 'two', 'three', 'four', 'five']
a.clear() # [ ]
copy:
copy() returns a shallow copy of the list.
a = [1, 10, 5, 7]
b = a.copy() # [1, 10, 5, 7]
index(object):
data = [10,'abc','python',123.5]
data.index('python') # 2
data.index('abc') # 1
pop()/pop(int):
data = [10,'abc','python',123.5,786]
data.pop() # 786, removes and returns the last element of the list
data.pop(1) # 'abc', removes and returns the element at the specified index
insert(index, object):
data = ['abc',123,10.5,'a']
data.insert(2,'hello') # ['abc', 123, 'hello', 10.5, 'a']
extend(sequence):
data1 = ['abc',123,10.5,'a']
data2 = ['ram',541]
data1.extend(data2) # ['abc', 123, 10.5, 'a', 'ram', 541]
remove(object):
Removes the first occurrence of an element from the list. If the element doesn't exist, it raises a ValueError.
data.remove(10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list
reverse():
list1 = [10,20,30,40,50]
list1.reverse() # [50, 40, 30, 20, 10]
sort():
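sort() orders the list in place and returns None. A brief sketch with made-up values, also showing the optional reverse and key arguments:

```python
numbers = [30, 10, 50, 20, 40]
numbers.sort()                # ascending order, in place
print(numbers)                # [10, 20, 30, 40, 50]

numbers.sort(reverse=True)    # descending order
print(numbers)                # [50, 40, 30, 20, 10]

words = ['banana', 'kiwi', 'apple']
words.sort(key=len)           # order by the length of each string
print(words)                  # ['kiwi', 'apple', 'banana']
```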
Tuple (Immutable):
It's an ordered collection of heterogeneous elements enclosed in parentheses ( ).
It is an immutable type, which means we can't add or remove elements in a tuple.
Tuple elements can be accessed using indexing, which starts from 0.
When a function returns several comma-separated values, Python packs them into a tuple.
A tuple is similar to a list. The differences are that a list is enclosed in square brackets and
a tuple in parentheses, and that a list is mutable whereas a tuple is immutable.
t = ( ) # empty tuple
t = 1, 2, 3, 4 # (1, 2, 3, 4); comma-separated values without brackets form a tuple
t = (1,) # one-element tuple; the trailing comma is required
Accessing Tuple:
A tuple can be accessed in the same way as a list, using indexing and slicing.
data1=(1, 2, 3, 4)
data2=('x', 'y', 'z')
data1[0] # 1
data1[0:2] # (1, 2)
data2[-3:-1] # ('x', 'y')
data1[0:] # (1, 2, 3, 4)
data2[:2] # ('x', 'y')
We can't modify elements in a tuple using assignment, due to its immutable nature. A tuple
provides only two public methods: count and index.
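A small sketch of the two methods and of the error raised on item assignment (the tuple values are arbitrary):

```python
point = (3, 7, 3)
print(point.count(3))   # 2, the number of times 3 occurs
print(point.index(7))   # 1, the index of the first 7

# Item assignment is rejected because tuples are immutable.
try:
    point[0] = 10
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```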
dir(tuple):
Adding tuples:
t1 = (1, 2, 3)
t2 = (7, 8, 9)
t1 + t2 # (1, 2, 3, 7, 8, 9)
Strings (Immutable):
1. Strings are sequences of characters.
2. Python strings are "immutable" which means they cannot be changed after they are
created.
3. To create a string, put the sequence of characters inside either single quotes, double
quotes, or triple quotes and then assign it to a variable.
Slicing:
Slicing retrieves a portion of a given sequence or string.
A string can be indexed from both directions, so it can also be sliced from
either the left or the right.
Syntax:
string[start:stop:step]
data = "dreamwin"
data[0:6] # 'dreamw'
data[2:7] # 'eamwi'
data[:5] # 'dream'
data[2:] # 'eamwin'
data[-1:-6] # '', returns an empty string
data[-6:-1] # 'eamwi'
data[-1:] # 'n'
data[:-5] # 'dre'
data[2:8:2] # 'emi'
data[:8:2] # 'demi'
Python strings are immutable by design: once a string object is created, it
can't be modified in place. To update a string, simply assign a new string value to the
same variable.
Similarly, we cannot delete individual characters from a string. Instead, we can
remove the whole string variable by using the 'del' statement.
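The behaviour described above can be sketched as follows; the variable names are illustrative only.

```python
language = "python"

# In-place modification is not allowed.
try:
    language[0] = "P"
except TypeError as exc:
    assignment_error = str(exc)

# Updating means building a new string and re-assigning the name.
language = "P" + language[1:]
updated = language
print(updated)  # Python

# del removes the name itself, not individual characters.
del language
try:
    language
except NameError:
    name_removed = True
print(name_removed)  # True
```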
Python has several built-in methods associated with the string data type. These methods let
us easily manipulate strings. Because strings are immutable, they return new strings rather than changing the original.
List all string methods using dir(str). Below are the methods available on a string
object.
['__add__', '__class__', '__contains__', '__delattr__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__',
'__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__',
'__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith',
'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum',
'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric',
'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',
'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust',
'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith',
'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
String Methods:
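Capitalize: Converts the first character of a string to upper case and the rest to lower case.
s = "welcome to dreamwin"
s.capitalize() # 'Welcome to dreamwin'
Casefold: Returns a lowercased copy of the string, suitable for caseless comparisons.
s='DreamWin Technologies'
s.casefold() # 'dreamwin technologies'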
Center: Returns a new copy of the string centered in a field of the given width.
s = 'dreamwin'
s.center(20) # '      dreamwin      '
s.center(20, '*') # '******dreamwin******'
Rjust: Returns a new copy of the string justified to the right in a field of the given width.
s = "Dreamwin"
s.rjust(10) # '  Dreamwin'
s.rjust(5) # 'Dreamwin' (the width is less than the length, so the string is unchanged)
s.rjust(12, '*') # '****Dreamwin'
Ljust: Returns a new copy of the string justified to the left in a field of the given width.
s = "Dreamwin"
s.ljust(10) # 'Dreamwin  '
s.ljust(5) # 'Dreamwin'
s.ljust(12, '*') # 'Dreamwin****'
Endswith: endswith() method returns True if a string ends with the specified suffix (string). If
not, it returns False.
s = 'dreamwin'
s.endswith('n') # True
s.endswith('h') # False
s.endswith('win') # True
Expand tabs: expandtabs() returns a copy of the string with each tab replaced by spaces up to the next tab stop (default tab size 8).
s = 'xyz\t12345\tabc'
s.expandtabs() # 'xyz     12345   abc'
s.expandtabs(2) # 'xyz 12345 abc'
s.expandtabs(3) # 'xyz   12345 abc'
s.expandtabs(4) # 'xyz 12345   abc'
s.expandtabs(5) # 'xyz  12345     abc'
Startswith: startswith() method returns True if a string starts with the specified prefix (string).
If not, it returns False.
s = "dreamwin tech"
s.startswith('dream') # True
s.startswith('tech') # False
Count: Returns the number of non-overlapping occurrences of a substring.
s = "Python is awesome"
s.count('s') # 2
s.count('w') # 1
s.count('on') # 1
s.count('x') # 0
Find: Finds the substring and returns the index of its first occurrence. If it is not found, returns -1.
s = 'Python is awesome'
s.find('is') # 7
s.find('e') # 12
s.find('x') # -1
Rfind: Searches for the substring from the right side and returns the index of its last occurrence.
If it is not found, returns -1.
s = 'Python is awesome'
s.rfind('is') # 7
s.rfind('e') # 16
s.rfind('x') # -1
Index: Works like find(), but raises a ValueError if the substring is not found.
s = 'Python is awesome'
s.index('is') # 7
s.index('e') # 12
s.index('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
Rindex: Returns the index of the last occurrence of a given substring. Raises a ValueError if it is not found.
s = 'Python is awesome'
s.rindex('is') # 7
s.rindex('e') # 16
s.rindex('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
Format: The format() method reads the arguments passed to it and formats them according to
the format codes defined in the string.
The format() method takes any number of arguments, which fall into two types:
positional arguments and keyword arguments.
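A minimal sketch of format() with positional arguments, keyword arguments and a format code; the strings and values are invented for illustration.

```python
# Positional arguments fill {0}, {1}, ... in order.
positional = "{0} is taught at {1}".format("Python", "Dreamwin")

# Keyword arguments fill the named fields.
keyword = "{student} scored {marks}".format(student="Asha", marks=92)

# A format code after the colon controls presentation.
rounded = "{:.2f}".format(3.14159)

print(positional)  # Python is taught at Dreamwin
print(keyword)     # Asha scored 92
print(rounded)     # 3.14
```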
Join:
The join() method provides a flexible way to concatenate strings. It joins the
elements of an iterable (such as a list, string or tuple), placing the string it is called on
between them, and returns the concatenated string.
If the
iterable contains any non-string values, it raises a TypeError exception.
Syntax:
separator.join(iterable)
Examples:
s1 = 'abc'
s2 = '123'
# s1 is inserted between each character of s2
s1.join(s2) # '1abc2abc3'
# s2 is inserted between each character of s1
s2.join(s1) # 'a123b123c'
Split: split() splits the given string based on a delimiter and returns a list of strings. The default delimiter is whitespace.
Rsplit: rsplit() method splits string from the right at the specified separator and returns a list
of strings.
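The difference between split() and rsplit() only shows when the number of splits is limited with the optional maxsplit argument. A short sketch with sample strings:

```python
sentence = "welcome to python programming"
words = sentence.split()                   # split on whitespace
print(words)   # ['welcome', 'to', 'python', 'programming']

fields = "a,b,c,d".split(",")              # split on a comma
print(fields)  # ['a', 'b', 'c', 'd']

left_limited = "a,b,c,d".split(",", 1)     # one split, counted from the left
right_limited = "a,b,c,d".rsplit(",", 1)   # one split, counted from the right
print(left_limited)   # ['a', 'b,c,d']
print(right_limited)  # ['a,b,c', 'd']
```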
Strip: strip() method returns a copy of the string with both leading and trailing characters
removed (based on the string argument passed).
Lstrip: lstrip() method returns a copy of the string with leading characters removed (based on
the string argument passed).
Rstrip: rstrip() method returns a copy of the string with trailing characters removed (based on
the string argument passed).
By default, the strip methods remove leading and/or trailing whitespace if you don't provide any
arguments.
s = "  hello welcome to python  "
s.strip() # 'hello welcome to python'
s.rstrip() # '  hello welcome to python'
s.lstrip() # 'hello welcome to python  '
s.strip(',') # '  hello welcome to python  ' (no commas at either end, so unchanged)
s = "Python is awesome"
s.strip('Py') # 'thon is awesome'
Dictionary (Mutable):
dt = { }
dt['lang'] = 'Python'
dt['year'] = 1990
dt['author'] = 'Guido Van Rossum'
print(dt) # {'lang': 'Python', 'year': 1990, 'author': 'Guido Van Rossum'}
Dictionaries are mutable. We can add new items or change the value of existing items using
the assignment operator.
If the key is already present, its value gets updated; otherwise a new key: value pair is added to the
dictionary.
1. We can remove a particular item in a dictionary by using the method pop(). This method
removes the item with the provided key and returns its value.
2. The method popitem() can be used to remove and return an item (key, value)
from the dictionary; since Python 3.7 it removes the most recently inserted item. All the items can be removed at once using the clear() method.
3. We can also use the del keyword to remove individual items or the entire dictionary itself.
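The operations listed above can be sketched as follows, using a made-up dictionary. The popitem() result assumes Python 3.7 or later, where the most recently inserted pair is removed.

```python
course = {'name': 'python', 'weeks': 6}

course['weeks'] = 8            # existing key: the value is updated
course['level'] = 'beginner'   # new key: the pair is added

removed_value = course.pop('weeks')   # removes 'weeks' and returns 8
last_item = course.popitem()          # ('level', 'beginner') on Python 3.7+
del course['name']                    # removes a single item

print(removed_value)  # 8
print(last_item)      # ('level', 'beginner')
print(course)         # {}
```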
Dictionary methods:
Dictionary methods can be listed using dir(dict):
['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__',
'__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear',
'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
Clear: The clear() method removes all items from the dictionary.
You can also remove all elements from the dictionary by assigning empty dictionary { }.
However, there is a difference between calling clear() and assigning { } if there is another
variable referencing the dictionary.
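A sketch of that difference, with illustrative names: clear() empties the one shared object, while assigning { } only rebinds one name and leaves the other reference untouched.

```python
# clear() empties the object that both names refer to.
first = {'a': 1, 'b': 2}
first_alias = first
first.clear()
print(first_alias)   # {}

# Assigning {} rebinds only one name; the alias still sees the old data.
second = {'a': 1, 'b': 2}
second_alias = second
second = {}
print(second_alias)  # {'a': 1, 'b': 2}
```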
Copy: The copy() method returns a shallow copy of the dictionary. It doesn't modify the
original dictionary. It creates a new dictionary.
Fromkeys: The fromkeys() method creates a new dictionary from the given sequence of
elements with a value provided by the user.
>>> {}.fromkeys('python')
{'p': None, 'y': None, 't': None, 'h': None, 'o': None, 'n': None}
>>> {}.fromkeys('python', 10)
{'p': 10, 'y': 10, 't': 10, 'h': 10, 'o': 10, 'n': 10}
Get:
The get() method takes a maximum of two parameters:
1. key - key to be searched in the dictionary.
2. value (optional) - value to be returned if the key is not found. The default value is None.
The get() method returns:
1. the value for the specified key if the key is in the dictionary.
2. None if the key is not found and value is not specified.
3. value if the key is not found and value is specified.
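The three cases can be sketched with a sample dictionary (the keys and values are invented):

```python
person = {'name': 'Asha', 'age': 30}

found = person.get('name')            # key exists
missing = person.get('city')          # key missing, no default given
fallback = person.get('city', 'n/a')  # key missing, default given

print(found)     # Asha
print(missing)   # None
print(fallback)  # n/a
```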
Keys: The keys() method returns a view object that displays all the keys of the dictionary.
Values: The values() method returns a view object that displays all the values in a given
dictionary.
Items: The items() method returns a view object that displays the dictionary's
(key, value) tuple pairs.
Popitem: The popitem() method removes and returns a (key, value) pair from the
dictionary; since Python 3.7 it is the most recently inserted pair.
Pop: The pop() method removes and returns an element from a dictionary having the given
key.
Setdefault: The setdefault() method returns the value of a key (if the key is in the dictionary). If
not, it inserts the key with a default value into the dictionary and returns that value.
The setdefault() method takes a maximum of two parameters:
1. key - key to be searched in the dictionary.
2. default_value (optional) - the value inserted for key if
key is not in the dictionary. If not provided, default_value will be None.
>>> c = {'name': 'dreamwin tech', 'place': 'bangalore', 'start': 2017}
>>> c.setdefault('course')
>>> c
{'name': 'dreamwin tech', 'place': 'bangalore', 'start': 2017, 'course': None}
>>> c.setdefault('area', 'marathahalli')
'marathahalli'
>>> c
{'name': 'dreamwin tech', 'place': 'bangalore', 'start': 2017, 'course': None, 'area': 'marathahalli'}
Update: The update() method updates the dictionary with the elements from another
dictionary object or from an iterable of key/value pairs.
The update() method takes either a dictionary or an iterable object of key/value pairs
(generally tuples).
If update() is called without passing parameters, the dictionary remains unchanged.
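A short sketch of the three ways of calling update() described above; the dictionary contents are illustrative.

```python
settings = {'theme': 'light', 'font': 'mono'}

settings.update({'theme': 'dark'})               # from another dictionary
settings.update([('size', 12), ('wrap', True)])  # from an iterable of pairs
print(settings)
# {'theme': 'dark', 'font': 'mono', 'size': 12, 'wrap': True}

snapshot = dict(settings)
settings.update()                                # no arguments: unchanged
print(settings == snapshot)  # True
```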
Set (Mutable):
1. A set is an unordered collection of unique, hashable (immutable) elements.
2. Its definition uses enclosing braces { }, with its items separated by commas
inside.
3. However, the set itself is mutable. We can add or remove items from it.
4. Sets can be used to perform mathematical set operations like union, intersection,
symmetric difference etc.
The set type has a significant advantage over a list. It implements a highly optimized method
that checks whether the container hosts a specific element or not. The mechanism used
here is based on a data structure known as a hash table.
Creating A Set:
To create a set, call the built-in set() function with a sequence or any iterable object.
Note: Creating an empty set is a bit tricky. Empty curly braces { } will make an empty
dictionary in Python. To make a set without any elements we use the set() function without
any argument.
>>> a = { }
>>> type(a)
<class 'dict'>
>>> s = set()
>>> type(s)
<class 'set'>
Sets are mutable. But since they are unordered, indexing has no meaning.
We cannot access or change an element of a set using indexing or slicing; sets do not
support it.
We can add a single element using the add() method and multiple elements using the update()
method.
The update() method can take tuples, lists, strings or other sets as its argument. In all cases,
duplicates are avoided.
A particular item can be removed from a set using the methods discard() and remove().
The only difference between the two is that discard() leaves the set unchanged if the item does not exist in
the set, whereas remove() raises a KeyError in that case.
>>> my_set = set('dreamwin tech')
>>> my_set
{'d', 'm', 'a', 'c', 'h', 't', 'w', 'e', ' ', 'i', 'r', 'n'}
>>> my_set.discard('h')
>>> my_set
{'d', 'm', 'a', 'c', 't', 'w', 'e', ' ', 'i', 'r', 'n'}
>>> my_set.remove('i')
>>> my_set
{'d', 'm', 'a', 'c', 't', 'w', 'e', ' ', 'r', 'n'}
>>> my_set.discard(10)
>>> my_set.remove('p')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'p'
Similarly, we can remove and return an item using the pop() method.
Set being unordered, there is no way of determining which item will be popped. It is
completely arbitrary.
We can also remove all items from a set using clear().
Sets can be used to carry out mathematical set operations like union, intersection, difference
and symmetric difference. We can do this with operators or methods.
Let us consider the following two sets for the following operations.
>>> A = {1, 2, 3, 4, 5}
>>> B = {4, 5, 6, 7, 8}
Union:
Union of A and B is a set of all elements from both sets. Union is performed using the |
operator. The same can be accomplished using the method union().
Intersection:
Intersection of A and B is a set of elements that are common in both sets. Intersection is
performed using & operator. Same can be accomplished using the method intersection().
Difference:
Difference of A and B (A - B) is a set of elements that are only in A but not in B. Similarly, B -
A is a set of element in B but not in A.
Difference is performed using - operator. Same can be accomplished using the method
difference().
Symmetric Difference:
Symmetric Difference of A and B is a set of elements in both A and B except those that are
common in both.
Symmetric difference is performed using ^ operator. Same can be accomplished using the
method symmetric_difference().
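Using the sets A and B defined above, the four operations can be sketched with both the operators and the equivalent methods:

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

union_result = A | B           # {1, 2, 3, 4, 5, 6, 7, 8}
intersection_result = A & B    # {4, 5}
difference_result = A - B      # {1, 2, 3}
reverse_difference = B - A     # {6, 7, 8}
symmetric_result = A ^ B       # {1, 2, 3, 6, 7, 8}

# Each operator has a method that returns the same set.
print(union_result == A.union(B))                        # True
print(intersection_result == A.intersection(B))          # True
print(difference_result == A.difference(B))              # True
print(symmetric_result == A.symmetric_difference(B))     # True
```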
Difference_update:
The difference_update() method updates the set it is called on in place, removing the
elements that are also found in the other set.
Symmetric_difference_update:
The symmetric_difference_update() method updates the set it is called on in place with
the symmetric difference of the two sets.
isdisjoint(): Return True if two sets have no elements in common
issubset(): Return True if another set contains this set
issuperset(): Return True if this set contains another set
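A brief sketch of the in-place update methods and the three comparison methods, reusing the example sets A and B (copies are made so that A itself is not modified):

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

only_in_a = set(A)
only_in_a.difference_update(B)              # modifies only_in_a in place
print(only_in_a)                            # {1, 2, 3}

not_shared = set(A)
not_shared.symmetric_difference_update(B)   # modifies not_shared in place
print(not_shared)                           # {1, 2, 3, 6, 7, 8}

disjoint = {1, 2}.isdisjoint({3, 4})        # True, nothing in common
subset = {4, 5}.issubset(A)                 # True, A contains 4 and 5
superset = A.issuperset({1, 2})             # True, A contains 1 and 2
print(disjoint, subset, superset)           # True True True
```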
FrozenSet: (Immutable)
Frozenset is a class that has the characteristics of a set, but its elements cannot be
changed once assigned. While tuples are immutable lists, frozensets are immutable sets.
Sets, being mutable, are unhashable, so they can't be used as dictionary keys. On the other
hand, frozensets are hashable and can be used as keys to a dictionary.
Frozensets can be created using the function frozenset().
This datatype supports methods like copy(), difference(), intersection(), isdisjoint(),
issubset(), issuperset(), symmetric_difference() and union(). Being immutable, it does not
have methods that add or remove elements.
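A minimal sketch of these points, using invented data: a frozenset works as a dictionary key, a plain set does not, and a frozenset has no add() method.

```python
vowels = frozenset('aeiou')
evens = frozenset([2, 4, 6])

# Frozensets are hashable, so they can be dictionary keys.
labels = {vowels: 'vowels', evens: 'even numbers'}
lookup = labels[frozenset('uoiea')]   # element order does not matter
print(lookup)  # vowels

# A mutable set is unhashable and is rejected as a key.
try:
    bad = {set([1, 2]): 'pair'}
except TypeError as exc:
    set_key_error = str(exc)

# There is no method to add elements to a frozenset.
has_add = hasattr(vowels, 'add')
print(has_add)  # False
```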
Interview Questions:
Datatypes in Python?
1. Numbers (int, float, complex)
2. String
Data structures in python?
1. List
2. Tuple
3. Dictionary
4. Set
5. Frozenset
Mutable vs Immutable?
A mutable object can change its state or contents during run-time and immutable objects
cannot.
List vs Tuple?
A list is a collection of heterogeneous elements enclosed in square brackets []. It's a
mutable type.
A tuple is a collection of heterogeneous elements enclosed in parentheses (). It's an
immutable type.
When do you use list vs tuple?
The main difference between a list and a tuple is that you can change a list but you cannot change
a tuple. A tuple can be used as a key in a mapping, whereas a list cannot.
What is used to represent strings in Python? Are double quotes or single quotes
used for string representation in Python?
You can specify strings using single quotes, such as 'Hello welcome to python'. All white
space, i.e. spaces and tabs, is preserved as-is.
Strings in double quotes work exactly the same way as strings in single quotes.
Eg:
S = "What's your name"
You can specify multi-line strings using triple quotes. You can use single quotes and double
quotes freely within the triple quotes.
Triple quotes are used to define multi-line strings; an unassigned triple-quoted string is also commonly used as a multi-line comment.
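A small illustrative example of a triple-quoted string that spans two lines and contains both kinds of quotes (the text itself is made up):

```python
message = '''She said, "What's your name?"
and waited for an answer.'''

print(message)
line_count = len(message.splitlines())
print(line_count)  # 2
```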
Slicing in Python is a mechanism to select a range of items from Sequence types like
strings, list, tuple, etc.
Example of slicing:
>>> l=[1,2,3,4,5]
>>> l[1:3]
[2, 3]
>>> l[1:-2]
[2, 3]
>>> l[-3:-1] # negative indexes in slicing
[3, 4]
Python list items can be accessed with positive or negative numbers (also known as indexes).
For instance, if our list is of size n, then for positive indexes 0 is the first index, 1 the second, and the last
index will be n-1. For negative indexes, -n is the first index, -(n-1) the second, and -1 the last. A negative index
accesses elements from the end of the list, counting backwards.
An example to show negative indexes in Python:
>>> a = [1, 2, 3]
>>> print(a[-3])
1
>>> print(a[-2])
2
>>> print(a[-1])
3
List | Tuple
List objects are mutable objects | Tuple objects are immutable objects
Iterating over list objects takes longer | Iterating over tuple objects takes less time
If the frequent operation is insertion or deletion of elements, a list is recommended | If the frequent operation is retrieval of elements, a tuple is recommended
A list can't be used as a key for a dictionary | A tuple can be used as a key for a dictionary if it stores only immutable elements
d = {}
d['name'] = 'Python'
d['year'] = 1990
def f(x, l=[]):
    for i in range(x):
        l.append(i*i)
    print(l)
f(2)
f(3,[3,2,1])
f(3)
Answer
[0, 1]
[3, 2, 1, 0, 1, 4]
[0, 1, 0, 1, 4]
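The third result carries over [0, 1] from the first call because the default list is created once, when the function is defined, and is shared by every call that omits the argument. A common way to avoid this, sketched here with a hypothetical function name, is to use None as the default and create the list inside the function:

```python
def squares(count, values=None):
    # Create a fresh list on each call when none is supplied.
    if values is None:
        values = []
    for i in range(count):
        values.append(i * i)
    return values

first = squares(2)
second = squares(3, [3, 2, 1])
third = squares(3)

print(first)   # [0, 1]
print(second)  # [3, 2, 1, 0, 1, 4]
print(third)   # [0, 1, 4]
```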
Explain set?
A set is an unordered collection of unique, hashable (immutable) elements. The set itself is a mutable
type.
s = set() # { } would create an empty dictionary, not a set
s.add(10) # {10}
A set can be used to remove duplicates from an iterable. One can also perform mathematical set
operations like union, intersection and difference.
Explain list methods append vs extend?
append and extend are list methods which are used to add elements at the end of a list.
Append: using append we can add any object as a single element at the end of the list.
l=[]
l.append(10) # [10]
l.append('hello') # [10, 'hello']
l.append([1, 2, 3]) # [10, 'hello', [1, 2, 3]]
Extend: using extend we can add only the items of an iterable to a list, i.e. a non-iterable
such as an integer is not accepted.
l=[]
l.extend('hello') # ['h', 'e', 'l', 'l', 'o']
l.extend([1,2,3]) # ['h', 'e', 'l', 'l', 'o', 1, 2, 3]
Difference between sort and sorted?
sort is a list method which does in-place sorting. It means it changes the original list
object and returns None.
sorted is a built-in function which accepts any iterable and returns a new sorted list. It does not modify the original
object.
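The difference can be sketched with a sample list (values chosen arbitrarily):

```python
scores = [40, 10, 30]

ordered = sorted(scores)   # returns a new list
print(ordered)             # [10, 30, 40]
print(scores)              # [40, 10, 30], the original is untouched

result = scores.sort()     # sorts in place
print(scores)              # [10, 30, 40]
print(result)              # None

letters = sorted('python') # sorted() accepts any iterable
print(letters)             # ['h', 'n', 'o', 'p', 't', 'y']
```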
range() and xrange() are two functions that can be used to iterate a certain number of
times in for loops in Python 2. In Python 3, there is no xrange, but the range function behaves
like xrange. If you want to write code that will run on both Python 2 and Python 3, you should
use range().
range() – In Python 2, this returns a list of numbers.
xrange() – In Python 2, this returns an xrange object that produces numbers
on demand while looping, rather than building the whole list in memory. This is called "lazy
evaluation".
Both are implemented in different ways and have different characteristics associated with
them. The points of comparisons are:
Return Type
Memory
Operation Usage
Speed