Data structures
Python provides several built-in data structures that let us
organize and manipulate data efficiently. The data structures
we use most often during data exploration are:
Lists are mutable sequences
Tuples are immutable sequences
Dictionaries are collections of key-value pairs
Sets are unordered collections of unique elements.
These data structures are versatile and efficient, forming the
foundation for more advanced data handling and analysis.
Mastering their use is crucial for effective Python programming,
enabling the management and transformation of various data
forms with ease.
Source: https://wesmckinney.com/book/
Data structures – Tuples
Definition: Tuples are immutable, ordered sequences of elements
in Python. Unlike lists, tuples cannot be modified once created.
They are often used to store related data items that need to
remain constant, such as dates or database records.
Tuples are created by placing a comma-separated sequence of
values inside parentheses, e.g., my_tuple = (1, 2, 3).
Tuples can store heterogeneous data, meaning they can contain
elements of different data types, e.g., my_tuple = (1, "hello",
3.14, True, [4, 5, 6]).
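To see what "immutable" means in practice, here is a minimal sketch. The name point and its values are made up for illustration; the example only shows that assigning to a tuple position raises a TypeError and leaves the tuple unchanged.

```python
# Hypothetical example tuple (not from the slides)
point = (3, 4)

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)
    print("Cannot modify a tuple:", error_message)

# The tuple still holds its original values
print(point)
```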
Tuples Methods
index(): Each item in a tuple has an index number determined by
its position. The first index is 0, the second is 1, the third is 2,
and so on. The index() method returns the index of the first
occurrence of a value. In the my_tup tuple, the item 4 is at
position (index) 2:
my_tup = (1, 3, 4, 5, 8)
print(my_tup.index(4))
# Output: 2
Try> What happens if you pass a different value to index(), such as 1
or a value that is not in the tuple?
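As a possible answer to the exercise, the sketch below contrasts the index() method (value to position) with square-bracket indexing (position to value). The value 99 is an arbitrary choice used only to show what happens when the value is missing.

```python
my_tup = (1, 3, 4, 5, 8)

# index() looks up a value and returns its position
position_of_one = my_tup.index(1)

# Square brackets look up a position and return the value stored there
item_at_one = my_tup[1]

print(position_of_one, item_at_one)

# Asking for a value that is not in the tuple raises a ValueError
try:
    my_tup.index(99)
    value_found = True
except ValueError:
    value_found = False
    print("99 is not in the tuple")
```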
count(): Returns the number of occurrences of a value/item.
my_tup = (1, 3, 4, 4, 5, 4, 8)
print(my_tup.count(4))
# Output: 3
len(): The built-in len() function returns the number of items in the tuple.
my_tup1 = (10, 20, 30, 40, 30)
print(len(my_tup1))
#output: 5
Tuples Operations
Slicing: Slicing allows you to extract a portion (subsequence) of a
sequence, such as a string, list, or tuple. It offers a concise and flexible
way to access multiple elements from a sequence based on their
indices. You can select sections of most sequence types using slice
notation, which in its basic form is start:stop:step within square
brackets [ ].
my_tuple = (1, "hello", 3.14, True, [4, 5, 6])
print(my_tuple[1:3])
# Output: ('hello', 3.14)
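The start:stop:step form mentioned above also allows omitted and negative values. The following sketch reuses my_tuple from the slide; the variable names on the left are chosen here for illustration.

```python
my_tuple = (1, "hello", 3.14, True, [4, 5, 6])

# Omitting start begins at index 0; stop is exclusive
first_two = my_tuple[:2]

# A negative start counts from the end of the tuple
last_two = my_tuple[-2:]

# A step of 2 takes every second element
every_other = my_tuple[::2]

# A step of -1 walks the tuple backwards
reversed_tuple = my_tuple[::-1]

print(first_two)
print(last_two)
print(every_other)
print(reversed_tuple)
```

Slicing always returns a new tuple and leaves my_tuple itself unchanged.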
Indexing: Access elements by index.
my_tuple = (1, "hello", 3.14, True)
print(my_tuple[1])
# Output: hello
Concatenation: Combine tuples using +
con_tup1=(25, 3.0, False, "Green")
con_tup2=(500, ['Blue', 50, 250], {'country':'India'})
con_tup3=con_tup1 + con_tup2
print(con_tup3)
# Output: (25, 3.0, False, 'Green', 500, ['Blue', 50, 250], {'country': 'India'})
Repetition: Repeat tuples using *
my_tuple = (1, "hello", 3.14, True)
repeated_tuple = my_tuple * 2
print(repeated_tuple)
# Output: (1, 'hello', 3.14, True, 1, 'hello', 3.14, True)
Membership: Check if an item is in the tuple using in
is_in_tuple = "hello" in my_tuple
print(is_in_tuple) # Output: True
Mutable objects stored inside a tuple, such as lists and
dictionaries, can still be modified in place
tup_list = (500, ['Blue', 50, 250], {'country': 'India'})
tup_list[1].append('green')
print(tup_list)
# Output: (500, ['Blue', 50, 250, 'green'], {'country': 'India'})
tup_list = (500, ['Blue', 50, 250], {'country': 'India'})
tup_list[1][0] = 'green'
print(tup_list)
# Output: (500, ['green', 50, 250], {'country': 'India'})
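The same idea applies to the dictionary stored in the tuple. In the sketch below, the 'capital' entry is an illustrative addition, not part of the original example. It also shows that replacing a whole slot of the tuple is still not allowed.

```python
tup_list = (500, ['Blue', 50, 250], {'country': 'India'})

# The dictionary inside the tuple can be changed in place
tup_list[2]['capital'] = 'New Delhi'  # illustrative key and value
print(tup_list)

# Replacing a slot of the tuple itself is not allowed
try:
    tup_list[1] = ['Red']
    slot_replaced = True
except TypeError:
    slot_replaced = False
    print("Tuple slots cannot be reassigned")
```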
Data structures – Lists
Definition: A list is a mutable, ordered sequence of elements. Lists
are created using square brackets, e.g., my_list = [1, 2, 3]. A list
can contain items of different data types, and because it is
mutable, you can modify its contents after creation. Lists support
various operations such as appending, inserting, and slicing.
A list can store anywhere from zero to millions of items. Items can
be of any data type, including integers, floats, and strings:
a_list = [item1, item2, item3, item4...]
Lists are mutable and heterogeneous
List items are ordered (kept in the given sequence), changeable,
and allow duplicate values
List items are indexed, so indexing and slicing are supported
List methods
append(): Add an element to the end, e.g., my_list.append(4)
extend(): Add multiple elements, e.g., my_list.extend([4, 5])
insert(): Insert an element at a specific position,
e.g., my_list.insert(1, 'a')
remove(): Remove the first occurrence of an element,
e.g., my_list.remove(2)
pop(): Remove and return an element (the last one by default),
e.g., my_list.pop()
sort(): Sort the list in place, e.g., my_list.sort()
reverse(): Reverse the list in place, e.g., my_list.reverse()
index(): Find the index of the first occurrence, e.g., my_list.index(3)
count(): Count the occurrences of an element, e.g., my_list.count(2)
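The methods above can be tried in sequence on one list. This is a minimal sketch that starts from my_list = [1, 2, 3]; the comments trace the list contents after each call. The extra remove('a') step is added here because sort() cannot compare integers with strings.

```python
my_list = [1, 2, 3]

my_list.append(4)          # [1, 2, 3, 4]
my_list.extend([4, 5])     # [1, 2, 3, 4, 4, 5]
my_list.insert(1, 'a')     # [1, 'a', 2, 3, 4, 4, 5]
my_list.remove(2)          # [1, 'a', 3, 4, 4, 5]
last_item = my_list.pop()  # returns 5, list is now [1, 'a', 3, 4, 4]

# Remove the string so the remaining items can be compared and sorted
my_list.remove('a')        # [1, 3, 4, 4]

my_list.reverse()          # [4, 4, 3, 1]
my_list.sort()             # [1, 3, 4, 4]

index_of_three = my_list.index(3)  # position of the first 3
count_of_four = my_list.count(4)   # how many times 4 appears

print(my_list, last_item, index_of_three, count_of_four)
```

Note that append(), extend(), insert(), remove(), sort(), and reverse() all change the list in place and return None, so their result should not be assigned back to my_list.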
List Operations
Indexing: Access individual elements by their position (index),
e.g., my_list[0]
Slicing: Access a range of elements by specifying a start and end
index, e.g., my_list[1:3]
Concatenation: Combine two or more lists into one using the +
operator, e.g., my_list + another_list
Repetition: Repeat the elements of a list multiple times using the *
operator, e.g., my_list * 3
Membership: Check if an item is present in a list using the in
operator, e.g., 3 in my_list
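The list operations described above can be sketched as follows. The contents of my_list and another_list are sample values chosen for illustration.

```python
my_list = [1, 2, 3, 4]
another_list = [5, 6]

first_item = my_list[0]             # indexing: element at position 0
middle = my_list[1:3]               # slicing: positions 1 and 2
combined = my_list + another_list   # concatenation: a new, longer list
repeated = another_list * 3         # repetition: contents repeated 3 times
has_three = 3 in my_list            # membership test

print(first_item)
print(middle)
print(combined)
print(repeated)
print(has_three)
```

Concatenation and repetition build new lists; my_list and another_list keep their original contents.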
Data structures – Dictionaries
Definition: A dictionary is a collection of key-value pairs. Each
key must be unique and immutable (hashable), while values can
be of any type. Dictionaries are defined using curly braces {}, with
each key separated from its value by a colon. Dictionaries are
commonly used for mapping relationships between data items.
• Dictionaries are used to store data values in key:value pairs
• Dictionaries are heterogeneous and dynamic, and values are
mutable. Since Python 3.7 they preserve insertion order
• Dictionaries are written within curly brackets
• Positional indexing is not possible in dictionaries; we access
items using their keys
• Lookup goes from key to value; there is no direct lookup of a
key from its value
• Dictionaries don't allow duplicate keys; assigning to an existing
key overwrites its value
d = {'a': 1, 'b': 2}
print(d)
# Output: {'a': 1, 'b': 2}
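A short sketch of the points above: access by key, adding and overwriting entries, and what happens to a duplicate key. The dup dictionary and the key 'x' are made up for illustration.

```python
d = {'a': 1, 'b': 2}

value_a = d['a']   # access a value through its key
d['c'] = 3         # assigning to a new key adds a pair
d['a'] = 10        # assigning to an existing key overwrites its value

# Iterating a dictionary yields its keys in insertion order (Python 3.7+)
keys_in_order = list(d)

# A literal with a repeated key keeps only the last value for that key
dup = {'x': 1, 'x': 2}

print(value_a, d, keys_in_order, dup)
```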
Dictionaries-Methods
keys() : Returns a view of all keys
d = {'a': 1, 'b': 2}
print(d.keys())
# Output: dict_keys(['a', 'b'])
values(): Returns a view of all values.
d = {'a': 1, 'b': 2}
print(d.values())
# Output: dict_values([1, 2])
items(): Returns a view of all key-value pairs.
d = {'a': 1, 'b': 2}
print(d.items())
# Output: dict_items([('a', 1), ('b', 2)])
get(key, default): Returns the value for key if key is in the
dictionary; otherwise returns default. If default is not given, it
defaults to None.
d = {'a': 1, 'b': 2}
print(d.get('a', 0)) # Output: 1
print(d.get('c', 0)) # Output: 0
pop(key, default): Removes the key from the dictionary and returns
its value. If the key is not found, default is returned instead. If
default is not provided and the key is missing, a KeyError is
raised.
my_dict = {'apple': 3, 'banana': 2, 'orange': 1}
value = my_dict.pop('apple')
print(value) # Output: 3
print(my_dict) # Output: {'banana': 2, 'orange': 1}
value1 = my_dict.pop('grape', 'Not found')
print(value1) # Output: Not found
print(my_dict) # Output: {'banana': 2, 'orange': 1}
*Attempting to remove a non-existent key without a default raises a KeyError
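The KeyError case can be observed safely with try/except. This is a minimal sketch; the variable missing_key is introduced here only to capture the key reported by the error.

```python
my_dict = {'banana': 2, 'orange': 1}

try:
    my_dict.pop('grape')  # no default given and the key is missing
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]
    print("Key not found:", missing_key)

# The dictionary is left unchanged by the failed pop()
print(my_dict)
```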
The update() method is useful for adding many key/value pairs to a
dictionary at once. It accepts another dictionary, an iterable of
key/value pairs, or keyword arguments. Existing keys are
overwritten with the new values.
d = {'a': 1, 'b': 2}
d.update({'b': 3, 'c': 4})
print(d)
# Output: {'a': 1, 'b': 3, 'c': 4}
dict1 = {'name': 'John', 'age': 30}
dict2 = {'occupation': 'Engineer', 'location': 'New York'}
dict1.update(dict2)
print(dict1)
# Output: {'name': 'John', 'age': 30, 'occupation': 'Engineer',
# 'location': 'New York'}
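Besides another dictionary, update() can also take a list of (key, value) tuples or keyword arguments. The sketch below uses made-up keys and values to show both forms.

```python
d = {'a': 1}

# An iterable of (key, value) pairs
d.update([('b', 2), ('c', 3)])

# Keyword arguments: each name becomes a string key
d.update(e=5, a=0)

print(d)
```

Keyword arguments only work for keys that are valid Python identifiers, so the dictionary or list-of-pairs form is the more general one.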
The del statement deletes a specific key-value pair from a
dictionary. Deleting a key that does not exist raises a KeyError.
my_dict = {'apple': 3, 'banana': 2, 'orange': 1}
del my_dict['apple']
print(my_dict)
# Output: {'banana': 2, 'orange': 1}
Data structures-Sets
A set is an unordered collection of unique items, defined using
curly braces {} or the set() constructor. Sets are mutable, meaning
you can add or remove items after creating them. An empty set
must be created with set(), since {} creates an empty dictionary.
• Unordered (not kept in the given sequence): Sets do not
maintain any specific order for items.
• Unique items: Each item in a set must be unique. A set literal
may be written with duplicate items, but the resulting set
keeps only one copy of each.
• No indexing: Since sets are unordered, we cannot access
items by index, and slicing is not supported.
• Mutable (modifiable): We can modify the content of a set by
adding or removing items.
set_eg = {"Alpha", "Gamma", "Beta"}
print(set_eg)
# Output (order may vary): {'Beta', 'Alpha', 'Gamma'}
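The properties listed above can be checked directly. This is a small sketch with made-up values: duplicates collapse, positional access fails, and set() is the way to create an empty set.

```python
# Duplicates in the literal are dropped
numbers = {1, 2, 2, 3, 3, 3}
unique_count = len(numbers)

# set() also removes duplicates from a list
letters = set(['a', 'b', 'a'])

# Sets do not support indexing
try:
    numbers[0]
    indexing_works = True
except TypeError:
    indexing_works = False
    print("Sets cannot be indexed")

# {} is an empty dictionary; set() is an empty set
empty_set = set()
empty_braces = {}

print(unique_count, letters, type(empty_set).__name__, type(empty_braces).__name__)
```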
Sets-Methods
The add() method is used to add a single item to a set. It modifies
the set by adding the specified item to the set. If the item is already
in the set, the add() method has no effect, as sets do not allow
duplicate items
add_eg_set = {"Alpha", 500, "Gamma", 845, "Beta", 'Alpha', True}
add_eg_set.add(10)
add_eg_set.add(20)
add_eg_set.add(30)
print(add_eg_set)
# Output (order may vary): {True, 'Gamma', 10, 845, 'Beta', 20, 30, 500, 'Alpha'}
* Sets are designed to store unique elements, so adding a duplicate element
to a set has no effect; the set remains unchanged.
The update() method adds the elements of another iterable, such
as a set or a list, to a set. It modifies the original set in place.
#update items from one set to another set, using the update() method
set_update0 = {8958, "India", "China", "Srilanka"}
set_update1 = {"USA", "Canada", False, 951}
set_update0.update(set_update1)
print(set_update0)
# Output: {False, 'USA', 'Srilanka', 'China', 'India', 951, 'Canada', 8958}
#update items to a set from a list using update() method
set_update2 = {8958, 9000, 5000, 500}
list_1 = ["USA", "Canada", False, 951]
set_update2.update(list_1)
print(set_update2)
# Output: {False, 'USA', 9000, 5000, 500, 951, 'Canada', 8958}
The remove() method removes a specific item from a set. It
modifies the set in place and raises a KeyError if the item does
not exist.
set_1 = set((10, 20, 30, 40, 50))
set_1.remove(50)
print(set_1)
#Output: {10, 20, 30, 40}
The pop() method removes and returns an arbitrary item from a set;
because sets are unordered, you cannot choose which item is
removed.
set_2 = {'Hindi', 'Economics', 'Mathematics'}
set_2.pop()
print(set_2)
# Output (one possibility): {'Hindi', 'Mathematics'}
The discard() method removes a specific item from a set and does
not raise an error if the item is not present in the set.
dis_set = {10, 20, 30, 40, 50}
dis_set.discard(30)
print(dis_set)
#Output: {50, 20, 40, 10}
*Try discarding an item that is not present; no error is raised
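The difference between discard() and remove() shows up only when the item is missing. In this sketch, 99 is an arbitrary value that is not in the set.

```python
dis_set = {10, 20, 30}

# discard() silently does nothing when the item is absent
dis_set.discard(99)

# remove() raises a KeyError for the same situation
try:
    dis_set.remove(99)
    remove_succeeded = True
except KeyError:
    remove_succeeded = False
    print("remove() raised KeyError for a missing item")

print(dis_set)
```

Use discard() when a missing item is acceptable, and remove() when a missing item should be treated as a mistake.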
The clear() method removes all items from a set and makes it
empty.
set_2 = {'Hindi', 'Economics', 'Mathematics'}
set_2.clear()
print(set_2)
#Output: set()
The union() method returns a new set that contains all the unique
items from the original sets without modifying the original sets.
set1 = {10, 20, 30, 40}
set2 = {30, 40, 50, 60}
set3 = {50, 60, 70, 70}
union_set = set1.union(set2, set3)
print(union_set)
#Output: {70, 40, 10, 50, 20, 60, 30}
The intersection() method returns a new set containing only the
items that are common to all of the sets.
group_a = {"India", "Iran", "Russia", "China"}
group_b = {"Pakistan", "Burma", "India"}
group_c = {'America', 'Africa', 'India'}
group_common = group_a.intersection(group_b, group_c)
print(group_common)
#Output: {'India'}
The difference() method finds the set difference. It returns a new
set containing the items that are in the first set but not in any of
the sets passed as arguments.
set1 = {10, 20, 30, 40, 50}
set2 = {30, 40, 60, 70}
diff_set = set1.difference(set2)
print(diff_set)
#Output: {50, 10, 20}
#returns a new set containing the items that are in a set but not in other two
group_a = {"India", "Iran", "Russia", "China"}
group_b = {"Pakistan", "Burma", "India"}
group_c = {'America', 'Africa', 'India'}
group_a_diff = group_a.difference(group_b, group_c)
print(group_a_diff)
#Output: {'Russia', 'Iran', 'China'}
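As a quick check that union(), intersection(), and difference() return new sets and leave the originals untouched, consider the sketch below. The operators |, & and - shown at the end are an addition not covered in the slides; for two sets they give the same results as the methods.

```python
set1 = {10, 20, 30, 40}
set2 = {30, 40, 50, 60}

union_set = set1.union(set2)          # items in either set
common = set1.intersection(set2)      # items in both sets
only_in_set1 = set1.difference(set2)  # items in set1 but not in set2

# Operator forms for two sets
same_union = set1 | set2
same_common = set1 & set2
same_difference = set1 - set2

print(union_set)
print(common)
print(only_in_set1)

# The original sets are unchanged
print(set1, set2)
```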