BSC Python 4
Python is a popular programming language. It was created by Guido van Rossum, who developed it in the Netherlands, and it was first released in 1991.
Python is a general-purpose programming language, which means it can be used to write programs for many kinds of tasks, such as games, data science and websites.
Python is portable: the same Python program can run on different operating systems, such as Windows, macOS and Linux. Python is also free to use.
Python is an interpreted language: your code is run by a program called the interpreter, which translates it into instructions your computer's processor can execute, instead of being compiled into a separate executable file first.
Python is a strongly typed language: it does not silently convert values between unrelated types. For example, adding a string to an integer raises an error instead of producing a result. (Numbers are the exception: an int and a float can be combined in arithmetic.)
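A short sketch of what strong typing means in practice: mixing a string and an integer raises a TypeError instead of being converted silently, so the conversion must be written explicitly. The variable names are illustrative only.

```python
# Mixing str and int is not converted automatically: it raises TypeError
try:
    result = "3" + 4
except TypeError as err:
    result = None
    print("TypeError:", err)

# Explicit conversions make the intent clear
total = int("3") + 4      # 7: the string is converted to an int first
text = "3" + str(4)       # '34': the int is converted to a string first

# Numeric types do mix: int + float gives a float
mixed = 3 + 0.5           # 3.5
print(total, text, mixed)
```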
Python has a huge set of libraries. A Python library is a collection of ready-made code that you can use in your own program without writing that code yourself.
Used for
Web development
Software development
Mathematics
System scripting
Expressions
An expression is anything that has a value, for example 3, 4+5, or "Hello".
4+5
9
An expression is a combination of operators and operands that Python evaluates to produce a value.
1) Constant Expressions: These are the expressions that have constant values only.
x = 15 + 1.3
print(x)
16.3
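2) Arithmetic Expressions: These are expressions that combine numeric values with arithmetic operators such as +, -, * and /.
# Arithmetic Expressions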
x = 40
y = 12
add = x + y
sub = x - y
pro = x * y
div = x / y
print(add)
print(sub)
print(pro)
print(div)
52
28
480
3.3333333333333335
3) Integral Expressions: These are the kind of expressions that produce only integer results after
all computations and type conversions.
# Integral Expressions
a = 13
b = 12.0
c = a + int(b)
print(c)
25
4) Floating Expressions: These are the kind of expressions which produce floating point numbers
as result after all computations and type conversions.
# Floating Expressions
a = 13
b = 5
c = a / b
print(c)
2.6
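5) Relational Expressions: These are expressions that compare two values with operators such as >, <, >=, <=, == and !=, and produce True or False. Any arithmetic on either side is evaluated first.
# Relational Expressions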
a = 21
b = 13
c = 40
d = 37
p = (a + b) >= (c - d)
print(p)
True
6) Logical Expressions: These are expressions that combine one or more conditions with and, or and not, and result in either True or False. For example, (10 == 9) checks whether 10 is equal to 9. Since it is not, the expression evaluates to False.
P = (10 == 9)
Q = (7 > 5)
# Logical Expressions
R = P and Q
S = P or Q
T = not P
print(R)
print(S)
print(T)
False
True
True
7) Bitwise Expressions: These are the kind of expressions in which computations are performed
at bit level.
# Bitwise Expressions
a = 12
x = a >> 2
y = a << 1
print(x, y)
3 24
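To see why 12 >> 2 gives 3 and 12 << 1 gives 24, it helps to print the values in binary with bin(): shifting right by n bits divides by 2**n (discarding any remainder), and shifting left by n bits multiplies by 2**n. The &, | and ^ operators shown at the end are standard Python bitwise operators that the example above does not use; the value of b is an arbitrary choice for illustration.

```python
a = 12
print(bin(a))        # 0b1100
print(bin(a >> 2))   # 0b11    -> 3
print(bin(a << 1))   # 0b11000 -> 24

# Other bitwise operators compare two numbers bit by bit
b = 10               # 0b1010
print(a & b)         # 8  (0b1000: bits set in both)
print(a | b)         # 14 (0b1110: bits set in either)
print(a ^ b)         # 6  (0b0110: bits set in exactly one)
```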
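8) Combinational Expressions: These are expressions that combine different kinds of expressions, such as arithmetic and bitwise, in a single expression.
# Combinational Expressions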
a = 16
b = 12
c = a + (b >> 1)
print(c)
22
Statements
A statement is an instruction that the Python interpreter can execute, such as an assignment or an if statement.
Statements can contain one or more expressions and can span a single line or multiple lines.
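A few statements of different kinds, as a minimal illustration; the last one shows a single statement spread over several lines by wrapping it in parentheses (a backslash at the end of a line also works, but parentheses are usually preferred).

```python
# An assignment statement containing the expression 4 + 5
x = 4 + 5

# A compound statement (if) that contains other statements
if x > 5:
    message = "x is large"
else:
    message = "x is small"

# One statement written over several lines inside parentheses
total = (1 + 2 +
         3 + 4)

print(x, message, total)   # 9 x is large 10
```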
Comments
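Comments can be used to explain Python code. Python ignores them when it runs the program.
There are 2 types of comments:
1) Single-line comments start with #. Everything after the # on that line is ignored.
2) Multi-line strings (text inside triple quotes) that are not assigned to a variable are often used as multi-line comments.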
# Multiple line strings
Str1 = """Hello World,
This is an era of Computer's life"""
print(Str1)
Hello World,
This is an era of Computer's life
Data Types
1) Numbers: integers (int) and floating-point numbers (float)
2) Strings (str)
3) Booleans (bool): True or False
Examples (Try)
str
Verifying object data type
Syntax: type(object) is datatype
True
Other common tasks: assigning values to multiple variables, getting the data type of an object, verifying an object's data type, and converting an object from one data type to another.
Type conversion functions:
int()
float()
str()
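A sketch of the tasks listed above, with example values chosen for illustration: assigning several variables in one line, getting a type with type(), checking it with `type(object) is datatype` (or with isinstance(), which also accepts subclasses), and converting with int(), float() and str().

```python
# Assigning values to multiple variables in one line
a, b, c = 10, 2.5, "hello"
flag = True

# Getting the data type
print(type(a), type(b), type(c), type(flag))
# <class 'int'> <class 'float'> <class 'str'> <class 'bool'>

# Verifying the data type
print(type(a) is int)         # True
print(isinstance(flag, int))  # True: bool is a subclass of int
print(type(flag) is int)      # False: type() must match exactly

# Converting between types
n = int("42")       # str -> int: 42
f = float(a)        # int -> float: 10.0
s = str(b)          # float -> str: '2.5'
t = int(3.99)       # float -> int truncates toward zero: 3
print(n, f, s, t)
```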
Examples (Try)
int
a="1"
type(a)
b=int(a)
type(b)
int
a="R"
type(a)
b=int(a)
type(b)
ValueError Traceback (most recent call last)
Input In [16], in <cell line: 3>()
1 a="R"
2 type(a)
----> 3 b=int(a)
4 type(b)
Variables
Variables are named containers for storing data values.
del a
del b,c
print(a)
NameError Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 print(a)
Keywords are reserved words that cannot be used as variable names:
False, None, True, and, as, assert, async, await, break, class, continue, def, del, elif, else, except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield.
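2) There are several sequence types: strings (str; in Python 3 every string is a Unicode string), lists, tuples, arrays and range objects.
3) Dictionaries and sets are containers for non-sequential data: their items are not accessed by position. Sets are unordered; dictionaries keep their keys in insertion order (since Python 3.7) but are accessed by key.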
Strings
Working with Strings
spam='That is Alice's cat'
spam
Input In [20]
spam='That is Alice's cat'
^
SyntaxError: invalid syntax
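The error happens because the apostrophe in Alice's ends the single-quoted string early. A sketch of two common fixes: wrap the text in double quotes, or escape the apostrophe with a backslash. The same backslash mechanism produces other special characters such as a tab, a newline or a literal backslash.

```python
# Double quotes let the string contain a single quote
spam = "That is Alice's cat"
# Alternatively, escape the single quote with a backslash
spam2 = 'That is Alice\'s cat'
print(spam == spam2)   # True

# Other escape sequences: \t (tab), \n (newline), \\ (one backslash)
text = "Name:\tAlice\nPath: C:\\temp"
print(text)
print(len("\\"))       # 1: the two characters in the source produce one backslash
```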
Escape Characters
\' Single quote
\" Double quote
\n Newline
\t Tab
\\ Backslash
Hello there!
How are you?
I 'm doing fine. Thank you.
What are you \doing?
Sincerely,
Bob
''')
Dear Alice
Eve's cat has been arrested for catnapping, cat burglary and
extortion.
Sincerely,
Bob
Indexing
'g'
'n'
IndexError Traceback (most recent call last)
Input In [33], in <cell line: 1>()
----> 1 strSample[-9]
strSample[2]
'a'
learning
strSample[:] # learning
'learning'
strSample[[:]]
Input In [37]
strSample[[:]]
^
SyntaxError: invalid syntax
strSample[1:]
'earning'
strSample[1::2]
'erig'
strSample[::-1]
'gninrael'
strSample[1::]
'earning'
strSample[1:-1:2]
'eri'
strSample[1:-1]
'earnin'
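Slices follow the pattern s[start:stop:step]: start is included, stop is excluded, omitted values default to the beginning, the end and a step of 1, and negative numbers count from the end. A sketch that checks a few of the results above, assuming strSample = 'learning' (its definition is not shown, but the outputs are consistent with it):

```python
strSample = 'learning'   # assumed value, consistent with the outputs above

print(strSample[1:-1:2])  # eri: indexes 1, 3, 5
print(strSample[::-1])    # gninrael: a negative step walks backwards
print(strSample[-1])      # g: the last character

# Indexing past the end raises IndexError, but slicing does not
try:
    strSample[8]
except IndexError as err:
    print("IndexError:", err)
print(strSample[5:100])   # ing: slices are clipped to the string length
```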
name='A1'
age=4000
'Hello, my name is %s. I am %s years old.' %(name,age)
name='A1'
age=4000
f'Hello, my name is {name}. Next year, I will be {age+1} years old.'
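The outputs of the two formatting styles above are not shown. The sketch below compares them with str.format(), which appears later in these notes; with the same values, all three build the same string.

```python
name = 'A1'
age = 4000

percent_style = 'Hello, my name is %s. I am %s years old.' % (name, age)
format_style = 'Hello, my name is {}. I am {} years old.'.format(name, age)
f_string = f'Hello, my name is {name}. I am {age} years old.'
print(f_string)   # Hello, my name is A1. I am 4000 years old.

# f-strings can evaluate expressions inside the braces
next_year = f'Next year, I will be {age + 1} years old.'
print(next_year)  # Next year, I will be 4001 years old.
```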
learning python
learning
learning python
learning
'learning python'
TypeError Traceback (most recent call last)
Input In [53], in <cell line: 4>()
2 str1 = "How old"
3 str2 = 45
----> 4 print(str1+str2)
# But we can add a string and an integer after converting the integer into a string
str1 = "How old "
str2 = 45
str3 = str(str2)
print(str1+str3)
How old 45
name = "Jose"
str1 = "How are you {name}"
print(str1.format(name="Jose"))
str1 = "How are you {name}"
print(str1.format(name="Bob"))
name = "Jose"
str1 = "How are you {name}"
print(str1.format(name=name))
frd_name = "Jose"
str1 = "How are you {name}"
print(str1.format(name=frd_name))
name1 = 'GITAA'
name2 = 'Pvt'
name3 = 'Ltd'
Multiplication
learning
learninglearninglearning
'Hello' in 'Hello'
True
False
'' in 'spam'
True
False
String methods
learning is fun !
strSample
strSample.find('z')
-1
The isalnum() method returns True if the string is non-empty and every character in it is alphanumeric, that is, a letter or a number. Letters are not limited to a-z: accented and other non-ASCII letters count as well. If the string is empty, or contains any other character such as a space, a period or '!', isalnum() returns False.
strSample.isalnum()
# False: strSample ('learning is fun !') contains spaces and '!'
False
s = 'HelloWorld2019'
print(s.isalnum())
True
False
s = ''
print(s.isalnum())
False
s='A.B'
print(s.isalnum())
s = '10.50'
print(s.isalnum()) # The string contains a period (.), which is not an alphanumeric character.
False
False
s = 'çåøÉ'
print(s.isalnum()) # True because all these characters are letters.
True
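Because isalnum() also accepts non-ASCII letters such as 'é', it is not the right check when only the ASCII letters a-z, A-Z and digits 0-9 are wanted. A sketch that combines it with str.isascii() (available since Python 3.7); the helper name is_ascii_alnum is hypothetical.

```python
def is_ascii_alnum(s):
    """Hypothetical helper: True only for non-empty strings of ASCII letters and digits."""
    return s.isascii() and s.isalnum()

samples = ['HelloWorld2019', '', 'A.B', '10.50', 'çåøÉ', 'abc 123']
for s in samples:
    # repr() shows the quotes, so the empty string is visible
    print(repr(s), s.isalnum(), is_ascii_alnum(s))
```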
learning is fun !
spam='Hello, world'
spam.islower()
False
spam.isupper()
False
'world'.islower()
True
'HELLO'.isupper()
True
'abc123'.islower()
True
'123'.islower()
False
'123'.isupper()
False
'Hello'.upper()
'HELLO'
'Hello'.upper().lower()
'hello'
'Hello'.upper().lower().upper()
'HELLO'
'HELLO'.lower()
'hello'
'HELLO'.lower().islower()
True
isX() Methods
isalpha()
isalnum()
isdecimal()
isspace()
istitle()
'hello'.isalpha()
True
'hello123'.isalpha()
False
'hello123'.isalnum()
True
'hello'.isalnum()
True
'123'.isdecimal()
True
' '.isspace()
True
True
True
False
False
'Hello, world'.startswith('Hello')
True
'Hello, world'.endswith('world')
True
'abc123'.startswith('abcdef')
False
'abc123'.endswith('12')
False
True
True
Partition
'Hello, world'.partition('w')
'Hello, world'.partition('world')
'Hello, world'.partition('o')
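The outputs of the three partition() calls are not shown above. partition(sep) splits the string at the first occurrence of sep and always returns a 3-tuple (before, sep, after); if sep is not found, the whole string comes first, followed by two empty strings. A sketch:

```python
s = 'Hello, world'
print(s.partition('w'))      # ('Hello, ', 'w', 'orld')
print(s.partition('world'))  # ('Hello, ', 'world', '')
print(s.partition('o'))      # ('Hell', 'o', ', world'): only the first 'o' is used
print(s.partition('xyz'))    # ('Hello, world', '', '')

# rpartition() splits at the last occurrence instead
print(s.rpartition('o'))     # ('Hello, w', 'o', 'rld')
```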
Reverse
strSample.reverse()
strSample # AttributeError: 'str' object has no attribute 'reverse'
AttributeError Traceback (most recent call last)
Input In [123], in <cell line: 1>()
----> 1 strSample.reverse()
2 strSample
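Strings are immutable, so there is no reverse() method that changes a string in place; instead, a reversed copy can be built. A sketch of two common ways: a slice with step -1, and joining the characters produced by reversed().

```python
strSample = 'learning is fun !'

rev1 = strSample[::-1]
rev2 = ''.join(reversed(strSample))
print(rev1)            # ! nuf si gninrael
print(rev1 == rev2)    # True
print(strSample)       # learning is fun !  (the original is unchanged)
```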
Length
len(strSample)
17
Clear
AttributeError Traceback (most recent call last)
Input In [125], in <cell line: 1>()
----> 1 strSample.clear()
Append or Add
strSample.append('sas') # AttributeError: 'str' object has no attribute 'append'
AttributeError Traceback (most recent call last)
Input In [126], in <cell line: 1>()
----> 1 strSample.append('sas')
AttributeError Traceback (most recent call last)
Input In [127], in <cell line: 1>()
----> 1 strSample.add('sas')
Update
strSample.update('a')
AttributeError Traceback (most recent call last)
Input In [128], in <cell line: 1>()
----> 1 strSample.update('a')
Insert
strSample.insert(3,5)
print(strSample) # AttributeError: 'str' object has no attribute 'insert'
AttributeError Traceback (most recent call last)
Input In [129], in <cell line: 1>()
----> 1 strSample.insert(3,5)
2 print(strSample)
AttributeError: 'str' object has no attribute 'insert'
pop(): removes the element at the given index from a list and returns it. Strings have no pop() method:
learning is fun !
AttributeError Traceback (most recent call last)
Input In [131], in <cell line: 1>()
----> 1 strSample.pop()
Remove
AttributeError Traceback (most recent call last)
Input In [132], in <cell line: 1>()
----> 1 strSample.remove(0)
del
learning is fun !
del strSample
print(strSample)
NameError Traceback (most recent call last)
Input In [135], in <cell line: 1>()
----> 1 print(strSample)
Extend
learning is fun !
strSample.extend(('hello')) # ('hello') is just the string 'hello', not a tuple
print(strSample) # AttributeError: 'str' object has no attribute 'extend'
AttributeError Traceback (most recent call last)
Input In [137], in <cell line: 1>()
----> 1 strSample.extend(('hello'))
2 print(strSample)
Lists
A list is a value that contains multiple values in an ordered sequence.
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 3, 3, 4, 5]
Indexing
'a'
print(lstSample[1,-1,2])
TypeError Traceback (most recent call last)
Input In [148], in <cell line: 1>()
----> 1 print(lstSample[1,-1,2])
lstSample = [1,2,"a",'sam',2]
print(lstSample[1:-1:2])
[2, 'sam']
print(lstSample[1::2])
[2, 'sam']
print(lstSample[2:4])
['a', 'sam']
print(lstSample[::-1])
Concatenation
Multiplication
lstSample*2
lstSample*=2
lstSample
lstSample[2]*=2
lstSample
lstSample[1]*=2
lstSample
lstSample[1]+=2
lstSample
[1, 6, 'aa', 'sam', 2, 1, 2, 'a', 'sam', 2]
lstSample[2]+=2
lstSample # TypeError: can only concatenate str (not "int") to str
TypeError Traceback (most recent call last)
Input In [161], in <cell line: 1>()
----> 1 lstSample[2]+=2
2 lstSample
lstSample
lstSample[2:4]*=2
lstSample
len
len(lstSample)
lstSample = [1,2,'a','sam',2]
lstSample
lstSample['a']='b'
lstSample
TypeError Traceback (most recent call last)
Input In [167], in <cell line: 1>()
----> 1 lstSample['a']='b'
2 lstSample
reverse
clear
lstSample.clear()
print(lstSample)
[]
append or add
lstSample.append([4,5,6])
print(lstSample)
lstSample.add([4,5,6])
print(lstSample) # AttributeError: 'list' object has no attribute 'add'
AttributeError Traceback (most recent call last)
Input In [174], in <cell line: 1>()
----> 1 lstSample.add([4,5,6])
2 print(lstSample)
update
AttributeError Traceback (most recent call last)
Input In [176], in <cell line: 1>()
----> 1 lstSample.update(['a','3'])
2 print(lstSample)
# So we need to convert list into set and then pass update function
l=[4,'3']
s = set(l)
l1 = set(lstSample)
l1.update(s)
print(l1)
insert
pop: removes the element at the given index (the last element if no index is given) and returns it
lstSample.pop()
print(lstSample)
[1, 2, 'a', 'sam']
lstSample.pop(2)
'a'
print(lstSample)
[1, 2, 'sam', 2]
The remove() method removes the first occurrence of the element with the specified value
['a', 'sam', 2]
['a', 2]
[1, 2, 'sam', 2]
The extend() method adds the specified list elements (or any iterable - list, set, tuple, etc.) to the
end of the current list
lstSample.extend([4,5,3,5])
print(lstSample)
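A compact sketch of the list methods covered above, including insert(), whose example is missing. It uses a fresh list named lst (chosen for illustration) so each step's result is clear. The key difference between append() and extend(): append() adds its argument as one element, even if it is a list, while extend() adds each item of the iterable.

```python
lst = [1, 2, 'a', 'sam', 2]

lst.append([4, 5])        # one new element: the list [4, 5]
print(lst)                # [1, 2, 'a', 'sam', 2, [4, 5]]

lst.pop()                 # removes and returns the last element, [4, 5]
lst.extend([4, 5])        # two new elements: 4 and 5
print(lst)                # [1, 2, 'a', 'sam', 2, 4, 5]

lst.insert(1, 'new')      # inserts before index 1
print(lst)                # [1, 'new', 2, 'a', 'sam', 2, 4, 5]

removed = lst.pop(3)      # removes and returns the element at index 3
print(removed)            # a
lst.remove(2)             # removes only the first 2
print(lst)                # [1, 'new', 'sam', 2, 4, 5]
```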
spam[0]
'cat'
spam[2]
'rat'
'elephant'
'Hello cat'
'The ' + spam[1] + ' ate the ' + spam[0] + '.'
spam[10000]
IndexError Traceback (most recent call last)
Input In [204], in <cell line: 1>()
----> 1 spam[10000]
spam[1]
'bat'
spam[1.0]
TypeError Traceback (most recent call last)
Input In [207], in <cell line: 1>()
----> 1 spam[1.0]
spam[int(1.0)]
'bat'
spam[0]
['cat', 'bat']
spam[0][1]
'bat'
spam[1][4]
50
'elephant'
spam[0:4]
spam[1:3]
['bat', 'rat']
spam[0:-1]
spam[:2]
['cat', 'bat']
spam[1:]
spam[:]
spam[2] = spam[1]
spam
spam[-1] = 12345
spam
spam = [1, 2, 3]
spam = spam + ['A', 'B', 'C']
spam
del spam[2]
spam
['cat', 'bat']
True
'cat' in spam
False
False
True
bacon = ['Zophie']
bacon *= 3
bacon
spam.index('hello')
0
spam = ['Zophie', 'Pooka', 'Fat-tail', 'Pooka']
spam.index('Pooka')
ValueError Traceback (most recent call last)
Input In [244], in <cell line: 2>()
1 spam = ['cat', 'bat', 'rat', 'elephant']
----> 2 spam.remove('mouse')
3 spam
[-7, 1, 2, 3.14, 5]
TypeError Traceback (most recent call last)
Input In [248], in <cell line: 2>()
1 spam = [1, 3, 2, 4, 'Alice', 'Bob']
----> 2 spam.sort()
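sort() sorts a list in place and returns None, and every pair of elements must be comparable, which is why the mix of numbers and strings above fails. A sketch of common variations; sorting with key=str is one possible workaround for mixed types, not the only one.

```python
nums = [2, 5, 3.14, 1, -7]
result = nums.sort()           # sorts in place
print(result)                  # None: sort() does not return the list
print(nums)                    # [-7, 1, 2, 3.14, 5]

nums.sort(reverse=True)
print(nums)                    # [5, 3.14, 2, 1, -7]

words = ['banana', 'apple', 'Cherry']
print(sorted(words))                 # ['Cherry', 'apple', 'banana']: uppercase letters sort first
print(sorted(words, key=str.lower))  # ['apple', 'banana', 'Cherry']

mixed = [1, 3, 2, 4, 'Alice', 'Bob']
print(sorted(mixed, key=str))        # [1, 2, 3, 4, 'Alice', 'Bob']: compares str() of each item
```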
Tuples
# Create a Python Tuple called animals
animals = ("bear", "dog", "giraffe", "goat", "leopard", "lion",
"penguin", "tiger")
animals
type(animals)
tuple
'goat'
# Items can also be accessed by negative index
print(animals[-2])
penguin
print(animals[2:5])
animals[2:-2:2]
('giraffe', 'leopard')
TypeError Traceback (most recent call last)
Input In [257], in <cell line: 2>()
1 # Tuples are immutable
----> 2 animals[0]='kangaroo'
bear
dog
giraffe
goat
leopard
lion
penguin
tiger
print(len(animals))
animals
('bear', 'dog', 'giraffe', 'goat', 'leopard', 'lion', 'penguin',
'tiger')
# Remove items from a list: There are a few ways to remove items.
animals.pop(3)
# animals.remove("goat")
# del animals[3]
AttributeError Traceback (most recent call last)
Input In [262], in <cell line: 2>()
1 # Remove items from a list: There are a few ways to remove items.
----> 2 animals.pop(3)
animals.remove("goat")
AttributeError Traceback (most recent call last)
Input In [263], in <cell line: 1>()
----> 1 animals.remove("goat")
del animals[3]
TypeError Traceback (most recent call last)
Input In [264], in <cell line: 1>()
----> 1 del animals[3]
animals
Animal_1 = list(animals)
Animal_1
Animal_1.pop(3)
'goat'
Animal_1
Animal_2 = tuple(Animal_1)
Animal_2
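Because tuples cannot be changed in place, the pattern shown above (convert to a list, modify it, convert back) produces a new tuple. A sketch with the same data, plus an alternative that builds the new tuple directly from two slices; the variable names are illustrative.

```python
animals = ("bear", "dog", "giraffe", "goat", "leopard", "lion",
           "penguin", "tiger")

as_list = list(animals)
as_list.pop(3)                 # removes 'goat'
animals_2 = tuple(as_list)

# Same result without a list: join the parts before and after index 3
animals_3 = animals[:3] + animals[4:]
print(animals_2 == animals_3)  # True
print(animals)                 # the original tuple still contains 'goat'
```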
(1, 2, 3, 4, 3, 'py')
(1, 2, 'sample')
(1, 2, 3, 4, 3, 'py')
Indexing
Slicing
tupSample = (1,2,3,4,3,'py') # tuple
tupSample
(1, 2, 3, 4, 3, 'py')
print(tupSample[:-1:2])
(1, 3, 3)
Addition
(1, 2, 3, 4, 3, 'py')
tupSample+=('th','on')
print(tupSample)
tupSample+('th','on')
print(tupSample)
Multiplication
(1, 2, 3, 4, 3, 'py')
tupSample*2
tupSample[2:4]*2
(3, 4, 3, 4)
len(object)
len(tupSample)
reverse
tupSample = (1,2,3,4,3,'py') # tuple
print(tupSample)
(1, 2, 3, 4, 3, 'py')
tupSample.reverse()
print(tupSample) # AttributeError: 'tuple' object has no attribute 'reverse'
AttributeError Traceback (most recent call last)
Input In [288], in <cell line: 1>()
----> 1 tupSample.reverse()
2 print(tupSample)
clear
tupSample.clear()
print(tupSample) # AttributeError: 'tuple' object has no attribute 'clear'
AttributeError Traceback (most recent call last)
Input In [289], in <cell line: 1>()
----> 1 tupSample.clear()
2 print(tupSample)
append() or add()
tupSample.append((1,2))
print(tupSample) # AttributeError: 'tuple' object has no attribute 'append'
AttributeError Traceback (most recent call last)
Input In [290], in <cell line: 1>()
----> 1 tupSample.append((1,2))
2 print(tupSample)
AttributeError Traceback (most recent call last)
Input In [291], in <cell line: 1>()
----> 1 tupSample.add((1,2))
2 print(tupSample)
update()
(1, 2, 3, 4, 3, 'py')
AttributeError Traceback (most recent call last)
Input In [293], in <cell line: 1>()
----> 1 tupSample.update((7,'8'))
s1 = set(tupSample)
print(s1)
s2 = set((7,'8'))
print(s2)
s1.update(s2)
print(s1)
{1, 2, 3, 4, 'py'}
{'8', 7}
{1, 2, 3, 4, 'py', 7, '8'}
insert()
(1, 2, 3, 4, 3, 'py')
strSample.insert(3,5) # this cell uses strSample; tupSample.insert(3,5) fails the same way: 'tuple' object has no attribute 'insert'
print(strSample) # AttributeError: 'str' object has no attribute 'insert'
AttributeError Traceback (most recent call last)
Input In [296], in <cell line: 1>()
----> 1 strSample.insert(3,5)
2 print(strSample)
pop()
(1, 2, 3, 4, 3, 'py')
AttributeError Traceback (most recent call last)
Input In [298], in <cell line: 1>()
----> 1 tupSample.pop()
AttributeError Traceback (most recent call last)
Input In [299], in <cell line: 1>()
----> 1 tupSample.pop(2)
remove()
AttributeError Traceback (most recent call last)
Input In [301], in <cell line: 1>()
----> 1 tupSample.remove(2)
del
NameError Traceback (most recent call last)
Input In [302], in <cell line: 2>()
1 del tupSample # deleting the tuple, tupSample
----> 2 tupSample
NameError Traceback (most recent call last)
Input In [303], in <cell line: 1>()
----> 1 del tupSample[2] # deleting the 3rd item in the tuple
2 tupSample
extend()
# Practical question
import calendar as c
Year=int(input("Enter the year: "))
Month=int(input("Enter the month: "))
print(c.month(Year,Month))
ValueError Traceback (most recent call last)
Input In [304], in <cell line: 3>()
1 # Practical question
2 import calendar as c
----> 3 Year=int(input("Enter the year: "))
4 Month=int(input("Enter the month: "))
5 print(c.month(Year,Month))
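The ValueError above is raised by int() when the text typed at the prompt is not a whole number (for example, an empty input). A sketch that validates the text before using it; the helper parse_int is hypothetical, and fixed strings stand in for the input() calls so the example runs without a keyboard. calendar.month(year, month) returns the month as a formatted multi-line string.

```python
import calendar

def parse_int(text, low, high):
    """Hypothetical helper: return int(text) if it is a whole number in [low, high], else None."""
    try:
        value = int(text)
    except ValueError:
        return None
    return value if low <= value <= high else None

# With real user input this would be: parse_int(input("Enter the year: "), 1, 9999)
year = parse_int("2024", 1, 9999)
month = parse_int("2", 1, 12)
if year is not None and month is not None:
    print(calendar.month(year, month))
else:
    print("Invalid year or month")
```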
Dictionary
Dictionary: a collection of key-value pairs, written inside curly braces {}. Values are looked up by key, not by position.
dictSample = {1:'first', 'second':3, 3:3, 'four':'4'} # as implied by the outputs below
dictSample[2] # KeyError: 2 (there is no key 2; a dictionary is indexed by its keys, not by position)
KeyError Traceback (most recent call last)
Input In [306], in <cell line: 1>()
----> 1 dictSample[2]
KeyError: 2
dictSample[1]
'first'
dictSample['second'] # find the value that corresponds to the key 'second'
3
AttributeError Traceback (most recent call last)
Input In [309], in <cell line: 1>()
----> 1 dictSample.index(1)
for x in dictSample:
    print(x)
1
second
3
four
for x in dictSample.keys():
    print(x)
1
second
3
four
for x in dictSample.values():
    print(x)
first
3
3
4
for x in dictSample.items():
    print(x)
(1, 'first')
('second', 3)
(3, 3)
('four', '4')
'first'
dictSample[1::2]
TypeError Traceback (most recent call last)
Input In [316], in <cell line: 1>()
----> 1 dictSample[1::2]
dictSample[1:'second'] # TypeError: unhashable type: 'slice' (from Python 3.12, slices are hashable and this becomes a KeyError)
TypeError Traceback (most recent call last)
Input In [317], in <cell line: 1>()
----> 1 dictSample[1:'second']
dictSample[1:] # TypeError: unhashable type: 'slice'
TypeError Traceback (most recent call last)
Input In [318], in <cell line: 1>()
----> 1 dictSample[1:]
dictSample['four']
'4'
TypeError Traceback (most recent call last)
Input In [320], in <cell line: 1>()
----> 1 dictSample+{5:3}
TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
dictSample,{5:3},{3:4} # commas build a tuple of these objects; the dictionaries are not merged
dictSample,2
dictSample,(1)
dictSample,[2]
dictSample,range(2,7,2)
dictSample,{1,2}
TypeError Traceback (most recent call last)
Input In [327], in <cell line: 1>()
----> 1 dictSample*2
reverse
AttributeError Traceback (most recent call last)
Input In [328], in <cell line: 1>()
----> 1 dictSample.reverse()
dictSample.clear()
print(dictSample)
{}
append/add
AttributeError Traceback (most recent call last)
Input In [331], in <cell line: 1>()
----> 1 dictSample.append({5:5})
AttributeError Traceback (most recent call last)
Input In [332], in <cell line: 1>()
----> 1 dictSample.add({5:5})
update
dictSample.update({7:"John"})
print(dictSample)
dictSample['second']=9
dictSample
dictSample["five"] = 5
print(dictSample)
dictSample.update(1=5)
print(dictSample) # SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
Input In [341]
dictSample.update(1=5)
^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
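The SyntaxError happens because keyword arguments must be valid identifiers, and 1 is not one. A sketch of the forms update() does accept: another dictionary, an iterable of key-value pairs, or keyword arguments whose names become string keys. The dictionary d is illustrative.

```python
d = {1: 'first', 'second': 2}

d.update({1: 5})             # dictionary argument: any hashable key works
d.update([(3, 'three')])     # iterable of (key, value) pairs
d.update(five=5)             # keyword argument: the key becomes the string 'five'
d[7] = 'John'                # plain assignment adds or replaces one key

print(d)  # {1: 5, 'second': 2, 3: 'three', 'five': 5, 7: 'John'}
```

Updating an existing key (here 1) keeps its original position; new keys are added at the end.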
len
dictSample.get(3)
dict_values(['first', 2, 3, 4])
insert
dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)
dictSample.insert(3,5)
print(dictSample) # AttributeError: 'dict' object has no attribute 'insert'
AttributeError Traceback (most recent call last)
Input In [354], in <cell line: 1>()
----> 1 dictSample.insert(3,5)
2 print(dictSample)
pop
TypeError Traceback (most recent call last)
Input In [356], in <cell line: 1>()
----> 1 dictSample.pop()
dictSample.pop('second')
print(dictSample)
dictSample.pop(3)
3
print(dictSample)
remove
AttributeError Traceback (most recent call last)
Input In [362], in <cell line: 1>()
----> 1 dictSample.remove(1)
del
NameError Traceback (most recent call last)
Input In [364], in <cell line: 2>()
1 del dictSample # deleting the dictionary, dictSample
----> 2 dictSample
# Keys : Values
countries_cities
'Rome'
countries_cities["Rome"] # KeyError: "Rome" is a value, not a key
KeyError Traceback (most recent call last)
Input In [370], in <cell line: 1>()
----> 1 countries_cities["Rome"]
KeyError: 'Rome'
'Cairo'
# Change value in a dictionary
countries_cities["United States"]="Atlanta"
countries_cities
countries_cities
print(countries_cities)
del countries_cities["Egypt"]
countries_cities
('Italy', 'Rome')
countries_cities
countries_cities = {
    "United Kingdom" : "London",
    "United States" : "New York",
    "Belgium" : "Brussels",
    "Croatia" : "Zagreb",
    "Egypt" : "Cairo",
    "Italy" : "Rome",
}
countries_cities
United Kingdom
United States
Belgium
Croatia
Egypt
Italy
London
New York
Brussels
Zagreb
Cairo
Rome
for x in countries_cities.items(): # print keys and values
    print(x)
family
print(family)
family["secondchild"]
family
{}
Question: Create a tic-tac-toe board, using a dictionary with one key for each space
theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
'mid-L': ' ', 'mid-M': ' ', 'mid-R': ' ',
'low-L': ' ', 'low-M': ' ', 'low-R': ' '}
theBoard
theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
'mid-L': ' ', 'mid-M': 'X', 'mid-R': ' ',
'low-L': ' ', 'low-M': ' ', 'low-R': ' '}
theBoard
{'top-L': 'O',
'top-M': 'O',
'top-R': 'O',
'mid-L': 'X',
'mid-M': 'X',
'mid-R': ' ',
'low-L': ' ',
'low-M': ' ',
'low-R': 'X'}
theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
'mid-L': ' ', 'mid-M': ' ', 'mid-R': ' ',
'low-L': ' ', 'low-M': ' ', 'low-R': ' '}
def printBoard(board):
    print(board['top-L'] + '|' + board['top-M'] + '|' + board['top-R'])
    print('-+-+-')
    print(board['mid-L'] + '|' + board['mid-M'] + '|' + board['mid-R'])
    print('-+-+-')
    print(board['low-L'] + '|' + board['low-M'] + '|' + board['low-R'])
printBoard(theBoard)
| |
-+-+-
| |
-+-+-
| |
O|O|O
-+-+-
X|X|
-+-+-
| |X
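A possible next step, sketched under the same board layout: each move is just an assignment to one key of the dictionary. Here the rows are built with str.join() instead of repeated +, and the function returns the text instead of printing it so the result can be checked. The names board_text and moves are illustrative, not from the original.

```python
theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
            'mid-L': ' ', 'mid-M': ' ', 'mid-R': ' ',
            'low-L': ' ', 'low-M': ' ', 'low-R': ' '}

def board_text(board):
    """Hypothetical helper: return the board as text instead of printing it."""
    rows = []
    for row in ('top', 'mid', 'low'):
        # Join the left, middle and right spaces of this row with '|'
        rows.append('|'.join(board[row + '-' + col] for col in 'LMR'))
    return '\n-+-+-\n'.join(rows)

# Each move assigns 'X' or 'O' to one key
moves = [('mid-M', 'X'), ('top-L', 'O'), ('low-R', 'X')]
for space, mark in moves:
    theBoard[space] = mark

print(board_text(theBoard))
```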
Set
Set: an unordered collection of unique elements, written inside curly braces {}. Because a set has no order, its elements cannot be accessed by index or slice.
setSample = {'example',24,87.5,'data',24,'data'} # duplicates are stored only once
setSample[1] # TypeError: 'set' object is not subscriptable
TypeError Traceback (most recent call last)
Input In [402], in <cell line: 1>()
----> 1 setSample[1]
setSample.index("example") # AttributeError: 'set' object has no attribute 'index'
AttributeError Traceback (most recent call last)
Input In [403], in <cell line: 1>()
----> 1 setSample.index("example")
setSample[1:2] # TypeError: 'set' object is not subscriptable
TypeError Traceback (most recent call last)
Input In [404], in <cell line: 1>()
----> 1 setSample[1:2]
setSample=setSample,24 # a comma creates a tuple: setSample is now a tuple holding the set and 24
print(setSample)
setSample=setSample,{1:2} # again a tuple, nesting the previous tuple and a dict
print(setSample)
setSample,{33}
setSample,range(1,10,2)
(((({24, 87.5, 'data', 'example'}, 24), {1: 2}), [1]), range(1, 10, 2))
TypeError Traceback (most recent call last)
Input In [411], in <cell line: 1>()
----> 1 dictSample+{5:3}
dictSample,{5:3}
dictSample,2
dictSample,(1)
dictSample,[2]
dictSample,range(2,7,2)
Multiplication
setSample = {'example',24,87.5,'data',24,'data'}
# sets
setSample
TypeError Traceback (most recent call last)
Input In [418], in <cell line: 1>()
----> 1 setSample*2
len
len(setSample)
reverse
setSample.reverse() # AttributeError: 'set' object has no attribute 'reverse'
AttributeError Traceback (most recent call last)
Input In [421], in <cell line: 1>()
----> 1 setSample.reverse()
clear
setSample.clear()
print(setSample)
set()
append() or add()
setSample.append(20)
print(setSample) # AttributeError: 'set' object has no attribute 'append'
AttributeError Traceback (most recent call last)
Input In [424], in <cell line: 1>()
----> 1 setSample.append(20)
2 print(setSample)
setSample.add(20)
print(setSample)
update
ss = set(strSample)
print(ss)
l = set('John')
print(l)
ss.update(l)
print(ss)
{'!', 'e', 'r', 'u', 'n', 's', ' ', 'g', 'f', 'i', 'l', 'a'}
{'n', 'o', 'J', 'h'}
{'!', 'e', 'r', 'J', 'h', 'u', 'n', 's', ' ', 'g', 'f', 'i', 'l', 'o', 'a'}
ss = set(strSample)
print(ss)
l = set('john')
print(l)
ss.update(l)
print(ss)
{'!', 'e', 'r', 'u', 'n', 's', ' ', 'g', 'f', 'i', 'l', 'a'}
{'n', 'j', 'o', 'h'}
{'!', 'e', 'r', 'h', 'u', 'n', 's', ' ', 'j', 'g', 'f', 'i', 'l', 'o', 'a'}
lstSample = [1,2,'a','sam',2]
# So we need to convert list into set and then pass update function
l=[4,'3']
s = set(l)
l1 = set(lstSample)
l1.update(s)
print(l1)
tupSample = (1,2,3,4,3,'py')
s1 = set(tupSample)
print(s1)
s2 = set((7,'8'))
print(s2)
s1.update(s2)
print(s1)
{1, 2, 3, 4, 'py'}
{'8', 7}
{1, 2, 3, 4, 'py', 7, '8'}
dictSample.update({7:"John"})
print(dictSample)
setSample = {'example',24,87.5,'data',24,'data'}
setSample.update({7,"John"})
print(setSample)
insert
setSample.insert(3,5)
print(setSample) # AttributeError: 'set' object has no attribute 'insert'
AttributeError Traceback (most recent call last)
Input In [439], in <cell line: 1>()
----> 1 setSample.insert(3,5)
2 print(setSample)
pop
setSample.pop()
24
print(setSample)
TypeError Traceback (most recent call last)
Input In [443], in <cell line: 1>()
----> 1 setSample.pop(2) # A set is unordered, so pop() takes no index: it removes and returns an arbitrary element
2 print(setSample)
remove
setSample.remove('example')
setSample
NameError Traceback (most recent call last)
Input In [448], in <cell line: 2>()
1 del setSample # deleting the set, setSample
----> 2 setSample
NameError Traceback (most recent call last)
Input In [449], in <cell line: 1>()
----> 1 del setSample[2] # deleting the set, setSample
2 setSample
Set operations
{24, 87.5}
print(A|B) # union of A and B is a set of all elements from both sets
A.union(B) # the same union, using the union() method of A
{24, 87.5}
{24, 87.5}
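A and B are not defined in the surviving cells. The sketch below assumes two small sets, chosen so that their intersection matches the {24, 87.5} shown above, to demonstrate the four basic set operations, each as an operator and as the equivalent method. Set printing order is arbitrary, so no printed output is listed.

```python
A = {'example', 24, 87.5, 'data'}   # assumed contents
B = {24, 87.5, 7, 'John'}           # assumed contents

print(A | B)   # union: in A or B (or both), same as A.union(B)
print(A & B)   # intersection: in both, same as A.intersection(B)
print(A - B)   # difference: in A but not in B, same as A.difference(B)
print(A ^ B)   # symmetric difference: in exactly one, same as A.symmetric_difference(B)
```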
Array and Range
for x in arrSample:
    print(x) # printing values of array
rangeSample = range(1,12,4) # built-in sequence type used for looping
print(rangeSample)
for x in rangeSample:
    print(x) # print the values of 'rangeSample'
range(1, 12, 4)
1
5
9
Indexing
arrSample
array('i', [1, 2, 3, 4])
arrSample.index(4)
rangeSample
range(1, 12, 4)
ValueError Traceback (most recent call last)
Input In [460], in <cell line: 1>()
----> 1 rangeSample.index(0)
IndexError Traceback (most recent call last)
Input In [463], in <cell line: 1>()
----> 1 rangeSample[9]
Slicing
arrSample[1:]
array('i', [2, 3, 4])
arrSample[1:-1]
arrSample[1:-1:2]
array('i', [2])
for x in rangeSample[:-1]:
    print(x)
1
5
for x in rangeSample[1:-1]:
    print(x)
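A range stores only its start, stop and step, so indexing and slicing compute values on demand, and slicing a range returns another range. A sketch using the same rangeSample:

```python
rangeSample = range(1, 12, 4)

print(list(rangeSample))        # [1, 5, 9]
print(rangeSample[1])           # 5
print(rangeSample[:-1])         # range(1, 9, 4): a slice of a range is a range
print(list(rangeSample[1:-1]))  # [5]
print(9 in rangeSample, 4 in rangeSample)  # True False
print(rangeSample.index(9))     # 2
```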
Concatenation
TypeError Traceback (most recent call last)
Input In [469], in <cell line: 1>()
----> 1 arrSample+[50,60]
arrSample+array('i',[50,60])
range(2,7,2)+range(2,7,2)
TypeError Traceback (most recent call last)
Input In [471], in <cell line: 1>()
----> 1 range(2,7,2)+range(2,7,2)
Multiplication
from array import array # import only the array type instead of everything in the module
arrSample = array('i',[1,2,3,4]) # array with integer type
print(arrSample)
arrSample*2
rangeSample = range(1,12,4)
rangeSample
range(1, 12, 4)
----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [475], in <cell line: 1>()
----> 1 rangeSample*2
len
len(arrSample)      # 4
rangeSample = range(1,12,4)
rangeSample
range(1, 12, 4)
len(rangeSample)    # 3
clear
arrSample.clear()      # AttributeError: 'array.array' object has no attribute 'clear'
                       # (array.array only gained a clear() method in Python 3.13)
rangeSample.clear()    # AttributeError: 'range' object has no attribute 'clear'
print(rangeSample)
append() or add()
arrSample.add(3)         # AttributeError: 'array.array' object has no attribute 'add'
print(arrSample)
arrSample.append(3)      # works: arrays use append(), while sets use add()
print(arrSample)         # array('i', [1, 2, 3, 4, 3])
rangeSample.append(22)   # AttributeError: 'range' object has no attribute 'append'
rangeSample.add(22)      # AttributeError: 'range' object has no attribute 'add'
Can we change a list into a tuple, a tuple into a string, a list into a dictionary, a list into a set, a tuple into a
dictionary, a tuple into a set, a list into a string, and vice versa?
l1=[1,2,3,]
l1
# type(l1)
[1, 2, 3]
t1=(1,2,3,)
t1
# type(t1)
(1, 2, 3)
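The two cells above only create a list and a tuple. As a minimal sketch of the conversions the question asks about, the built-in constructors list(), tuple(), set() and dict() plus str.join() and str.split() cover all of them; the sample values below are illustrative. A dictionary can only be built from key/value pairs, so a plain list or tuple needs zip() or dict.fromkeys() first.

```python
l1 = [1, 2, 3]

t1 = tuple(l1)                            # list -> tuple: (1, 2, 3)
s1 = set(t1)                              # tuple -> set: {1, 2, 3}
back = list(s1)                           # set -> list (a set has no guaranteed order)

text = ",".join(str(v) for v in t1)       # tuple -> str: '1,2,3' (items must be str before join)
nums = [int(c) for c in text.split(",")]  # str -> list of int: [1, 2, 3]

d1 = dict(zip(["a", "b", "c"], l1))       # two sequences -> dict: {'a': 1, 'b': 2, 'c': 3}
d2 = dict.fromkeys(t1, 0)                 # keys from a tuple, one default value: {1: 0, 2: 0, 3: 0}
keys = list(d1)                           # dict -> list of keys: ['a', 'b', 'c']
pairs = tuple(d1.items())                 # dict -> tuple of pairs: (('a', 1), ('b', 2), ('c', 3))
```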
Can keys and values be strings, tuples, lists, sets, or dictionaries?
dict1={'Hi':[1,2,3],(1,2):(1,2),5:{'End'},2:{1:2,3:3}}
dict1
{'Hi': [1, 2, 3], (1, 2): (1, 2), 5: {'End'}, 2: {1: 2, 3: 3}}
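The example above shows that values can be of any type, but keys are more restricted: a key must be hashable, which in practice means immutable (strings, numbers, tuples of hashable items, frozenset). The sketch below checks that lists, sets and dicts are rejected as keys with a TypeError; the exact error text varies between Python versions, so it is not printed.

```python
dict1 = {'Hi': [1, 2, 3], (1, 2): (1, 2), 5: {'End'}, 2: {1: 2, 3: 3}}
print(dict1[(1, 2)])                          # a tuple works as a key: (1, 2)

ok = {frozenset({1, 2}): "frozen set key"}    # frozenset is immutable, so it is hashable

failed = []
for bad_key in ([1, 2], {1, 2}, {1: 2}):      # a list, a set and a dict
    try:
        {bad_key: "value"}                    # building the dict needs hash(bad_key)
    except TypeError:
        failed.append(type(bad_key).__name__)
print(failed)                                 # ['list', 'set', 'dict']
```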
update
arrSample = array('i',[1,2,3,4])
arrSample.update(7)    # AttributeError: 'array.array' object has no attribute 'update'
s1 = set((7,6))
print(s1)
l2 = set(arrSample)    # convert the array to a set, which does have update()
print(l2)
l2.update(s1)          # adds every element of s1 to l2
print(l2)
{6, 7}
{1, 2, 3, 4}
{1, 2, 3, 4, 6, 7}
rangeSample.update(range(1,10,2))   # AttributeError: 'range' object has no attribute 'update'
print(rangeSample)
s1 = set(range(1,12,4))
print(s1)
s2 = set(range(1,20,3))
print(s2)
s1.update(s2)
print(s1)
{1, 5, 9}
{1, 4, 7, 10, 13, 16, 19}
{1, 4, 5, 7, 9, 10, 13, 16, 19}
insert
range(1, 12, 4)
rangeSample.insert(3,5)   # AttributeError: 'range' object has no attribute 'insert'
print(rangeSample)
pop
arrSample
print(arrSample)
range(1, 12, 4)
rangeSample.pop()      # AttributeError: 'range' object has no attribute 'pop'
rangeSample.pop(2)     # AttributeError: 'range' object has no attribute 'pop'
remove
arrSample
range(1, 12, 4)
rangeSample.remove(4)  # AttributeError: 'range' object has no attribute 'remove'
del
arrSample = array('i',[1,2,3,4])
arrSample
del arrSample          # deleting the array
arrSample              # NameError: name 'arrSample' is not defined
arrSample = array('i',[1,2,3,4])
arrSample
rangeSample = range(1,12,4)
rangeSample
range(1, 12, 4)
del rangeSample        # deleting the range
rangeSample            # NameError: name 'rangeSample' is not defined
rangeSample = range(1,12,4)
rangeSample
range(1, 12, 4)
del rangeSample[2]     # TypeError: 'range' object doesn't support item deletion
extend
arrSample.extend(('hello'))    # TypeError: an integer is required (got type str)
print(arrSample)               # note: ('hello') is just the string 'hello', not a tuple
arrSample.extend(['john'])     # TypeError: an integer is required (got type str)
print(arrSample)
range(1, 12, 4)
rangeSample.extend([4,5,3,5])  # AttributeError: 'range' object has no attribute 'extend'
print(rangeSample)
Matrices
m = [[4,1,2],[7,5,3],[9,6,9]]
for i in m:
    print(i)
[4, 1, 2]
[7, 5, 3]
[9, 6, 9]
m = [[4,1,2],[7,5,3],[9,6,9]]
for i in m:
    print(' '.join(str(i)))   # joins the characters of the string "[4, 1, 2]" with spaces
[ 4 , 1 , 2 ]
[ 7 , 5 , 3 ]
[ 9 , 6 , 9 ]
[ 4 , 8 , 9 ]
[ 4 , 8 , 9 ]
[ 4 , 8 , 9 ]
lst = []
m = []
for i in range(0,3):
    for j in range(0,3):
        lst.append(0)
    m.append(lst)
    lst = []
for i in m:
    print(' '.join(str(i)))
[ 0 , 0 , 0 ]
[ 0 , 0 , 0 ]
[ 0 , 0 , 0 ]
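The nested loop above can be written more compactly with a list comprehension. A common pitfall is `[[0] * 3] * 3`, which repeats one row object three times, so changing one entry changes every row. The sketch below contrasts the two and prints the rows as plain numbers instead of joining the characters of each list's string form.

```python
rows, cols = 3, 3

m = [[0] * cols for _ in range(rows)]   # a fresh list for every row
m[0][0] = 1                             # changes only row 0

bad = [[0] * cols] * rows               # one row object repeated three times
bad[0][0] = 1                           # the change shows up in every "row"

print(m)     # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
print(bad)   # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

for row in m:
    print(" ".join(str(v) for v in row))   # 1 0 0 / 0 0 0 / 0 0 0
```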
for i in range(3):
    print(i)
0
1
2
for i in range(1,3):
    print(i)
1
2
# Initialize matrix
matrix = []
print("Enter the entries row wise:")
print(matrix)
row = column = 1
X[row][column] = 11
print(X)
row = -2
column = -1
X[row][column] = 21
print(X)
for r in result:
    print(r)
print("Matrix Addition")
for r in Add_result:
    print(r)
print("\nMatrix Subtraction")
for r in Sub_result:
    print(r)
Matrix Addition
[10, 10, 10]
[10, 10, 10]
[10, 10, 10]
Matrix Subtraction
[-8, -6, -4]
[-2, 0, 2]
[4, 6, 8]
rmatrix = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
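print("Matrix Multiplication",)
for r in rmatrix:
    print(r)
for i in range(len(X)):
    for j in range(len(X[0])):
        # index with the loop variables i and j so every cell is visited
        # (note: the recorded output below repeats the multiplication matrix,
        # so it does not show a floor division)
        rmatrix[i][j] = X[i][j] // Y[i][j]
print("\nMatrix Division",)
for r in rmatrix:
    print(r)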
Matrix Multiplication
[9, 16, 21]
[24, 25, 24]
[21, 16, 9]
Matrix Division
[9, 16, 21]
[24, 25, 24]
[21, 16, 9]
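The definitions of X and Y are missing above. The sketch below assumes X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and Y = [[9, 8, 7], [6, 5, 4], [3, 2, 1]], because these reproduce the addition, subtraction and "multiplication" outputs shown. Note that those results are element-wise; a true matrix product combines rows of X with columns of Y, so a separate `matmul` helper (a hypothetical name) is included for comparison.

```python
import operator

# Assumed inputs: they reproduce the addition, subtraction and multiplication outputs above
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Y = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]

def elementwise(A, B, op):
    """Apply op to the entries at the same (i, j) position of A and B."""
    return [[op(a, b) for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def matmul(A, B):
    """True matrix product: entry (i, j) is row i of A dotted with column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

Add_result = elementwise(X, Y, operator.add)        # every entry is 10
Sub_result = elementwise(X, Y, operator.sub)
Mul_result = elementwise(X, Y, operator.mul)        # element-wise product
Div_result = elementwise(X, Y, operator.floordiv)   # element-wise floor division
Prod = matmul(X, Y)

for name, M in [("Floor division", Div_result), ("Matrix product", Prod)]:
    print(name)
    for r in M:
        print(r)
```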
# Transpose
X = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
for r in result:
print(r)
[9, 6, 3]
[8, 5, 2]
[7, 4, 1]
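The code that builds `result` for the transpose is missing above. A minimal sketch that produces the printed rows: the transpose swaps rows and columns, so entry (j, i) of the result is entry (i, j) of X. The second version uses `zip(*X)`, which groups the first items of every row, then the second items, and so on.

```python
X = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]

# Entry (j, i) of the transpose is entry (i, j) of X
result = [[X[i][j] for i in range(len(X))] for j in range(len(X[0]))]
for r in result:
    print(r)          # [9, 6, 3] / [8, 5, 2] / [7, 4, 1]

# Same result with zip(): each tuple it yields is one column of X
result_zip = [list(col) for col in zip(*X)]
```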
Python Operators
# Python Arithmetic Operators
x = 7
y = 5
print(x+y) # Addition Operator
print(x-y) # Subtraction Operator
print(x*y) # Multiplication Operator
print(x/y) # Division Operator
print(x%y) # Modulo Operator
print(x**y) # Exponent Operator
print(x//y) # Floor division
# For integers a and b (b != 0): a == (a//b)*b + a%b, where a//b is the floor quotient
# and a%b is the remainder.
12
2
35
1.4
2
16807
1
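The identity in the comment above can be checked directly with the built-in `divmod()`, which returns `(a // b, a % b)`. The sketch also shows the less obvious negative cases: `//` rounds toward negative infinity, so the remainder takes the sign of the divisor.

```python
results = []
for a, b in [(7, 5), (-7, 5), (7, -5)]:
    q, r = divmod(a, b)            # same as (a // b, a % b)
    assert a == q * b + r          # the identity holds for every pair
    results.append((a, b, q, r))
    print(a, b, q, r)
# 7 5 1 2
# -7 5 -2 3
# 7 -5 -2 -3
```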
x = 7
x += 3
print(x)
x = 7
x -= 3
print(x)
x = 7
x *= 3
print(x)
x = 7
x /= 3
print(x)
x = 7
x %= 3
print(x)
10
4
21
2.3333333333333335
1
x = 7
y = 3
print(x==y)
x = 7
y = 3
print(x!=y)
x = 7
y = 3
print(x>y)
x = 7
y = 3
print(x<y)
x = 7
y = 3
print(x>=y)
x = 7
y = 3
print(x<=y)
False
True
True
False
True
False
x = 7
print(x>3 and x<10)
x = 7
print(x>3 or x<10)
x = 7
print(not(x>3 and x<10))
True
True
False
Conditional Statements
# if statement
a = 40
b = 80
if b>a:
    print("b is greater than a")
b is greater than a
# elif statement
a = 40
b = 40
if b>a:
    print("b is greater than a")
elif a==b:
    print("a and b are equal")
a and b are equal
# else statement
a = 20
b = 10
if b>a:
    print("b is greater than a")
elif a==b:
    print("a and b are equal")
else:
    print("a is greater than b")
a is greater than b
# Nested if statements
x=51
if x>10:
    print("Above ten,")
    if x>20:
        print("Above 20!")
    else:
        print("but not above 20.")
Above ten,
Above 20!
Python Loops
# While loop
i=1
while i<7:
    print(i)
    i+=1
1
2
3
4
5
6
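# The break statement: we can stop the loop even if the while condition is true:
i=1
while i<7:
    print(i)
    if (i==4):
        break
    i+=1
1
2
3
4
# The continue statement: we can stop the current iteration, and continue with the next:
i=0
while i<7:
    i+=1
    if i==4:
        continue      # skip the print() for i == 4
    print(i)
1
2
3
5
6
7
# With pass instead of continue nothing is skipped, because pass does nothing:
i=0
while i<7:
    i+=1
    if i==4:
        pass
    print(i)
1
2
3
4
5
6
7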
# For loops
fruits=["apple","banana","cherry","kiwi","oranges"]
for x in fruits:
    print(x)
apple
banana
cherry
kiwi
oranges
for x in "strawberry":
    print(x)
s
t
r
a
w
b
e
r
r
y
adj=["red","big","tasty"]
fruits=["apple","banana","cherry"]
for x in adj:
    for y in fruits:
        print(x,y)
red apple
red banana
red cherry
big apple
big banana
big cherry
tasty apple
tasty banana
tasty cherry
Enter a number: 47
Number is Prime
ss=sentence.split()
print(ss)
{'t': 1, 'h': 1, 'e': 2, ' ': 7, 'q': 0, 'u': 1, 'i': 0, 'c': 0, 'k':
0, 'b': 0, 'r': 1, 'o': 3, 'w': 0, 'n': 0, 'f': 0, 'x': 0, 'j': 0,
'm': 0, 'p': 0, 's': 0, 'v': 0, 'l': 0, 'a': 0, 'z': 0, 'y': 0, 'd':
0, 'g': 0}
{'t': 2, 'h': 2, 'e': 3, ' ': 8, 'q': 1, 'u': 2, 'i': 1, 'c': 1, 'k':
1, 'b': 1, 'r': 2, 'o': 4, 'w': 1, 'n': 1, 'f': 1, 'x': 1, 'j': 1,
'm': 1, 'p': 1, 's': 1, 'v': 1, 'l': 1, 'a': 1, 'z': 1, 'y': 1, 'd':
1, 'g': 1}
5%2
s=0
for i in range(1,11):
    s=s+i
print(s)
55
s=0
l=[]
for i in range(1,11):
    s=s+i
    l.append(s)     # running (cumulative) total
print(l)
[1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
{'T': 1, 'h': 2, 'e': 2, ' ': 2, 'S': 2, 'u': 1, 'n': 2, 'i': 1, 's':
1}
T: 1
h: 2
e: 2
: 2
S: 2
u: 1
n: 2
i: 1
s: 1
The frequency of S in the sentence: 2
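The code for the character count above is missing. A minimal sketch, assuming the input sentence was "The Sun Shines" (an assumption: it reproduces every count shown). It counts with a plain dict and `dict.get()`, then cross-checks against `collections.Counter`, which does the same counting in one call.

```python
from collections import Counter

sentence = "The Sun Shines"     # assumed input; it reproduces the counts above

freq = {}
for ch in sentence:
    freq[ch] = freq.get(ch, 0) + 1   # start at 0 the first time a character is seen
print(freq)
for ch, n in freq.items():
    print(f"{ch}: {n}")
print("The frequency of S in the sentence:", freq["S"])   # 2
```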
# GCD of 2 numbers
num1 = 36
num2 = 60
gcd = 1
GCD of 36 and 60 is 12
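The loop that computes the GCD above is not shown. One standard way is Euclid's algorithm, which repeatedly replaces (a, b) with (b, a % b) until b is 0; the sketch below uses it and compares the result with the standard library's `math.gcd()`. This is a swapped-in technique, not necessarily the loop used in the notes.

```python
import math

def gcd(a, b):
    # Euclid's algorithm: gcd(a, b) == gcd(b, a % b), and gcd(a, 0) == a
    while b:
        a, b = b, a % b
    return abs(a)

num1, num2 = 36, 60
print("GCD of", num1, "and", num2, "is", gcd(num1, num2))   # GCD of 36 and 60 is 12
print(gcd(num1, num2) == math.gcd(num1, num2))              # True
```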
# Palindrome
str1 = input('Enter your number: ')
str2 = str1[::-1]
l1 = list(str1)
l2 = list(str2)
if l1==l2:
    print("Palindrome number")
else:
    print("Not a Palindrome number")
string=input("Enter a letter:")
if(string==string[::-1]):
    print("The letter is a palindrome")
else:
    print("The letter is not a palindrome")
Enter a letter:12321
The letter is a palindrome
reverse = 0
temp = number
if(number == reverse):
    print("%d is a Palindrome" %number)
else:
    print("%d is not a Palindrome" %number)
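The digit-reversing loop between `temp = number` and the comparison is missing above. A minimal sketch of how it can work arithmetically, without converting to a string: `temp % 10` takes the last digit and `temp //= 10` drops it. The function name `is_palindrome_number` is hypothetical.

```python
def is_palindrome_number(number):
    """Reverse the digits arithmetically and compare with the original."""
    if number < 0:
        return False
    reverse = 0
    temp = number
    while temp > 0:
        reverse = reverse * 10 + temp % 10   # append the last digit of temp
        temp //= 10                          # drop the last digit of temp
    return number == reverse

for number in (12321, 1234):
    if is_palindrome_number(number):
        print("%d is a Palindrome" % number)
    else:
        print("%d is not a Palindrome" % number)
```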
Python Functions
A function accepts input arguments and produces an output by executing the statements in its body.
The function name and the file name need not be the same.
Functions are created with the def keyword followed by a colon; the statements to be executed are
indented as a block.
Since blocks are not marked by braces, correct indentation is essential.
Syntax:
def function_name(parameters):
    statements
    return value
print((lambda x:x*x)(12))
144
x=(lambda a,b:a*b)(4,5)
print(x)
20
n=4
print((lambda p:p+n)(7))
11
y=lambda p,q:(p*p)+(q*q)
print(y(5,6))
61
nums=[1,2,3]          # avoid naming a variable list: it hides the built-in list()
l=[]
for i in nums:        # or: for i in range(len(nums))
    x=lambda i:i*i*i
    l.append(x(i))
print(l)
[1, 8, 27]
# Swap 2 numbers
# a=2
# b=3
def swap(a,b):
    temp=a
    a=b
    b=temp
    print("values after swapping:",a,b)
swap(23,25)
values after swapping: 25 23
a = 9
b = 10
print(a,b)
a,b = b,a
print(a,b)
9 10
10 9
# 5*1=5
# .
# .
# 5*10=50
def mult(num):
    for i in range(1,11):
        num1=num*i
        print(num,"*",i,"=",num1)
print(mult(5))   # mult() has no return statement, so print() also shows None
5 * 1 = 5
5 * 2 = 10
5 * 3 = 15
5 * 4 = 20
5 * 5 = 25
5 * 6 = 30
5 * 7 = 35
5 * 8 = 40
5 * 9 = 45
5 * 10 = 50
None
def prime(num):
    if (num<2):            # 0, 1 and negative numbers are not prime
        return False
    elif (num==2):
        return True
    else:
        for x in range(2,num):
            if (num%x==0):
                return False
        return True
print(prime(4))
# for i in range(2,n/2,2):
# if n%i==0:
# print("Is not prime")
# i=i+1
# print("It is a prime")
# prime(4)
False
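The `prime()` function above tries every divisor up to num - 1. A divisor larger than the square root always pairs with one smaller than it, so it is enough to test up to the integer square root. A minimal sketch using `math.isqrt()` (Python 3.8+); `is_prime` is a hypothetical name.

```python
import math

def is_prime(num):
    if num < 2:                                 # 0, 1 and negatives are not prime
        return False
    for x in range(2, math.isqrt(num) + 1):     # divisors up to sqrt(num) suffice
        if num % x == 0:
            return False
    return True

print(is_prime(47))                              # True
print([n for n in range(1, 30) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```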
Enter a number : 5
No. is prime
No. is prime
No. is prime
if (stg[i]==stg[i].lower()):
count1=count1+1
elif (stg[i]==stg[i].upper()):
count2=count2+1
print("The count of lower strings ", count1)
print("The count of upper strings ", count2)
ULCase("AbSc")
def uclc(a):
    uc=''
    lc=''
    for i in range(len(a)):
        if (a[i].isupper()==True):
            uc=uc+a[i]
        elif (a[i].islower()==True):
            lc=lc+a[i]
    print(len(uc))
    print(len(lc))
uclc("ABCdefGHIjk")
6
5
def UP_LO(s):
    d={"UPPER_CASE":0,"LOWER_CASE":0}
    for c in s:
        if c.isupper():
            d["UPPER_CASE"]+=1
        elif c.islower():
            d["LOWER_CASE"]+=1
        else:
            pass
    print("Original String:",s)
    print("No. of Upper case Letters:",d["UPPER_CASE"])
    print("No. of Lower case Letters:",d["LOWER_CASE"])
Enter1234
['1', '2', '3', '4']
Enter3456
['3', '4', '5', '6']
It is a substring
# Driver Code
list1 = [[2, 3, 1], [4, 5], [6, 8]]
list2 = [[4, 5], [6, 8]]
print(checkSubset(list1, list2))
True
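The definition of `checkSubset` is missing above. A minimal sketch that matches the call and its output: it returns True when every element of `list2` also appears in `list1`. Because the elements are themselves lists (unhashable), it uses `in` on the list rather than converting to sets; this is an assumed implementation, not necessarily the original one.

```python
def checkSubset(list1, list2):
    # True when every element of list2 also appears in list1
    return all(item in list1 for item in list2)

# Driver Code
list1 = [[2, 3, 1], [4, 5], [6, 8]]
list2 = [[4, 5], [6, 8]]
print(checkSubset(list1, list2))       # True
print(checkSubset(list1, [[1, 2]]))    # False
```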
Enter1234
Entersd3
False
list2[0][1]
for i in a:
    if i in f:
        f[i]+=1
    else:
        f[i]=1
print(str(f))
# Fibonacci Function (for loop)
def fib(n):
    if n<0:
        return("Fibonacci sequence doesn't exist for negative numbers.\nPlease provide a positive number.")
    elif n==0:
        return [1]
    elif n==1:
        return [1,1]
    else:
        a0=1
        a1=1
        l=[a0,a1]
        for i in range(1,n):
            a2=a0+a1
            a0=a1
            a1=a2
            l.append(a2)
        return l
# Fibonacci Function (while loop)
def fib(n):
    if n<0:
        return("Fibonacci sequence doesn't exist for negative numbers.\nPlease provide a positive number.")
    elif n==0:
        return [1]
    elif n==1:
        return [1,1]
    else:
        a0=1
        a1=1
        l=[a0,a1]
        i=2
        while (i<=n):
            a2=a0+a1
            a0=a1
            a1=a2
            i+=1
            l.append(a2)
        return l
def palindrome(str1):
    # str1=input("Enter your text: ")
    # str1=str(str1)
    l=len(str1)
    str2=''
    while l>0:               # build the reversed string character by character
        j=str1[l-1]
        str2+=j
        l-=1
    if str1==str2:           # compare only after the whole string has been reversed
        return("Palindrome number")
    else:
        return("Not Palindrome number")
n=input("Enter your text: ")
print(palindrome(n))
n=int(input("Enter :"))
n1,n2=1,1
count=0
if n<=0:
    print("Enter a positive number.")
elif n==1:
    print("Fibonacci sequence upto",n,":")
    print(n1)
else:
    print("Fibonacci sequence :")
    while count<=n:
        print(n1)
        nth=n1+n2
        n1=n2
        n2=nth
        count+=1
Enter :5
Fibonacci sequence :
1
1
2
3
5
8
vcount = 0
ccount = 0
sentence = "This is a really simple sentence"   # renamed from str, which hides the built-in str()
vowel = set("aeiouAEIOU")
# Converting entire string to lower case to reduce the comparisons
# sentence = sentence.lower()
for i in range(0,len(sentence)):
    # Checks whether a character is a vowel
    if sentence[i] in ('a',"e","i","o","u"):
        vcount = vcount + 1
    elif (sentence[i] >= 'a' and sentence[i] <= 'z'):
        ccount = ccount + 1
print("Total number of vowels and consonants are")
print(vcount)
print(ccount)
No. of vowels : 10
No. of consonants : 22
my_dict = {'apple': 10, 'banana': 5, 'cherry': 15}
def sorted_values(d):
return sorted(d.values(), reverse=True)
sorted_value_list = sorted_values(my_dict)
sorted_value_list
[15, 10, 5]
sorted_key_list = sorted_keys(my_dict)
sorted_key_list
def string_lengths(strings):
return [len(s) for s in strings]
[5, 6, 6]
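`sorted_keys` is called above but never defined, and the call that produced `[5, 6, 6]` is missing. A minimal sketch with a hypothetical `sorted_keys` (alphabetical order) and `keys_by_value` (keys ordered by their values, largest first); `[5, 6, 6]` is consistent with calling `string_lengths` on the keys of `my_dict`.

```python
my_dict = {'apple': 10, 'banana': 5, 'cherry': 15}

def sorted_keys(d):
    # hypothetical helper: keys in alphabetical order
    return sorted(d.keys())

def keys_by_value(d):
    # hypothetical helper: keys ordered by their values, largest first
    return sorted(d, key=d.get, reverse=True)

def string_lengths(strings):
    return [len(s) for s in strings]

print(sorted_keys(my_dict))        # ['apple', 'banana', 'cherry']
print(keys_by_value(my_dict))      # ['cherry', 'apple', 'banana']
print(string_lengths(my_dict))     # [5, 6, 6] (iterating a dict yields its keys)
```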
Functions
A function is a block of code which only runs when it is called.
def my_function():
    print("Hello from a function")
my_function()
Parameters
A parameter is the variable listed inside the parentheses in the function definition. For example,
a and b in def swap(a,b).
Arguments
An argument is the value that is sent to the function when it is called. For example, 23 and 25 in swap(23,25).
Arguments are specified after the function name, inside the parentheses. You can add as many
arguments as you want, just separate them with a comma.
The following example has a function with one argument (fname). When the function is called,
we pass along a first name, which is used inside the function to print the full name:
def my_function(fname):
    print(fname + " Refsnes")
my_function("Emil")
my_function("Tobias")
my_function("Linus")
Emil Refsnes
Tobias Refsnes
Linus Refsnes
Number of Arguments
By default, a function must be called with the correct number of arguments. This means that if your
function expects 2 arguments, you have to call it with exactly 2 arguments, not more and not
fewer.
def my_function(fname, lname):
    print(fname + " " + lname)
my_function("Emil", "Refsnes")
Emil Refsnes
If you try to call the function with 1 or 3 arguments, you will get an error:
my_function("Emil")                  # TypeError: my_function() missing 1 required positional argument: 'lname'
my_function("Emil","Ajay",'Anil')    # TypeError: my_function() takes 2 positional arguments but 3 were given
If you do not know how many arguments will be passed into your function, add a * before
the parameter name in the function definition.
This way the function will receive a tuple of arguments and can access the items accordingly:
def my_function(*kids):
    print("The youngest child is " + kids[2])
Keyword Arguments
You can also send arguments with the key = value syntax; then the order of the arguments does not matter.
If you do not know how many keyword arguments will be passed into your function, add
two asterisks (**) before the parameter name in the function definition.
This way the function will receive a dictionary of arguments and can access the items
accordingly:
def my_function(**kid):
    print("His last name is " + kid["lname"])
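The calls for the `*kids`, keyword-argument and `**kid` examples are not shown above. A minimal sketch with hypothetical function names (`youngest`, `named`, `last_name`) so the three versions do not overwrite each other: `*args` collects positional arguments into a tuple, `**kwargs` collects keyword arguments into a dict, and key = value calls can list arguments in any order.

```python
def youngest(*kids):
    # kids is a tuple holding all positional arguments
    return "The youngest child is " + kids[2]

def named(child1, child2, child3):
    # with key = value arguments the order in the call does not matter
    return "The youngest child is " + child3

def last_name(**kid):
    # kid is a dict holding all keyword arguments
    return "His last name is " + kid["lname"]

print(youngest("Emil", "Tobias", "Linus"))                     # The youngest child is Linus
print(named(child3="Linus", child1="Emil", child2="Tobias"))   # The youngest child is Linus
print(last_name(fname="Tobias", lname="Refsnes"))              # His last name is Refsnes
```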
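def my_function(country = "Norway"):   # default value, used when no argument is passed
    print("I am from " + country)
my_function("Sweden")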
my_function("India")
my_function()
my_function("Brazil")
I am from Sweden
I am from India
I am from Norway
I am from Brazil
You can send any data type as an argument to a function (string, number, list, dictionary etc.), and
it will be treated as the same data type inside the function.
E.g. if you send a list as an argument, it will still be a list when it reaches the function:
def my_function(food):
    for x in food:
        print(x)
my_function(fruits)
apple
banana
cherry
Return Values
To let a function return a value, use the return statement:
def my_function(x):
    return 5 * x
print(my_function(3))
print(my_function(5))
print(my_function(9))
15
25
45
Function definitions cannot be empty, but if for some reason you have a function definition with
no content, put in the pass statement to avoid getting an error.
def myfunction():
    pass
myfunction()
# myfunction('a')
PYTHON FUNCTIONS-TYPES
● Built-in Functions.
● Recursive Functions.
● Lambda Functions.
● User-defined Functions.
● type()
● print()
● abs()
● int()
● str()
● tuple()
● chr()
Recursion is a common mathematical and programming concept. It means that a function calls
itself.
def factorial(x):
    if x<=1:            # base case (also covers 0! = 1); stops the recursion
        return 1
    else:
        return (x*factorial(x-1))
num=int(input("Enter a number: "))
print("The factorial of",num,'is',factorial(num))
Enter a number: 4
The factorial of 4 is 24
A lambda function is a small anonymous function. It can take any number of arguments, but can only have one expression.
Syntax:
lambda arguments : expression
x = lambda a : a + 10
print(x(5))
15
x = lambda a, b : a * b
print(x(5, 6))
30
x = lambda a, b, c : a + b + c
print(x(5, 6, 2))
13
def myfunc(n):
    return lambda a : a * n
mydoubler = myfunc(2)
print(mydoubler(11))
22
def myfunc(n):
    return lambda a : a * n
mydoubler = myfunc(2)
mytripler = myfunc(3)
print(mydoubler(11))
print(mytripler(11))
22
33
A Python library is a set of related modules or packages bundled together, which programmers and
developers reuse instead of writing that code themselves. Python ships with a standard library (for
example math, os, random and time), and popular third-party libraries include Pygame, PyTorch
and Matplotlib.
Real-world programs are complicated: even simple software contains thousands of lines of
code, which is hard for programmers and developers to follow as one continuous flow. Developers
therefore use modular programming, which breaks a huge coding task into shorter, logically
separate and more adaptable subtasks that are easier to understand and reuse.
Ease of use is one of Python's primary goals, which is one reason it has so many modules and
libraries.
NumPy
NumPy is a Python library (package) used for working with arrays.
It also has functions for working in the domain of linear algebra, Fourier transforms, and matrices.
NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it
freely.
It supports N-dimensional array objects that can be used for processing multidimensional data.
In Python we have lists that serve the purpose of arrays, but they are slow to process.
NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.
The array object in NumPy is called ndarray, it provides a lot of supporting functions that make
working with ndarray very easy.
Arrays are very frequently used in data science, where speed and resources are very important.
Installation of NumPy
pip install numpy
Import NumPy
import numpy as np
NumPy is used to work with arrays. The array object in NumPy is called ndarray.
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
[1 2 3 4 5]
<class 'numpy.ndarray'>
print(np.__version__)
1.22.4
print(arr)
print(type(arr))
[1 2 3 4 5]
<class 'numpy.ndarray'>
time
import time
a = time.time()
for i in range(1,100000): i*1000
b = time.time()
c=b-a
print(a)
print(b)
print(c)
1704103059.5630674
1704103059.5816789
0.018611431121826172
import time
l = []
a = time.time()
for i in range(0, 100):
l.append(i)
b = time.time()
c = b - a
print("End time")
print(b)
print("Start time")
print(a)
print("Execution time")
print(c)
End time
1703847721.5496702
Start time
1703847721.5496702
Execution time
0.0
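The measurement above reports 0.0 seconds, most likely because 100 appends finish faster than the resolution of `time.time()` on that machine (an assumption). For measuring short intervals, `time.perf_counter()` is the clock intended for that purpose. The sketch below times summing one million numbers with a Python list and with a NumPy array; the exact times depend on the machine, so only the sums are checked. `dtype=np.int64` is set explicitly so the sum cannot overflow on platforms whose default integer is 32-bit.

```python
import time
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

start = time.perf_counter()
py_total = sum(py_list)               # pure Python loop over objects
py_time = time.perf_counter() - start

start = time.perf_counter()
np_total = int(np_arr.sum())          # vectorised sum in compiled code
np_time = time.perf_counter() - start

print("list sum :", py_total, f"{py_time:.6f} s")
print("numpy sum:", np_total, f"{np_time:.6f} s")
```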
Dimensions in Arrays
0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.
arr = np.array(42)
print(arr)
42
1-D Arrays
An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array.
arr = np.array([1, 2, 3, 4, 5])
print(arr)
[1 2 3 4 5]
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
[[1 2 3]
[4 5 6]]
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
print(type(arr))
# print(len(arr))
# print(arr.ndim)
# print(arr.shape)
[[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]]
<class 'numpy.ndarray'>
NumPy arrays provide the ndim attribute, which returns an integer telling us how many
dimensions the array has.
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
0
1
2
3
arr = np.array([1, 2, 3, 4], ndmin=10)   # ndmin sets the minimum number of dimensions
print(arr)
print('number of dimensions :', arr.ndim)
[[[[[[[[[[1 2 3 4]]]]]]]]]]
number of dimensions : 10
Shape of an Array
NumPy arrays have an attribute called shape that returns a tuple giving the number of elements
along each dimension.
print(arr.shape)
(3, 4)
arr = np.array([1, 2, 3, 4])
print(arr)
print('shape of array :', arr.shape)
[1 2 3 4]
shape of array : (4,)
arr = np.array([1, 2, 3, 4], ndmin=1)
print(arr)
print('shape of array :', arr.shape)
[1 2 3 4]
shape of array : (4,)
arr = np.array([1, 2, 3, 4], ndmin=2)
print(arr)
print('shape of array :', arr.shape)
[[1 2 3 4]]
shape of array : (1, 4)
arr = np.array([1, 2, 3, 4], ndmin=3)
print(arr)
print('shape of array :', arr.shape)
[[[1 2 3 4]]]
shape of array : (1, 1, 4)
arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr)
print('shape of array :', arr.shape)
[[[[[1 2 3 4]]]]]
shape of array : (1, 1, 1, 1, 4)
The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the
second has index 1 etc.
arr = np.array([1, 2, 3, 4])
print(arr[0])
print(arr[1])
print(arr[2] + arr[3])
1
2
7
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])   # first 2-D block, second row, third element
6
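The 2-D array defined above is never indexed. A short sketch of multi-dimensional indexing: one index per dimension, separated by commas, with negative indexes counting from the end. Giving fewer indexes than dimensions selects a whole sub-array. Different variable names are used so the example is self-contained.

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr2d[0, 1])      # 2: row 0, column 1
print(arr2d[1, 4])      # 10: row 1, column 4
print(arr2d[1, -1])     # 10: negative indexes count from the end

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr3d[0, 1, 2])   # 6: block 0, row 1, column 2
print(arr3d[1, 0])      # [7 8 9]: fewer indexes select a whole sub-array
```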
Slicing arrays
Slicing in python means taking elements from one given index to another given index.
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
print(arr[4:])
print(arr[:4])
print(arr[-3:-1])
print(arr[1:5:2])
print(arr[::2])
[2 3 4 5]
[5 6 7]
[1 2 3 4]
[5 6]
[2 4]
[1 3 5 7]
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])
print(arr[0:2, 2])
print(arr[0:2, 1:4])
[7 8 9]
[3 8]
[[2 3 4]
[7 8 9]]
strings - used to represent text data; the text is given inside quote
marks, e.g. "ABCD"
NumPy has some extra data types, and refers to data types with one character, like i for integers,
u for unsigned integers etc.
Below is a list of all data types in NumPy and the characters used to represent them.
i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
The NumPy array object has a property called dtype that returns the data type of the array:
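arr = np.array([1, 2, 3, 4])
print(arr.dtype)   # the default integer size is platform dependent (int32 here, int64 on most Linux/macOS systems)
int32
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)   # <U6: Unicode string of at most 6 characters
<U6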
print(arr)
print(arr.dtype)
print(arr)
print(arr.dtype)
print(arr)
print(arr.dtype)
[-101 2 3 4]
int16
print(arr)
print(arr.dtype)
[1 2 3 4]
int32
arr = np.array([1, 2, 3, 4], dtype='i8')
print(arr)
print(arr.dtype)
[1 2 3 4]
int64
print(arr)
print(arr.dtype)
[1 2 3 4]
int32
1 byte = 8 bits (ASCII, the American Standard Code for Information Interchange, stores each character
in one byte)
1 word = 2 bytes (a common convention, e.g. on x86; the actual word size depends on the processor)
Unsigned integer types start with u and come in 8, 16, 32 and 64 bits. For n bits the values range from
0 to 2ⁿ-1. In NumPy's one-character codes the number after the letter counts bytes, so 'u1' is uint8
and 'u8' is uint64.
DATATYPE   CODE   MIN      MAX      LENGTH
uint8      u1     0        255      8-bit
uint16     u2     0        65535    16-bit
& so on.
Signed integer types start with i and come in 8, 16, 32 and 64 bits. For n bits the values range from
-(2ⁿ⁻¹) to 2ⁿ⁻¹-1.
DATATYPE   CODE   MIN      MAX      LENGTH
int8       i1     -128     127      8-bit
int16      i2     -32768   32767    16-bit
& so on.
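The ranges in the tables above can be read directly from NumPy with `np.iinfo()`, which reports the minimum, maximum and bit width of an integer dtype. The sketch also confirms that the digit in a one-character code counts bytes: `'i8'` is a 64-bit type, matching the earlier `dtype='i8'` example.

```python
import numpy as np

ranges = {}
for code in ("u1", "u2", "i1", "i2", "i8"):
    dt = np.dtype(code)
    info = np.iinfo(dt)                 # integer limits for this dtype
    ranges[code] = (dt.name, int(info.min), int(info.max), info.bits)
    print(code, dt.name, info.min, info.max, info.bits, "bits")
```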
4-bit unsigned integer
bits value
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 8
1001 9
1010 10
1011 11
1100 12
1101 13
1110 14
1111 15
4-bit signed integer (two's complement)
bits value
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 -8
1001 -7
1010 -6
1011 -5
1100 -4
1101 -3
1110 -2
1111 -1
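The second table uses two's complement: when the top bit is 1, the pattern stands for its unsigned value minus 2ⁿ (for 4 bits, minus 16). The sketch below applies that rule with a hypothetical helper `to_signed`, then shows the same effect in NumPy by reinterpreting the bytes of a `uint8` array as `int8` with `view()` (no conversion, the bits stay the same).

```python
import numpy as np

def to_signed(bits):
    """Read a bit string as a two's-complement integer (hypothetical helper)."""
    value = int(bits, 2)
    width = len(bits)
    # If the top bit is 1, the pattern stands for value - 2**width
    return value - (1 << width) if bits[0] == "1" else value

print(to_signed("0111"), to_signed("1000"), to_signed("1111"))   # 7 -8 -1

raw = np.array([127, 128, 255], dtype=np.uint8)   # unsigned reading of three bytes
signed = raw.view(np.int8)                         # same bytes, read as int8
print(signed.tolist())                             # [127, -128, -1]
```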
To understand the difference between byte strings and Unicode strings, we first need to know
what “encoding” and “decoding” are.
To store human-readable characters on a computer, we need to encode them into bytes; to display
them again, we decode the bytes back into human-readable characters. A byte is a unit of 8 bits,
each of which is 0 or 1. So the characters “Hi” are actually stored as “01001000 01101001” on the
computer, which takes 2 bytes (16 bits).
The rule that defines the encoding process is called an encoding scheme; commonly used ones
include “ASCII” and “UTF-8”.
“ASCII” stores each character in one byte, but it defines only 128 characters (codes 0 to 127, which
fit in 7 bits), even though one byte of 8 bits could distinguish 2⁸ = 256 values.
However, 128 or even 256 characters are obviously not enough for storing all the characters in the
world. In light of that, Unicode was designed, in which each character is assigned a “code point”.
For instance, “H” is represented by the code point “U+0048”.
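A short sketch of the encoding and decoding described above, using Python's built-in `str.encode()` and `bytes.decode()`. It prints the bits of "Hi", the code point of "H", and shows that UTF-8 uses more than one byte for characters outside ASCII (the euro sign, U+20AC, takes 3 bytes).

```python
text = "Hi"
data = text.encode("ascii")                       # str -> bytes
bits = " ".join(f"{byte:08b}" for byte in data)   # each byte as 8 binary digits
print(data)                                       # b'Hi'
print(bits)                                       # 01001000 01101001
print(data.decode("ascii"))                       # bytes -> str: Hi

print(f"U+{ord('H'):04X}")                        # U+0048, the code point of "H"
euro = "\u20ac".encode("utf-8")                   # the euro sign needs 3 bytes in UTF-8
print(len(euro))                                  # 3
```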
print(arr)
print(arr.dtype)
[1 0 1 2 3 4]
uint8
print(arr)
print(arr.dtype)
[1 2 3 4]
uint64
print(arr)
print(arr.dtype)
[1. 2. 3. 4.]
float32
print(arr)
print(arr.dtype)
[1 2 3 4]
timedelta64
print(arr)
print(arr.dtype)
[1 2 3 4]
object
print(arr)
print(arr.dtype)
The Unicode standard describes how characters are represented by code points. A code point
value is an integer in the range 0 to 0x10FFFF.
arr = np.array(['a', '2', '3'], dtype='i')   # ValueError: 'a' cannot be converted to an integer
print(arr)
print(arr.dtype)
The best way to change the data type of an existing array is to make a copy of the array with the
astype() method.
The astype() method creates a copy of the array and lets you specify the data type as a
parameter.
The data type can be given as a string, like 'f' for float or 'i' for integer, or as a Python type
directly, like float for float and int for integer.
newarr = arr.astype('i')
print(arr)
print(arr.dtype)
print(newarr)
print(newarr.dtype)
newarr = arr.astype(int)
print(newarr)
print(newarr.dtype)
[1 2 3]
int32
newarr = arr.astype(bool)
print(newarr)
print(newarr.dtype)
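The array used for the astype() examples above is not shown. A minimal sketch on example floats (chosen for illustration, not taken from the notes): converting float to integer truncates toward zero, converting to bool maps 0 to False and everything else to True, and the original array is left unchanged because astype() returns a new array.

```python
import numpy as np

arr = np.array([1.1, 2.9, -3.7, 0.0])   # example values, not from the notes
as_int = arr.astype('i')                # float -> int32 drops the fractional part
as_float = as_int.astype(float)         # a Python type works as well as a code
as_bool = arr.astype(bool)              # 0.0 -> False, any other value -> True

print(as_int, as_int.dtype)             # [ 1  2 -3  0] int32
print(as_bool)                          # [ True  True  True False]
print(arr)                              # unchanged: astype() returned new arrays
```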
The main difference between a copy and a view of an array is that the copy is a new array, while
the view is just a view of the original array.
The copy owns the data, so changes made to the copy do not affect the original array, and
changes made to the original array do not affect the copy.
The view does not own the data, so changes made to the view affect the original array,
and changes made to the original array affect the view.
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)
[42 2 3 4 5]
[1 2 3 4 5]
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)
[42 2 3 4 5]
[42 2 3 4 5]
import numpy as np
import copy
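A compact sketch that checks the copy/view behaviour described above. `base` is None for an array that owns its data and refers to the original for a view, and `np.shares_memory()` reports whether two arrays use the same memory. Basic slices are views as well, so writing into a slice changes the original.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
c = arr.copy()        # owns its own data
v = arr.view()        # shares arr's data
s = arr[1:3]          # basic slices are views too

arr[0] = 42
s[0] = 99             # writes through to arr[1]

print(arr)            # 42 and 99 now appear in arr
print(c)              # still 1 2 3 4 5
print(v)              # sees both changes
print(c.base is None, v.base is arr)                          # True True
print(np.shares_memory(arr, v), np.shares_memory(arr, c))     # True False
```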
Reshaping arrays
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(arr)
print(newarr)
[ 1 2 3 4 5 6 7 8 9 10 11 12]
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(arr)
print(newarr)
[ 1 2 3 4 5 6 7 8 9 10 11 12]
[[[ 1 2]
[ 3 4]
[ 5 6]]
[[ 7 8]
[ 9 10]
[11 12]]]
# Can We Reshape Into any Shape?
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
newarr = arr.reshape(3, 3)   # ValueError: cannot reshape array of size 8 into shape (3,3)
print(newarr)
print(newarr)
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
newarr = arr.reshape(-1)
print(newarr)
[1 2 3 4 5 6]
newarr = arr.reshape(-1)
print(newarr)
[1 2 3 4]
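The reshape calls that use -1 are only partly shown above. A short sketch: -1 tells NumPy to infer that one dimension from the total number of elements (only one -1 is allowed per call), and `reshape(-1)` alone flattens the array. A shape whose size does not match the element count raises ValueError.

```python
import numpy as np

arr = np.arange(1, 9)                  # [1 2 3 4 5 6 7 8]
cube = arr.reshape(2, 2, -1)           # -1 is inferred: 8 // (2 * 2) = 2
flat = cube.reshape(-1)                # -1 alone flattens back to 1-D
print(cube.shape, flat.shape)          # (2, 2, 2) (8,)

try:
    arr.reshape(3, 3)                  # 8 elements cannot fill 3 * 3 = 9 slots
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```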
Iterating Arrays
Since we deal with multi-dimensional arrays in NumPy, we can iterate over them using Python's
basic for loop.
arr = np.array([1, 2, 3])
for x in arr:
    print(x)
1
2
3
arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
    print(x)
[1 2 3]
[4 5 6]
for x in arr:
    for y in x:
        print(y)
1
2
3
4
5
6
[[1 2 3 4]]
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
    print(x)
[[1 2 3]
[4 5 6]]
[[ 7 8 9]
[10 11 12]]
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
    for y in x:
        for z in y:
            print(z)
1
2
3
4
5
6
7
8
9
10
11
12
(0,) 1
(1,) 2
(2,) 3
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
(0, 0) 1
(0, 1) 2
(0, 2) 3
(0, 3) 4
(1, 0) 5
(1, 1) 6
(1, 2) 7
(1, 3) 8
(0, 0, 0) 1
(0, 0, 1) 2
(0, 0, 2) 3
(0, 1, 0) 4
(0, 1, 1) 5
(0, 1, 2) 6
(1, 0, 0) 1
(1, 0, 1) 2
(1, 0, 2) 3
(1, 1, 0) 4
(1, 1, 1) 5
(1, 1, 2) 6
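The "(index) value" lines above look like the output of `np.ndenumerate()`, but the code itself is missing. A minimal sketch that produces the 2-D version of that output: `ndenumerate` yields each element together with its index tuple, and `np.nditer` visits every element one by one however many dimensions the array has, which avoids writing one nested loop per dimension.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

pairs = []
for idx, x in np.ndenumerate(arr):       # idx has one entry per dimension
    print(idx, x)                        # (0, 0) 1 ... (1, 3) 8
    pairs.append((idx, int(x)))

flat = [int(x) for x in np.nditer(arr)]  # every element, whatever the ndim
print(flat)                              # [1, 2, 3, 4, 5, 6, 7, 8]
```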
In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.
We pass a sequence of arrays that we want to join to the concatenate() function, along with the
axis. If axis is not explicitly passed, it is taken as 0.
import numpy as np
arr1 = np.array([1, 2, 3])
arr3: [1 2 3 4 5 6]
arr4: [1 2 3 4 5 6]
print(arr3)
print(arr4)
print(arr3.ndim)
print(arr4.ndim)
print(arr3.shape)
print(arr4.shape)
[[1 2 5 6]
[3 4 7 8]]
[[1 2]
[3 4]
[5 6]
[7 8]]
2
2
(2, 4)
(4, 2)
print(arr)
[[1 2]
[3 4]
[5 6]
[7 8]]
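Most of the `concatenate()` code is missing above. The sketch below is consistent with the outputs shown: two 1-D arrays join into `[1 2 3 4 5 6]`, and two 2×2 arrays (assumed to be `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]`) give shape (2, 4) along axis=1 and shape (4, 2) along the default axis=0.

```python
import numpy as np

one_d = np.concatenate((np.array([1, 2, 3]), np.array([4, 5, 6])))   # [1 2 3 4 5 6]

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr3 = np.concatenate((arr1, arr2), axis=1)   # rows joined side by side
arr4 = np.concatenate((arr1, arr2))           # axis=0 by default: rows appended below
print(arr3)          # [[1 2 5 6]
                     #  [3 4 7 8]]
print(arr4.shape)    # (4, 2)
```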
Stacking is the same as concatenation; the only difference is that stacking is done along a new axis.
For example, stacking two 1-D arrays along the second axis (axis=1) puts them one over the other,
pairing up their elements.
We pass a sequence of arrays that we want to join to the stack() method along with the axis. If
axis is not explicitly passed, it is taken as 0.
[[1 4]
[2 5]
[3 6]]
[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]
print(arr3)
print(arr4)
[[[1 2]
[5 6]]
[[3 4]
[7 8]]]
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
[[[ 1 2 3]
[ 4 5 6]]
[[ 7 8 9]
[10 11 12]]
[[13 14 15]
[16 17 18]]]
[[[ 1 2 3]
[ 7 8 9]
[13 14 15]]
[[ 4 5 6]
[10 11 12]
[16 17 18]]]
[[[ 1 7 13]
[ 2 8 14]
[ 3 9 15]]
[[ 4 10 16]
[ 5 11 17]
[ 6 12 18]]]
print(arr)
[1 2 3 4 5 6]
print(arr)
[[1 2 3]
[4 5 6]]
print(arr)
[[[1 4]
[2 5]
[3 6]]]
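The stacking code is missing above. The sketch below, assuming the inputs `[1, 2, 3]` and `[4, 5, 6]`, reproduces the outputs shown: `stack()` adds a new axis, while the helpers `hstack()`, `vstack()` and `dstack()` join along rows, columns and depth.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

s0 = np.stack((a, b))           # new axis 0 -> shape (2, 3)
s1 = np.stack((a, b), axis=1)   # new axis 1 -> shape (3, 2): [[1 4] [2 5] [3 6]]
h = np.hstack((a, b))           # along the existing axis -> [1 2 3 4 5 6]
v = np.vstack((a, b))           # as rows -> [[1 2 3] [4 5 6]]
d = np.dstack((a, b))           # along depth -> shape (1, 3, 2)

for name, x in [("stack", s0), ("stack axis=1", s1), ("hstack", h),
                ("vstack", v), ("dstack", d)]:
    print(name, x.shape)
```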
Joining merges multiple arrays into one and Splitting breaks one array into multiple.
We use array_split() for splitting arrays: we pass it the array we want to split and the number of
splits.
newarr = np.array_split(arr, 3)
print(newarr)
newarr = np.array_split(arr, 4)
print(newarr)
Note: The split() method is also available, but it does not adjust when the elements cannot be
divided equally. In the example above, array_split() worked properly, but split() would fail.
If you split an array into 3 arrays, you can access them from the result just like any array element:
newarr = np.array_split(arr, 3)
print(newarr)
print(newarr[0])
print(newarr[1])
print(newarr[2])
newarr = np.array_split(arr, 3)
print(newarr)
[array([[1, 2],
[3, 4]]), array([[5, 6],
[7, 8]]), array([[ 9, 10],
[11, 12]])]
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13,
14, 15], [16, 17, 18]])
newarr = np.array_split(arr, 3)
print(newarr)
[array([[1, 2, 3],
[4, 5, 6]]), array([[ 7, 8, 9],
[10, 11, 12]]), array([[13, 14, 15],
[16, 17, 18]])]
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13,
14, 15], [16, 17, 18]])
# Split along the columns instead of the rows
newarr = np.array_split(arr, 3, axis=1)
print(newarr)
[array([[ 1],
[ 4],
[ 7],
[10],
[13],
[16]]), array([[ 2],
[ 5],
[ 8],
[11],
[14],
[17]]), array([[ 3],
[ 6],
[ 9],
[12],
[15],
[18]])]
Searching Arrays
You can search an array for a certain value with the where() method, which returns the indexes where a match is found.
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
# Which means that the value 4 is present at index 3, 5, and 6.
x = np.where(arr%2 == 0)
print(x)
x = np.where(arr%2 == 1)
print(x)
Sorting Arrays
NumPy has a function called sort() that returns a sorted copy of a specified array.
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
[0 1 2 3]
print(np.sort(arr))
import numpy as np
# Mixed values are converted to one string dtype, so they are sorted as text
arr = np.array([2,1,'banana', 'cherry', 'apple',True,1.2,0.9])
print(np.sort(arr))
print(np.sort(arr))
# For a 2-D array, each row is sorted
arr = np.array([[3, 2, 4], [5, 0, 1]])
print(np.sort(arr))
[[2 3 4]
[0 1 5]]
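A sketch contrasting np.sort(), which returns a sorted copy, with the ndarray.sort() method, which sorts the array in place; the axis parameter chooses whether rows or columns of a 2-D array are sorted:

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

# np.sort() returns a sorted copy; the original is unchanged
sorted_copy = np.sort(arr)
print(sorted_copy, arr)

# ndarray.sort() sorts in place and returns None
arr.sort()
print(arr)

# For 2-D arrays the default axis=-1 sorts each row; axis=0 sorts each column
m = np.array([[3, 2, 4], [5, 0, 1]])
print(np.sort(m))
print(np.sort(m, axis=0))
```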
Filtering Arrays
Getting some elements out of an existing array and creating a new array out of them is called
filtering.
If the value at an index is True, that element is contained in the filtered array; if the value at that
index is False, that element is excluded from the filtered array.
arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]
newarr = arr[x]
print(newarr)
[41 43]
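# Create a filter array that will return only values higher than 42:
import numpy as np
arr = np.array([41, 42, 43, 44])
filter_arr = []
for element in arr:
    # True if the element is higher than 42, otherwise False
    filter_arr.append(element > 42)
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)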
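# Create a filter array that will return only even elements from the original array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
filter_arr = []
for element in arr:
    filter_arr.append(element % 2 == 0)
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)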
# Create a filter array that will return only values higher than 42:
import numpy as np
arr = np.array([41, 42, 43, 44])
# Build the filter directly from the array
filter_arr = arr > 42
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)
# Create a filter array that will return only even elements from the original array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
# Build the filter directly from the array
filter_arr = arr % 2 == 0
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)
from numpy import random
# randint(100) returns a random integer from 0 to 99
x = random.randint(100)
# y = np.random.randint(1)
print(x)
# print(y)
98
x = random.rand()
# x=np.random.rand()
print(x)
0.9675639543128254
x = random.rand(5)
y = np.round(x,3)
print(x)
print(y)
x=random.randint(100, size=(5))
y=random.randint(100, size=5)
z=random.randint(100)
print(x)
print(y)
print(z)
[41 74 9 2 5]
[37 48 37 76 43]
29
x = random.randint(100, size=(3, 5))
print(x)
[[ 4 47 17 31 79]
[81 58 15 18 2]
[59 29 7 54 74]]
x = random.rand(5)
print(x)
x = random.rand(3, 5)
print(x)
The choice() method allows you to generate a random value based on an array of values.
The choice() method takes an array as a parameter and randomly returns one of the values.
x = random.choice([3, 5, 7, 9])
print(x)
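choice() also accepts a size parameter that returns an array of picks with the given shape, which is how a grid of values such as the one below can be produced. A sketch using the same list of values:

```python
from numpy import random

values = [3, 5, 7, 9]

# One value picked at random from the list
x = random.choice(values)
print(x)

# size= returns an array of picks with the given shape (3 rows, 5 columns)
grid = random.choice(values, size=(3, 5))
print(grid)
```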
[[0.1 3. 3. 3. 0.1]
[3. 0.1 3. 0.1 0.1]
[7. 7. 3. 9. 3. ]]
x.astype(int)
array([[0, 3, 3, 3, 0],
[3, 0, 3, 0, 0],
[7, 7, 3, 9, 3]])
ar1 = np.array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40])
print(ar1)
[20 22 24 26 28 30 32 34 36 38 40]
Output: array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.])
np.linspace(0,20)
np.linspace(0,20,4)
np.arange(0, 20)
np.arange(0, 20, 4)
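The four calls above differ in how they treat the end point. linspace(start, stop, num) returns num evenly spaced floats and includes stop (num defaults to 50); arange(start, stop, step) steps by step and excludes stop. A sketch:

```python
import numpy as np

# linspace(start, stop, num): num evenly spaced values, stop included
a = np.linspace(0, 20, 4)
print(a)
print(len(np.linspace(0, 20)))  # 50, the default num

# arange(start, stop, step): values spaced by step, stop excluded
b = np.arange(0, 20, 4)
print(b)

# Both can build the sequence 20, 22, ..., 40
print(np.linspace(20, 40, 11))
print(np.arange(20, 41, 2))
```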
ar1 = np.linspace(20,40,11)
print(ar1)
[20. 22. 24. 26. 28. 30. 32. 34. 36. 38. 40.]
ar1 = np.arange(20,41,2)
print(ar1)
[20 22 24 26 28 30 32 34 36 38 40]
ar1[1] = 0
print(ar1)
[20 0 24 26 28 30 32 34 36 38 40]
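# Print the data type of the elements of ar1 using the ‘dtype’ command.
# (The default integer type is platform dependent: int32 on Windows with NumPy 1.x, int64 on most Linux and macOS systems.)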
print(np.dtype(ar1[0]))
print(np.dtype(ar1[1]))
print(np.dtype(ar1[2]))
print(np.dtype(ar1[3]))
print(np.dtype(ar1[4]))
print(np.dtype(ar1[5]))
int32
int32
int32
int32
int32
int32
# Use the ‘any’ and ‘all’ functions to check for zeros: np.any() is True if at least one
# element is nonzero, and np.all() is False if any element is zero.
np.any(ar1)
True
np.all(ar1)
False
len(ar1)
11
np.ndim(ar1)
# Print the largest and smallest value from ar1 using built-in functions.
ar1.max()
40
ar1.min()
# Print mean and median of ar1 using ‘mean’ and ‘median’ functions respectively.
np.mean(ar1)
28.0
np.median(ar1)
30.0
np.std(ar1)
10.583005244258363
np.var(ar1)
112.0
# Print the sum and product of all elements of ar1 using the built-in functions.
np.sum(ar1)
308
# np.product() is a deprecated alias of np.prod() (removed in NumPy 2.0)
np.prod(ar1)
# Other operations:
np.sort(ar1)
array([ 0, 20, 24, 26, 28, 30, 32, 34, 36, 38, 40])
# import numpy
import numpy as np
a = [1, 2, 2, 4, 3, 6, 4, 8]
# using np.unique() method
b = np.unique(a)
print(b)
[1 2 3 4 6 8]
import numpy as np
arr = np.array([1, 2, 2, 3, 4], ndmin=3)
arr
array([[[1, 2, 2, 3, 4]]])
np.flip(ar1)
array([40, 38, 36, 34, 32, 30, 28, 26, 24, 0, 20])
np.count_nonzero(ar1)
10
# Vectorization
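# Create and store a vector F = [32, 38, 40, 28, 56, 65, 70] which contains temperatures (Fahrenheit). Print F.
F = np.array([32, 38, 40, 28, 56, 65, 70])
print(F)
[32 38 40 28 56 65 70]
# Convert every element to Celsius in one vectorized step, C = (F - 32) * 5 / 9,
# using floor division (//) so the result stays an integer array
C = (F - 32) * 5 // 9
print(C)
[ 0 3 4 -3 13 18 21]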
np.square(ar1)
array([ 400, 0, 576, 676, 784, 900, 1024, 1156, 1296, 1444,
1600])
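# Find Sin, Cos and Tan of each element in F. Print the results.
# (np.sin, np.cos and np.tan treat the values as radians)
c1 = np.sin(F)
c2 = np.cos(F)
c3 = np.tan(F)
print(c1, "\n")
print(c2, "\n")
print(c3, "\n")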
F>50
F[F>50]
F[np.logical_and(F % 8 == 0, F % 4 == 0)]
F[np.logical_or(F % 8 == 0, F % 4 == 0)]
F[F % 4 != 0]
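A sketch of the Boolean masks above with their results, using the same F vector. Because every multiple of 8 is also a multiple of 4, the logical_and mask reduces to F % 8 == 0 and the logical_or mask to F % 4 == 0:

```python
import numpy as np

F = np.array([32, 38, 40, 28, 56, 65, 70])

mask = F > 50          # Boolean array, one entry per element
print(mask)
print(F[mask])         # only the temperatures above 50

# Combine conditions element-wise
both = F[np.logical_and(F % 8 == 0, F % 4 == 0)]
either = F[np.logical_or(F % 8 == 0, F % 4 == 0)]
not_div4 = F[F % 4 != 0]
print(both, either, not_div4)
```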
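a = np.array([[5, 0, 4], [2, 3, 2], [1, 2, 1]])
b = np.array([[0, 1, 2], [1, 2, 3], [3, 1, 1]])
c = np.array([[1, 2], [3, 4], [5, 6]])
print(a)
print(b)
[[5 0 4]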
[2 3 2]
[1 2 1]]
[[0 1 2]
[1 2 3]
[3 1 1]]
c1 = np.matrix([[1, 2], [3, 4], [5, 6]])
print(c1)
[[1 2]
[3 4]
[5 6]]
[[5 0 4]
[2 3 2]
[1 2 1]]
[[0 1 2]
[1 2 3]
[3 1 1]]
[[1 2]
[3 4]
[5 6]]
# For ndarrays, * is element-wise multiplication
print(a * b)
[[0 0 8]
[2 6 6]
[3 2 1]]
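# For np.matrix objects, * is matrix multiplication (the same result as np.dot(a, b))
a1 = np.matrix(a)
b1 = np.matrix(b)
print(a1 * b1)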
[[12 9 14]
[ 9 10 15]
[ 5 6 9]]
np.ndim(a)
np.ndim(b)
2
np.ndim(c)
# Print the number of rows and columns in A, B and C using the ‘shape’ attribute.
np.shape(a)
(3, 3)
type(np.shape(a))
tuple
Number of rows in a: 3
Number of columns in a: 3
Number of rows in b: 3
Number of columns in b: 3
Number of rows in c: 3
Number of columns in c: 2
np.transpose(a)
array([[5, 2, 1],
[0, 3, 2],
[4, 2, 1]])
np.transpose(b)
array([[0, 1, 3],
[1, 2, 1],
[2, 3, 1]])
np.transpose(c)
array([[1, 3, 5],
[2, 4, 6]])
np.diagonal(a)
array([5, 3, 1])
np.diagonal(b)
array([0, 2, 1])
# Print A + B and A − B.
print(a + b)
[[5 1 6]
[3 5 5]
[4 3 2]]
print(a - b)
[[ 5 -1 2]
[ 1 1 -1]
[-2 1 0]]
i = np.eye(3, dtype=int)
print(i)
[[1 0 0]
[0 1 0]
[0 0 1]]
np.dot(a, i)
array([[5, 0, 4],
[2, 3, 2],
[1, 2, 1]])
a * b
array([[0, 0, 8],
[2, 6, 6],
[3, 2, 1]])
b * a
array([[0, 0, 8],
[2, 6, 6],
[3, 2, 1]])
a * c
ValueError: operands could not be broadcast together with shapes (3,3) (3,2)
b * c
ValueError: operands could not be broadcast together with shapes (3,3) (3,2)
Element-wise multiplication needs arrays of the same (or broadcast-compatible) shape; a and b are 3x3 while c is 3x2. Use np.dot() for matrix multiplication:
np.dot(a, b)
array([[12, 9, 14],
[ 9, 10, 15],
[ 5, 6, 9]])
np.dot(b, a)
array([[ 4, 7, 4],
[12, 12, 11],
[18, 5, 15]])
np.dot(a, c)
array([[25, 34],
[21, 28],
[12, 16]])
np.dot(b, c)
array([[13, 16],
[22, 28],
[11, 16]])
# Print the sum of all elements in matrices A, B and C using the ‘sum’ function.
np.sum(a)
20
np.sum(b)
14
np.sum(c)
21
np.trace(a)
np.trace(b)
np.trace(c)
a.flatten()
array([5, 0, 4, 2, 3, 2, 1, 2, 1])
b.flatten()
array([0, 1, 2, 1, 2, 3, 3, 1, 1])
c.flatten()
array([1, 2, 3, 4, 5, 6])
# Print the sum of rows and sum of columns for all matrices A, B and C using the ‘sum’ function.
# [Hint: Use axis = 0 and axis = 1]
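One possible solution to the exercise, using the matrices a, b and c from above. axis=0 collapses the rows and gives one sum per column; axis=1 collapses the columns and gives one sum per row:

```python
import numpy as np

a = np.array([[5, 0, 4], [2, 3, 2], [1, 2, 1]])
b = np.array([[0, 1, 2], [1, 2, 3], [3, 1, 1]])
c = np.array([[1, 2], [3, 4], [5, 6]])

for name, m in (("a", a), ("b", b), ("c", c)):
    # axis=0 -> one sum per column, axis=1 -> one sum per row
    print(name, "column sums:", np.sum(m, axis=0), "row sums:", np.sum(m, axis=1))
```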
Pandas
Pandas is a Python library used for working with data sets.
The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was
created by Wes McKinney in 2008.
Pandas allows us to analyze big data and draw conclusions based on statistical theories.
Pandas can clean messy data sets and make them readable and relevant.
Data Science is a branch of computer science where we study how to store, use and analyze
data for deriving information from it.
Max value?
Min value?
Pandas is also able to delete rows that are not relevant or that contain wrong values, like empty or
NULL values. This is called cleaning the data.
Installation of Pandas
import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
# mydataset
myvar = pd.DataFrame(mydataset)
print(myvar)
cars passings
0 BMW 3
1 Volvo 7
2 Ford 2
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2],
'price':["x","10",20]
}
# mydataset
myvar = pd.DataFrame(mydataset)
print(myvar)
print(pd.__version__)
1.4.2
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
0 1
1 7
2 2
dtype: int64
a = [1, 7, 2]
myvar = pd.Series(a)
myvar1=pd.DataFrame(myvar)
print(myvar)
print(myvar1)
0 1
1 7
2 2
dtype: int64
0
0 1
1 7
2 2
print(myvar[-1])
KeyError: -1
With the default integer index, myvar[...] looks up index labels, not positions, and there is no label -1. Use myvar.iloc[-1] to get the last element by position.
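A sketch of the label-versus-position distinction behind the KeyError above, using the same Series: square brackets on an integer-labelled Series look up labels, while iloc always works by position, so negative positions are allowed.

```python
import pandas as pd

myvar = pd.Series([1, 7, 2])

# [] on a Series with integer labels looks up labels, not positions
print(myvar[2])         # label 2 -> value 2

# iloc works by position, so negative positions count from the end
print(myvar.iloc[-1])   # last element -> 2
print(myvar.iloc[0])    # first element -> 1
```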
print(myvar[2])
myvar = pd.Series(a, index = ["x", "y", "z"])
myvar1 = pd.DataFrame(myvar)
print(myvar)
print(myvar1)
print(myvar["y"])
x 1
y 7
z 2
dtype: int64
0
x 1
y 7
z 2
7
print(myvar[7])
KeyError: 7
Series lookups with [] use index labels; 7 is one of the values of myvar, not one of its labels, so the lookup fails.
print(myvar)
print(myvar1)
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
day1 420
day2 380
day3 390
dtype: int64
myvar = pd.Series(calories)
print(myvar)
# Keep only the listed labels
myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)
day1 420
day2 380
dtype: int64
print(myvar)
1 NaN
2 NaN
dtype: float64
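When an index is passed together with a dictionary, pandas keeps only the listed labels, and a label that is not a key of the dictionary gets NaN (the dtype becomes float64 so it can hold NaN). The cell that produced the all-NaN output above is not shown; passing index=[1, 2] with the calories dictionary is one assumed way to get such a result:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Only the requested labels are kept
subset = pd.Series(calories, index=["day1", "day2"])
print(subset)

# Labels that are not dictionary keys get NaN
missing = pd.Series(calories, index=[1, 2])
print(missing)
```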
What is a DataFrame?
A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with
rows and columns.
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
calories duration
0 420 50
1 380 40
2 390 45
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns.
Pandas uses the loc attribute to return one or more specified row(s):
# Return row 0:
print(df.loc[0])
calories 420
duration 50
Name: 0, dtype: int64
print(df.loc[-1])
KeyError: -1
loc selects rows by index label, and this DataFrame has no row labelled -1. Use df.iloc[-1] to get the last row by position.
print(df.loc[2])
calories 390
duration 45
Name: 2, dtype: int64
df
calories duration
0 420 50
1 380 40
2 390 45
Named Indexes
With the index argument, you can name your own indexes.
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
df
calories duration
day1 420 50
day2 380 40
day3 390 45
calories duration
day1 420 50
day2 380 40
day3 390 45
Use the named index in the loc attribute to return the specified row(s).
# Return "day2":
print(df.loc["day2"])
calories 380
duration 40
Name: day2, dtype: int64
# Return "calories":
print(df.loc["calories"])
KeyError: 'calories'
"calories" is a column name, not a row label, so loc cannot find it. Select the column with df["calories"] instead.
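A sketch of the column and cell selections that avoid the KeyError above, using the same calories/duration data with named indexes:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# loc[...] looks up ROW labels; a column name goes in [] instead
print(df["calories"])

# loc[row, column] selects a single cell
print(df.loc["day2", "calories"])   # 380

# loc[:, column] selects a whole column through loc
print(df.loc[:, "calories"])
```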
If your data sets are stored in a file, Pandas can load them into a DataFrame.
A simple way to store big data sets is to use CSV files (comma-separated values).
CSV files contain plain text in a well-known format that can be read by everyone, including
Pandas.
df = pd.read_csv('data.csv')
print(df.to_string())
If you have a large DataFrame with many rows, Pandas will only return the first 5 rows, and the
last 5 rows:
df = pd.read_csv('data.csv')
print(df)
To find out what parameters a method takes and what data formats it supports, we can type the
method name followed by a question mark (?) in Jupyter/IPython; in a plain Python session, use help(pd.read_csv).
import pandas as pd
pd.read_csv? # Run yourself
To find the type of dataframe
type(df)
pandas.core.frame.DataFrame
max_rows
You can check your system's maximum number of displayed rows with the pd.options.display.max_rows statement.
print(pd.options.display.max_rows)
60
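The limit can also be raised so that print(df) shows more rows before truncating. A sketch using pandas' option accessors; 9999 is an arbitrary illustrative value:

```python
import pandas as pd

print(pd.options.display.max_rows)      # the current limit

# Raise the limit so that print(df) shows more rows before truncating
pd.options.display.max_rows = 9999
print(pd.get_option("display.max_rows"))

# Restore the default
pd.reset_option("display.max_rows")
```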
df = pd.read_csv('data.csv')
print(df)
data = {
"Duration":[60,60,60,45,45,60],
"Pulse":[110,117,103,109,117,102],
"Maxpulse":[130,145,135,175,148,127],
"Calories":[409,479,340,282,406,300]}
# df = pd.DataFrame(data, index = [0,1,2,3,4,5])
df = pd.DataFrame(data)
print(df)
One of the most used methods for getting a quick overview of the DataFrame is the head()
method.
The head() method returns the headers and a specified number of rows, starting from the top.
df = pd.read_csv('data.csv')
print(df.head(10))
df = pd.read_csv('data.csv')
print(df.head())
There is also a tail() method for viewing the last rows of the DataFrame.
The tail() method returns the headers and a specified number of rows, starting from the bottom.
# Print the last 5 rows of the DataFrame:
print(df.tail())
print(df.tail(10))
type(df)
pandas.core.frame.DataFrame
df.dtypes
Duration int64
Pulse int64
Maxpulse int64
Calories float64
dtype: object
df.shape
(169, 4)
df.columns
list(df.columns)
df.head(5)
df.head(5).transpose()
0 1 2 3 4
Duration 60.0 60.0 60.0 45.0 45.0
Pulse 110.0 117.0 103.0 109.0 117.0
Maxpulse 130.0 145.0 135.0 175.0 148.0
Calories 409.1 479.0 340.0 282.4 406.0
Slicing
df
df[0:5]
df["Calories"][0:5]
0 409.1
1 479.0
2 340.0
3 282.4
4 406.0
Name: Calories, dtype: float64
df[["Pulse","Calories"]][0:5]
Pulse Calories
0 110 409.1
1 117 479.0
2 103 340.0
3 109 282.4
4 117 406.0
df
df.iloc[1:5,1:3]
Pulse Maxpulse
1 117 145
2 103 135
3 109 175
4 117 148
df.iloc[:,1:5]
df.iloc[1:5]
Duration Pulse Maxpulse Calories
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
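A sketch contrasting iloc and loc slicing, built from the first five rows of data.csv shown earlier: iloc slices by position and excludes the end, while loc slices by label and includes it.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 60, 60, 45, 45],
    "Pulse": [110, 117, 103, 109, 117],
    "Maxpulse": [130, 145, 135, 175, 148],
    "Calories": [409.1, 479.0, 340.0, 282.4, 406.0],
})

# iloc: positions, end excluded -> rows 1, 2, 3 and columns 1, 2
print(df.iloc[1:4, 1:3])

# loc: labels, end included -> rows 1..3 and columns Pulse..Maxpulse
print(df.loc[1:3, "Pulse":"Maxpulse"])
```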
The DataFrame object has a method called info() that gives you more information about the
data set. info() prints its report and returns None, which is why None appears at the end.
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Duration 169 non-null int64
1 Pulse 169 non-null int64
2 Maxpulse 169 non-null int64
3 Calories 164 non-null float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
None
Null Values
The info() method also tells us how many non-null values are present in each column, and
in our data set it seems like there are 164 of 169 non-null values in the "Calories" column.
This means that there are 5 rows with no value at all in the "Calories" column, for whatever
reason.
Empty values, or Null values, can be bad when analyzing data, and you should consider
removing rows with empty values. This is a step towards what is called cleaning data, and you
will learn more about that in the next chapters.
Bad data can be:
Empty cells
Wrong data
Duplicates
Empty cells can potentially give you a wrong result when you analyze data.
First of all, we need to check if there are any missing data and NaN values in each column.
# Check if there are any missing data and NaN values in each column.
import pandas as pd
df = pd.read_csv('data1.csv')
df.isnull().values.any()
df['Calories'].isnull().values.any()
True
df.isnull()
df['Maxpulse'].isnull().values.any()
False
df['Pulse'].isnull().values.any()
False
df['Date'].isnull().values.any()
True
df.describe()
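Instead of checking one column at a time, isnull() can be combined with sum() to count the missing values in every column at once. A sketch on a small hypothetical stand-in for data1.csv:

```python
import numpy as np
import pandas as pd

# A small stand-in for data1.csv (hypothetical values)
df = pd.DataFrame({
    "Duration": [60, 60, 45],
    "Date": ["2020/12/01", None, "2020/12/03"],
    "Calories": [409.1, np.nan, np.nan],
})

# isnull() gives a Boolean frame; sum() counts the True values per column
print(df.isnull().sum())

# any() per column answers "does this column contain a missing value?"
print(df.isnull().any())
```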
Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells.
This is usually OK, since data sets can be very big, and removing a few rows will not have a big
impact on the result.
df = pd.read_csv('data1.csv')
new_df = df.dropna()
print(new_df.to_string())
Note: By default, the dropna() method returns a new DataFrame, and will not change the
original.
print(new_df.info())
<class 'pandas.core.frame.DataFrame'>
Int64Index: 29 entries, 0 to 31
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Duration 29 non-null int64
1 Date 29 non-null object
2 Pulse 29 non-null int64
3 Maxpulse 29 non-null int64
4 Calories 29 non-null float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.4+ KB
None
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Duration 32 non-null int64
1 Date 31 non-null object
2 Pulse 32 non-null int64
3 Maxpulse 32 non-null int64
4 Calories 30 non-null float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.4+ KB
None
If you want to change the original DataFrame, use the inplace = True argument:
df = pd.read_csv('data1.csv')
df.dropna(inplace = True)
print(df.to_string())
print(df.info())
<class 'pandas.core.frame.DataFrame'>
Int64Index: 29 entries, 0 to 31
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Duration 29 non-null int64
1 Date 29 non-null object
2 Pulse 29 non-null int64
3 Maxpulse 29 non-null int64
4 Calories 29 non-null float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.4+ KB
None
Note: Now, the dropna(inplace = True) will NOT return a new DataFrame, but it will remove all
rows containing NULL values from the original DataFrame.
Another way of dealing with empty cells is to insert a new value instead.
This way you do not have to delete entire rows just because of some empty cells.
df = pd.read_csv('data1.csv')
# Replace every empty cell in the DataFrame with 130
df.fillna(130, inplace = True)
print(df.to_string())
The example above replaces all empty cells in the whole DataFrame.
To replace empty values for only one column, specify the column name for the DataFrame:
import pandas as pd
df = pd.read_csv('data1.csv')
print(df.to_string())
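# Replace NULL values in the "Calories" column with the number 130:
# (assigning the result back avoids the chained-assignment warning that
# fillna(..., inplace = True) on a single column raises in recent pandas versions)
df["Calories"] = df["Calories"].fillna(130)
print(df.to_string())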
A common way to replace empty cells, is to calculate the mean, median or mode value of the
column.
Pandas uses the mean(), median() and mode() methods to calculate the respective values for a
specified column:
# Calculate the MEAN, and replace any empty values with it:
import pandas as pd
df = pd.read_csv('data1.csv')
x = df["Calories"].mean()
print(x)
304.68
df["Calories"] = df["Calories"].fillna(x)
print(df.to_string())
Mean = the average value (the sum of all values divided by number of values).
# Calculate the MEDIAN, and replace any empty values with it:
import pandas as pd
df = pd.read_csv('data1.csv')
x = df["Calories"].median()
print(x)
291.2
df["Calories"] = df["Calories"].fillna(x)
print(df.to_string())
Median = the value in the middle, after you have sorted all values ascending.
# Calculate the MODE, and replace any empty values with it:
import pandas as pd
df = pd.read_csv('data1.csv')
x = df["Calories"].mode()[0]
print(x)
300.0
df["Calories"] = df["Calories"].fillna(x)
print(df.to_string())
Cells with data of wrong format can make it difficult, or even impossible, to analyze data.
To fix it, you have two options: remove the rows, or convert all cells in the columns into the same
format.
print(df.to_string())
In our DataFrame, we have two cells with the wrong format. Check out rows 22 and 26: the 'Date'
column should be a string that represents a date:
# Convert to date:
import pandas as pd
df = pd.read_csv('data1.csv')
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())
UserWarning: Parsing '26-12-2020' in DD/MM/YYYY format. Provide format
or specify infer_datetime_format=True for consistent parsing.
As you can see from the result, the date in row 26 was fixed, but the empty date in row 22 got a
NaT (Not a Time) value, in other words an empty value. One way to deal with empty values is
simply removing the entire row.
Removing Rows
The conversion in the example above gave us a NaT value, which can be handled
as a NULL value, and we can remove the row by using the dropna() method.
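A sketch of the convert-then-drop steps on a small hypothetical 'Date' column. Passing an explicit format to to_datetime() avoids the day/month guessing that triggered the warning above (the format string here is an assumption about how the dates are written), and subset= limits dropna() to the 'Date' column:

```python
import pandas as pd

# Hypothetical stand-in for the 'Date' column: one empty cell
df = pd.DataFrame({"Date": ["2020/12/01", "2020/12/02", None, "2020/12/26"]})

# An explicit format tells pandas exactly how to read each string
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d")

# The empty cell became NaT; drop only rows whose 'Date' is missing
df.dropna(subset=["Date"], inplace=True)
print(df)
```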
"Wrong data" does not have to be "empty cells" or "wrong format", it can just be wrong, like if
someone registered "199" instead of "1.99".
Sometimes you can spot wrong data by looking at the data set, because you have an expectation
of what it should be.
If you take a look at our data set, you can see that in row 7, the duration is 450, but for all the
other rows the duration is between 30 and 60.
Replacing Values
One way to fix wrong values is to replace them with something else.
In our example, it is most likely a typo, and the value should be "45" instead of "450", so we
could just insert "45" in row 7:
df.loc[7, 'Duration'] = 45
For small data sets you might be able to replace the wrong data one by one, but not for big data
sets.
To replace wrong data for larger data sets you can create some rules, e.g. set some boundaries
for legal values, and replace any values that are outside of the boundaries.
import pandas as pd
df = pd.read_csv('data1.csv')
for x in df.index:
    # Cap any duration above 120 at 120
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120
df
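The loop above can also be written without a loop: a Boolean mask selects every out-of-range row, and loc assigns the new value to all of them at once. A sketch on hypothetical Duration values:

```python
import pandas as pd

# Hypothetical Duration values with one suspicious entry (450)
df = pd.DataFrame({"Duration": [60, 45, 450, 30]})

# Boolean mask selects the out-of-range rows; loc assigns to all of them at once
df.loc[df["Duration"] > 120, "Duration"] = 120
print(df)
```

Series.clip(upper=120) is an equivalent one-liner for capping values.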
Removing Rows
Another way of handling wrong data is to remove the rows that contain wrong data.
This way you do not have to find out what to replace them with, and there is a good chance you
do not need them for your analysis.
df = pd.read_csv('data1.csv')
for x in df.index:
    # Drop any row whose duration is above 120
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)
df
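The same removal can be done by keeping only the rows that satisfy the rule, which avoids dropping rows one at a time inside a loop. A sketch on hypothetical Duration values:

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 45, 450, 30]})

# Keep only the rows that satisfy the rule (the same as dropping the others)
df = df[df["Duration"] <= 120]
print(df)
```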
import pandas as pd
df = pd.read_csv('data1.csv')
df
Duplicate rows are rows that have been registered more than one time.
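By taking a look at our test data set, we can assume that rows 11 and 12 are duplicates.
To discover duplicates, we can use the duplicated() method. It returns a Boolean value for each row: True if the row repeats an earlier row, otherwise False.
print(df.duplicated())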
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 True
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
30 False
31 False
dtype: bool
Removing Duplicates
df.drop_duplicates(inplace = True)
df
Note: The (inplace = True) will make sure that the method does NOT return a new DataFrame,
but it will remove all duplicates from the original DataFrame.
The corr() method calculates the relationship between each column in your data set.
import pandas as pd
df = pd.read_csv('data1.csv')
df
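# In pandas 2.0 and later, exclude non-numeric columns such as Date with df.corr(numeric_only=True)
df.corr()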
1 means that there is a 1 to 1 relationship (a perfect correlation), and for this data set, each time a
value went up in the first column, the other one went up as well.
0.9 is also a good relationship: if you increase one value, the other will probably increase as
well.
-0.9 would be just as good a relationship as 0.9, but if you increase one value, the other will
probably go down.
0.2 means NOT a good relationship: if one value goes up, it does not mean that the
other will.
What is a good correlation? It depends on the use, but I think it is safe to say you need at
least 0.6 (or -0.6) to call it a good correlation.
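A small sketch showing both ends of the scale, on hypothetical columns: y rises exactly with x (correlation 1), z falls exactly as x rises (correlation -1).

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],   # rises with x -> correlation 1
    "z": [4, 3, 2, 1],   # falls as x rises -> correlation -1
})

corr = df.corr()
print(corr)
```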
Perfect Correlation:
We can see that "Duration" and "Duration" got the number 1.000000,
which makes sense: each column always has a perfect relationship with
itself.
Good Correlation:
import pandas as pd
df = pd.read_csv('data1.csv')
df
Obtain the unique values in each column with the unique function, and count how many distinct
values a column has with the nunique function.
print(df['Duration'].unique())
print(df['Duration'].nunique())
[ 60 45 450 30]
4
print(df['Pulse'].unique())
print(df['Pulse'].nunique())
[110 117 103 109 102 104 98 100 106 90 97 108 130 105 92]
15
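nunique() returns the number of distinct values, not how often each one occurs; value_counts() gives the occurrences. A sketch on a hypothetical Duration column:

```python
import pandas as pd

duration = pd.Series([60, 45, 60, 450, 60, 30, 45])

print(duration.unique())        # distinct values in order of appearance
print(duration.nunique())       # how many distinct values: 4
print(duration.value_counts())  # how often each value occurs
```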
import pandas as pd
df = pd.read_csv('data1.csv')
df
23 300.0
Name: Calories, dtype: float64
27 241.0
31 243.0
Name: Calories, dtype: float64
df1[df1['Duration'] == 60]["Pulse"].mean()
103.375
Groupby
Pandas dataframe.groupby() function is used to split the data into groups based on some
criteria.
df
df.groupby('Pulse').describe()
Duration
count mean std min 25% 50% 75% max
Pulse
df.groupby('Pulse').describe().reset_index()
Pulse Duration
count mean std min 25% 50% 75% max
0 90 1.0 45.0 NaN 45.0 45.00 45.0 45.00 45.0
1 92 2.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
2 97 1.0 45.0 NaN 45.0 45.00 45.0 45.00 45.0
3 98 3.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
4 100 6.0 57.5 6.123724 45.0 60.00 60.0 60.00 60.0
5 102 3.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
6 103 4.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
7 104 2.0 255.0 275.771645 60.0 157.50 255.0 352.50 450.0
8 105 1.0 45.0 NaN 45.0 45.00 45.0 45.00 45.0
9 106 1.0 60.0 NaN 60.0 60.00 60.0 60.00 60.0
10 108 1.0 60.0 NaN 60.0 60.00 60.0 60.00 60.0
11 109 2.0 37.5 10.606602 30.0 33.75 37.5 41.25 45.0
12 110 2.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
13 117 2.0 52.5 10.606602 45.0 48.75 52.5 56.25 60.0
14 130 1.0 60.0 NaN 60.0 60.00 60.0 60.00 60.0
df.describe()
df.groupby('Pulse')
<pandas.core.groupby.generic.DataFrameGroupBy object at
0x00000220A4F94BB0>
df = pd.read_csv('data1.csv')
df.groupby('Pulse').mean()
df.groupby(['Pulse','Duration']).mean().reset_index()
Value Price
Category
A 55 550
B 45 450
Category Value Price
0 A 55 550
1 B 45 450
reset_index
Pandas reset_index() is a method to reset the index of a DataFrame. It sets a default integer index
ranging from 0 to the length of the data minus 1 and, by default, keeps the old index as a new column.
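A sketch of groupby() followed by reset_index(), on hypothetical Category/Value data: after grouping, the grouping column becomes the index, and reset_index() turns it back into an ordinary column.

```python
import pandas as pd

# Hypothetical data: two categories
df = pd.DataFrame({"Category": ["A", "B", "A", "B"], "Value": [10, 20, 30, 40]})

# groupby() splits the rows by Category; sum() aggregates each group
grouped = df.groupby("Category").sum()
print(grouped)          # Category is now the index

# reset_index() moves Category back into a column and restores 0, 1, ...
flat = grouped.reset_index()
print(flat)
```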
df
df.reset_index()
Value Counts
df
df = pd.read_csv('data1.csv')
df['Duration'].value_counts()
60 24
45 6
450 1
30 1
Name: Duration, dtype: int64
df.Duration.value_counts()
60 24
45 6
450 1
30 1
Name: Duration, dtype: int64
df.Calories.value_counts()
300.0 3
243.0 2
250.7 2
409.1 1
275.0 1
280.0 1
241.0 1
250.0 1
334.5 1
246.0 1
282.0 1
364.2 1
323.0 1
215.2 1
379.3 1
479.0 1
345.3 1
329.3 1
269.0 1
195.1 1
253.3 1
374.0 1
406.0 1
282.4 1
340.0 1
380.3 1
Name: Calories, dtype: int64
Cross Tabulations
Cross-tabulation counts how often each combination of values from two columns occurs.
df
pd.crosstab(df['Duration'],df['Pulse'])
Pulse 90 92 97 98 100 102 103 104 105 106 108 109 110 ...
Duration
30 0 0 0 0 0 0 0 0 0 0 0 1 0
45 1 0 1 0 1 0 0 0 1 0 0 1 0
60 0 2 0 3 5 3 4 1 0 1 1 0 2
450 0 0 0 0 0 0 0 1 0 0 0 0 0
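A small sketch of pd.crosstab() on hypothetical Duration/Pulse pairs: the rows are the Duration values, the columns are the Pulse values, and each cell counts how many rows have that pair.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 60, 45, 60, 45],
    "Pulse": [110, 103, 109, 110, 103],
})

# Rows are Duration values, columns are Pulse values, cells count the pairs
table = pd.crosstab(df["Duration"], df["Pulse"])
print(table)
```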
Sorting Dataframe
df
df[['Duration','Pulse']]
Duration Pulse
0 60 110
1 60 117
2 60 103
3 45 109
4 45 117
5 60 102
6 60 110
7 450 104
8 30 109
9 60 98
10 60 103
11 60 100
12 60 100
13 60 106
14 60 104
15 60 98
16 60 98
17 60 100
18 45 90
19 60 103
20 45 97
21 60 108
22 45 100
23 60 130
24 45 105
25 60 102
26 60 100
27 60 92
28 60 103
29 60 100
30 60 102
31 60 92
Duration Pulse
18 45 90
31 60 92
27 60 92
20 45 97
16 60 98
9 60 98
15 60 98
29 60 100
11 60 100
12 60 100
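The second table above lists rows in order of increasing Pulse. A sketch with sort_values(), using the first six rows of the Duration/Pulse table; kind="stable" keeps tied rows in their original order, which the default algorithm does not guarantee:

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 60, 60, 45, 45, 60],
    "Pulse": [110, 117, 103, 109, 117, 102],
})

# Sort by Pulse, smallest first; show the first 3 rows
print(df.sort_values(by="Pulse", kind="stable").head(3))

# ascending=False puts the largest Pulse first
print(df.sort_values(by="Pulse", ascending=False).head(3))
```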
Matplotlib
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
Matplotlib was created by John D. Hunter.
Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and
JavaScript for platform compatibility.
Installation of Matplotlib
Import Matplotlib
import matplotlib
print(matplotlib.__version__)
3.5.1
If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and [3, 10] to the
plot function.
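import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints)
plt.show()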
To plot only the markers, you can use the shortcut string notation parameter 'o', which means
'rings'.
# Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints, 'o')
plt.show()
Multiple Points
You can plot as many points as you like; just make sure you have the same number of points on
both axes.
Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to position (8, 10):
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()
Default X-Points
If we do not specify the points on the x-axis, they will get the default values 0, 1, 2, 3 etc.,
depending on the length of the y-points.
So, if we take the same example as above and leave out the x-points, the x-values default to 0, 1, 2, 3:
import matplotlib.pyplot as plt
ypoints = [3, 8, 1, 10]  # x-values default to 0, 1, 2, 3
plt.plot(ypoints)
plt.show()
Markers
You can use the keyword argument marker to emphasize each point with a specified marker:
# Mark each point with a circle:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o')
plt.show()
Marker Description
'o' Circle
'*' Star
'.' Point
',' Pixel
'x' X
'X' X (filled)
'+' Plus
's' Square
'D' Diamond
'd' Diamond (thin)
'p' Pentagon
'H' Hexagon
'h' Hexagon
'^' Triangle Up
'2' Tri Up
'|' Vline
'_' Hline
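To compare several markers from the table, a short sketch (the data and the selection of markers are arbitrary) draws the same points once per marker and shifts each copy upward so the lines do not overlap:

```python
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# Draw the same data once per marker, shifted up so the lines don't overlap
markers = ['o', '*', 's', 'D', '^', 'x']
lines = []
for offset, m in enumerate(markers):
    line, = plt.plot(ypoints + offset * 12, marker=m, label=repr(m))
    lines.append(line)

plt.legend()  # one legend entry per marker
plt.show()
```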
You can also use the shortcut string notation parameter to specify the marker.
This parameter is also called fmt, and is written with this syntax:
marker|line|color
import matplotlib.pyplot as plt
ypoints = [3, 8, 1, 10]
plt.plot(ypoints, 'o--r')  # circle markers, dashed line, red
plt.show()
Line Reference
Line Syntax Description
'-' Solid line
':' Dotted line
'--' Dashed line
'-.' Dashed/dotted line
Color Reference
'r' Red
'g' Green
'b' Blue
'c' Cyan
'm' Magenta
'y' Yellow
'k' Black
'w' White
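Each single-letter code is shorthand for an RGB color. As a quick check, matplotlib.colors.to_rgb() converts a color specification (a code such as 'r' or a name such as 'red') to a (red, green, blue) tuple with components between 0 and 1:

```python
import matplotlib.colors as mcolors

# Convert single-letter color codes to (red, green, blue) tuples in the range 0-1
for code in ['r', 'g', 'b', 'k', 'w']:
    print(code, mcolors.to_rgb(code))
```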
Marker Size
You can use the keyword argument markersize, or the shorter version ms, to set the size of the markers:
# Set the size of the markers to 20:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o', ms = 20)
plt.show()
Marker Color
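You can use the keyword argument markeredgecolor, or the shorter mec, to set the color of the edge of the markers, and markerfacecolor, or the shorter mfc, to set the color inside the edge:
# Set the color of both the edge and the face to red:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o', ms = 20, mec = 'r', mfc = 'r')
plt.show()
Colors can also be given as hexadecimal values (https://www.w3schools.com/colors/colors_hexadecimal.asp) or by name (https://www.w3schools.com/colors/colors_names.asp).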
Linestyle
You can use the keyword argument linestyle, or shorter ls, to change the style of the plotted line:
Line Styles
Style Shorthand
'solid' (default) '-'
'dotted' ':'
'dashed' '--'
'dashdot' '-.'
'None' '' or ' '
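A minimal sketch of the linestyle keyword and its ls shorthand (the data is arbitrary). Matplotlib stores the long names from the table in their short form, so 'dotted' is reported back as ':':

```python
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# linestyle and its shorthand ls are interchangeable
dotted, = plt.plot(ypoints, linestyle = 'dotted')
dashed, = plt.plot(ypoints + 2, ls = '--')
plt.show()
```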
Line Color
You can use the keyword argument color, or the shorter c, to set the color of the line.
Line Width
You can use the keyword argument linewidth or the shorter lw to change the width of the line.
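A short sketch combining color and linewidth with their shorthands c and lw. The specific colors and widths are arbitrary choices; color accepts a code such as 'r' or a hex string, and linewidth is measured in points:

```python
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# A thick red line, using the long keyword names
line, = plt.plot(ypoints, color = 'r', linewidth = 20.5)
# A thin green line, using the shorthands c and lw with a hex color
thin, = plt.plot(ypoints, c = '#4CAF50', lw = 2)
plt.show()
```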
Multiple Lines
You can plot as many lines as you like by simply adding more plt.plot() calls:
import matplotlib.pyplot as plt
import numpy as np
y1 = np.array([3, 8, 1, 10])
y2 = np.array([6, 2, 7, 11])
plt.plot(y1)
plt.plot(y2)
plt.show()
You can also plot many lines by passing the x- and y-points for each line to the same plt.plot() call.
(In the examples above we only specified the points on the y-axis, so the points on the x-axis got the default values (0, 1, 2, 3).)
# Draw two lines by specifying the x- and y-point values for both lines:
import matplotlib.pyplot as plt
import numpy as np
x1 = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])
x2 = np.array([0, 1, 2, 3])
y2 = np.array([6, 2, 7, 11])
plt.plot(x1, y1, x2, y2)
plt.show()
Create Labels for a Plot
With Pyplot, you can use the xlabel() and ylabel() functions to set a label for the x- and y-axis.
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()
Create a Title for a Plot
With Pyplot, you can use the title() function to set a title for the plot.
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()
Set Font Properties for Title and Labels
You can use the fontdict parameter in xlabel(), ylabel(), and title() to set font properties for the
title and labels.
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
font1 = {'family':'serif','color':'blue','size':20}
font2 = {'family':'serif','color':'darkred','size':15}
plt.plot(x, y)
plt.title("Sports Watch Data", fontdict = font1)
plt.xlabel("Average Pulse", fontdict = font2)
plt.ylabel("Calorie Burnage", fontdict = font2)
plt.show()
Position the Title
You can use the loc parameter in title() to position the title.
Legal values are: 'left', 'right', and 'center'. Default value is 'center'.
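import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.title("Sports Watch Data", loc = 'left')
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()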
Add Grid Lines to a Plot
With Pyplot, you can use the grid() function to add grid lines to the plot.
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid()
plt.show()
Specify Which Grid Lines to Display
You can use the axis parameter in the grid() function to specify which grid lines to display.
Legal values are: 'x', 'y', and 'both'. Default value is 'both'.
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid(axis = 'x')
plt.show()
To display only the grid lines for the y-axis:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid(axis = 'y')
plt.show()
Set Line Properties for the Grid
You can also set the line properties of the grid, like this: grid(color = 'color', linestyle = 'linestyle',
linewidth = number).
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid(color = 'green', linestyle = '--', linewidth = 0.5)
plt.show()
Display Multiple Plots
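With the subplot() function you can draw multiple plots in one figure. subplot() takes three arguments that describe the layout: the number of rows, the number of columns, and the index (starting at 1) of the current plot.
# Draw 2 plots side by side (1 row, 2 columns):
import matplotlib.pyplot as plt
import numpy as np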
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.show()
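To draw the two plots on top of each other instead (2 rows, 1 column), change the first two arguments:
# Draw 2 plots:
import matplotlib.pyplot as plt
import numpy as np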
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 1, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 1, 2)
plt.plot(x,y)
plt.show()
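You can draw as many plots as you like on one figure; just give the number of rows, the number of columns, and the index of each plot:
# Draw 6 plots (2 rows, 3 columns):
import matplotlib.pyplot as plt
import numpy as np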
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 1)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 2)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 3)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 4)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 5)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 6)
plt.plot(x,y)
plt.show()
Title
You can add a title to each plot with the title() function:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
plt.xlabel('x1')
plt.ylabel('y')
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.xlabel('x2')
plt.show()
Super Title
You can add a title to the entire figure with the suptitle() function:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.suptitle("MY SHOP")
plt.show()
Creating Scatter Plots
With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of the same length, one for the values on the x-axis and one for the values on the y-axis:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.xlabel("Car Age")
plt.ylabel("Speed of Car")
plt.show()
The observations in the example above are the result of 13 cars passing by.
It seems that the newer the car, the faster it drives, but that could be a coincidence; after all, we only registered 13 cars.
Compare Plots
In the example above, there seems to be a relationship between speed and age, but what if we
plot the observations from another day as well? Will the scatter plot tell us something else?
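A sketch of such a comparison: calling scatter() twice on the same axes overlays both data sets, and Matplotlib gives each call a different color from its default color cycle. The day-one values are the 13 observations above; the day-two values are hypothetical, made up for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Day one: the 13 observations from the example above
x1 = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y1 = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
# Day two: hypothetical observations, made up for illustration
x2 = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y2 = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])

# Each scatter() call gets its own color, so the two days can be told apart
day1 = plt.scatter(x1, y1, label = "Day 1")
day2 = plt.scatter(x2, y2, label = "Day 2")
plt.xlabel("Car Age")
plt.ylabel("Speed of Car")
plt.legend()
plt.show()
```

The two days do not need the same number of observations; only the x- and y-arrays of each single scatter() call must match in length.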
You can also set a specific color for each dot by passing an array of colors as the value of the c argument:
Note: You cannot use the color argument for this, only the c argument.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige","brown","gray","cyan","magenta"])
plt.scatter(x, y, c=colors)
plt.xlabel("Car Age")
plt.ylabel("Speed of Car")
plt.show()
ColorMap
A colormap is like a list of colors, where each color corresponds to a value in a range; in the examples below that range is 0 to 100.
An image of the colormap is shown at https://www.w3schools.com/python/matplotlib_scatter.asp.
This colormap is called 'viridis', and it ranges from 0, a purple color, up to 100, a yellow color.
You can specify the colormap with the keyword argument cmap with the value of the colormap,
in this case 'viridis' which is one of the built-in colormaps available in Matplotlib.
In addition, you have to create an array with values (from 0 to 100), one value for each point in the scatter plot.
You can include the colormap in the drawing by including the plt.colorbar() statement:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()
plt.show()
Available ColorMaps
Name Reverse
Accent Accent_r
Blues Blues_r
BrBG BrBG_r
BuGn BuGn_r
BuPu BuPu_r
CMRmap CMRmap_r
Dark2 Dark2_r
GnBu GnBu_r
Greens Greens_r
Greys Greys_r
OrRd OrRd_r
Oranges Oranges_r
PRGn PRGn_r
Paired Paired_r
Pastel1 Pastel1_r
Pastel2 Pastel2_r
PiYG PiYG_r
PuBu PuBu_r
PuBuGn PuBuGn_r
PuOr PuOr_r
PuRd PuRd_r
Purples Purples_r
RdBu RdBu_r
RdGy RdGy_r
RdPu RdPu_r
RdYlBu RdYlBu_r
RdYlGn RdYlGn_r
Reds Reds_r
Set1 Set1_r
Set2 Set2_r
Set3 Set3_r
Spectral Spectral_r
Wistia Wistia_r
YlGn YlGn_r
YlGnBu YlGnBu_r
YlOrBr YlOrBr_r
YlOrRd YlOrRd_r
afmhot afmhot_r
autumn autumn_r
binary binary_r
bone bone_r
brg brg_r
bwr bwr_r
cividis cividis_r
cool cool_r
coolwarm coolwarm_r
copper copper_r
cubehelix cubehelix_r
flag flag_r
gist_earth gist_earth_r
gist_gray gist_gray_r
gist_heat gist_heat_r
gist_ncar gist_ncar_r
gist_rainbow gist_rainbow_r
gist_stern gist_stern_r
gist_yarg gist_yarg_r
gnuplot gnuplot_r
gnuplot2 gnuplot2_r
gray gray_r
hot hot_r
hsv hsv_r
inferno inferno_r
jet jet_r
magma magma_r
nipy_spectral nipy_spectral_r
ocean ocean_r
pink pink_r
plasma plasma_r
prism prism_r
rainbow rainbow_r
seismic seismic_r
spring spring_r
summer summer_r
tab10 tab10_r
tab20 tab20_r
tab20b tab20b_r
tab20c tab20c_r
terrain terrain_r
twilight twilight_r
twilight_shifted twilight_shifted_r
viridis viridis_r
winter winter_r
Size
You can change the size of the dots with the s argument.
Just like colors, make sure the array for sizes has the same length as the arrays for the x- and y-axis:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])
plt.scatter(x, y, s=sizes)
plt.show()
Alpha
You can adjust the transparency of the dots with the alpha argument, a value between 0 (fully transparent) and 1 (fully opaque).
This example also sets the dot sizes; as before, the sizes array must have the same length as the arrays for the x- and y-axis:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])
plt.scatter(x, y, s=sizes, alpha=0.5)
plt.show()
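You can combine a colormap with different sizes of the dots. This is best visualized if the dots are transparent:
# Create random arrays with 100 values for x-points, y-points, colors and sizes:
import matplotlib.pyplot as plt
import numpy as np
x, y, colors = np.random.randint(100, size=(3, 100))
sizes = 10 * np.random.randint(100, size=100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='nipy_spectral')
plt.colorbar()
plt.show()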
Creating Bars
With Pyplot, you can use the bar() function to draw bar graphs:
# Draw 4 bars:
import matplotlib.pyplot as plt
x = ["A", "B", "C", "D"]
y = [3, 8, 1, 10]
plt.bar(x, y)
plt.show()
The bar() function takes arguments that describe the layout of the bars.
The categories and their values are passed as the first and second arguments, as arrays.
x = ["APPLES", "BANANAS"]
y = [400, 350]
plt.bar(x, y)
If you want the bars to be displayed horizontally instead of vertically, use the barh() function:
plt.barh(x, y)
plt.show()
Bar Color
The bar() and barh() functions take the keyword argument color to set the color of the bars.
Bar Width
The bar() function takes the keyword argument width to set the width of the bars (default 0.8).
Bar Height
The barh() function takes the keyword argument height to set the height of the bars (default 0.8).
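A sketch showing color, width and height together; the categories, values and colors are arbitrary. The vertical and horizontal charts are drawn side by side with subplot():

```python
import matplotlib.pyplot as plt

x = ["A", "B", "C", "D"]
y = [3, 8, 1, 10]

plt.subplot(1, 2, 1)
# Vertical bars: color sets the fill, width sets how wide each bar is
bars = plt.bar(x, y, color = "red", width = 0.1)

plt.subplot(1, 2, 2)
# Horizontal bars: height plays the role that width plays for bar()
hbars = plt.barh(x, y, color = "hotpink", height = 0.1)
plt.show()
```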
Histogram
The hist() function takes an array of numbers and draws a histogram of it; the array is passed to the function as an argument.
For simplicity, we use NumPy to randomly generate an array of 250 values drawn from a normal distribution with a mean of 170 and a standard deviation of 10.
This means the values are concentrated around 170, and about two thirds of them lie within 10 of the mean.
In the resulting histogram, most values fall between 160 and 180, with a peak at approximately 170.
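To check the "about two thirds within 10 of the mean" claim numerically, the sketch below swaps in NumPy's seeded Generator API (default_rng) so the random values are reproducible; the seed 0 is an arbitrary choice. For a normal distribution roughly 68% of values fall within one standard deviation of the mean, so with 250 samples the fraction should land near that:

```python
import numpy as np

# A seeded generator makes the "random" values reproducible
rng = np.random.default_rng(seed=0)
x = rng.normal(170, 10, 250)   # mean 170, standard deviation 10, 250 values

# Fraction of values no further than 10 from the mean of 170
within_10 = np.mean(np.abs(x - 170) <= 10)
print(round(x.mean(), 1), round(x.std(), 1), within_10)
```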
import numpy as np
x = np.random.normal(170, 10, 250)
print(x)
[162.82473535 161.24148305 152.81908739 156.42140434 177.66922688
169.9879616 160.756545 177.85540601 159.98379758 177.43989869
185.99099115 170.98105068 185.32823796 166.58397616 163.03517336
180.30492701 162.04745089 177.93312155 157.30676816 171.73554506
172.670417 154.58684949 174.39945968 163.78579293 160.67909659
159.70805791 166.31807211 181.44829153 176.55106405 162.08056534
167.60361635 180.95715229 163.27958009 152.4366006 160.0574906
144.70618854 176.45877577 166.05807122 176.75142851 162.0708222
161.49867634 163.10645812 181.54541804 168.01228348 158.68856741
148.24083068 164.00330401 160.91314714 164.78247856 182.3743663
169.23842511 165.99017778 157.31897213 169.69702019 163.36983982
165.24484739 162.13890473 171.08192126 157.3047641 184.74539136
168.5122959 169.63305775 175.31982763 159.426164 174.6899734
167.60808368 160.30384157 179.67467148 175.60982726 178.19325503
187.84335758 173.87842969 167.87475811 171.00975942 189.69271088
167.6849284 166.5792361 166.95197375 172.14047764 166.31610278
174.56871043 180.25427937 143.08645722 174.23721057 179.62120818
165.24435097 194.18063389 179.34468962 176.42744483 157.94729365
168.85587512 166.75532194 170.49840416 187.0723715 163.5673682
179.08397599 167.77843564 170.5401053 173.63905859 157.96012641
183.98360431 158.77678822 159.31159852 159.49375082 160.28992186
186.0497138 178.99975871 158.87180661 162.85046005 175.98346169
165.8030326 158.30532436 166.60143589 154.41193594 177.26717414
162.13727597 171.5995603 167.33684079 174.9568394 158.47527501
167.00271717 176.08264305 184.76047612 156.57142819 182.29236386
170.82633354 163.28594098 154.96856184 178.2858139 157.80304212
172.0308982 163.51841416 158.06308417 171.39681444 154.42965029
174.8478415 192.20931673 176.10036866 156.38985278 168.14837529
163.14524581 167.13927347 144.46522745 166.43504583 155.81423396
166.68351177 189.00441432 168.81679544 168.37687773 165.26859917
169.61163803 176.28064952 169.91441305 152.31252707 177.65393137
155.20503724 156.00253148 171.59037863 163.18285108 171.53752882
176.77168094 162.52566536 173.65096248 169.75831402 171.13552086
160.96485053 170.92576163 197.13488939 178.84028126 175.25256691
174.30559673 174.55793648 171.28917593 182.75983705 163.48474339
171.79345064 181.33163353 181.75690243 170.56288915 175.54864891
171.39277086 157.10507882 162.93494288 186.22108735 180.83615147
177.45119341 176.74982051 164.74782882 164.1272427 176.95981347
180.71192368 163.51578627 161.40654864 183.31734345 165.91617314
170.21755489 188.20213837 185.16707843 184.39225039 169.9894885
161.3955984 164.32427469 166.21098961 171.65617603 163.59358437
179.0911516 180.97185572 172.98504063 176.91140312 162.73396879
157.82345464 178.57631953 172.72606044 150.15585485 178.22769764
161.50606571 169.11262122 159.70493474 183.88260754 152.4162303
177.37693533 161.91342926 176.64997265 162.56062519 159.72921782
158.53459525 173.18737201 167.7933642 173.27039627 167.68193019
157.80211657 168.66185172 175.06270681 156.67580503 175.5160176
175.14353986 166.4217466 157.7113916 184.19017845 147.43578244
175.90149325 163.53347714 153.64890281 176.42339122 171.01671226
192.43879604 176.17190372 161.49288883 158.41295009 176.66026256]
The hist() function will read the array and produce a histogram:
# A simple histogram:
plt.hist(x)
plt.show()
# A simple histogram of a short list:
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
plt.hist(data)
plt.show()
Creating Pie Charts
With Pyplot, you can use the pie() function to draw pie charts:
import matplotlib.pyplot as plt
y = [35, 25, 25, 15]
plt.pie(y)
plt.show()
The pie chart draws one piece (called a wedge) for each value in the array (in this case [35, 25, 25, 15]).
By default, the first wedge starts at the x-axis and the wedges are drawn counterclockwise.
Labels
The labels parameter must be an array with one label for each wedge.
Start Angle
As mentioned, the default start angle is at the x-axis, but you can change it by specifying a startangle parameter, given in degrees.
Explode
Maybe you want one of the wedges to stand out? The explode parameter allows you to do that.
The explode parameter, if specified and not None, must be an array with one value for each wedge.
Each value represents how far from the center each wedge is displayed:
# Pull the "Apples" wedge 0.2 from the center of the pie:
plt.pie(y, labels = ["Apples", "Bananas", "Cherries", "Dates"], explode = [0.2, 0, 0, 0])
Shadow
Add a shadow to the pie chart by setting the shadow parameter to True:
# Add a shadow:
plt.pie(y, explode = [0.2, 0, 0, 0], shadow = True)
Legend
To add a list of explanations for each wedge, use the legend() function:
# Add a legend (after drawing the pie):
plt.legend(["Apples", "Bananas", "Cherries", "Dates"])
Legend With Header
To add a header to the legend, add the title parameter to the legend() function.
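A single sketch combining labels, startangle, explode, shadow and a legend with a header. The fruit labels, the 90-degree start angle and the legend title are assumptions chosen for illustration:

```python
import matplotlib.pyplot as plt

y = [35, 25, 25, 15]
# Hypothetical labels, one per wedge
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
# Pull only the first wedge ("Apples") 0.2 away from the center
myexplode = [0.2, 0, 0, 0]

# startangle = 90 starts the first wedge at the top instead of the x-axis
wedges, texts = plt.pie(y, labels = mylabels, startangle = 90,
                        explode = myexplode, shadow = True)
# The legend reuses the wedge labels; title adds a header
leg = plt.legend(title = "Four Fruits:")
plt.show()
```

Without the autopct parameter, pie() returns two lists: the wedge patches and the label texts.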