
BSC Python 4

Python is a versatile, high-level programming language created by Guido van Rossum in 1991, known for its simplicity and support for various programming paradigms. It offers a wide range of libraries and can be used for web development, software development, and data science. The document also covers expressions, statements, comments, data types, variables, and string manipulation in Python.

Uploaded by

dy452262

Python

Python is a popular programming language. It was created by Guido van Rossum in the Netherlands
and first released in 1991.

Python is a dynamic, high-level, interpreted programming language. It supports an object-oriented
(OOP) approach to developing applications. It is simple and easy to learn, and it provides many
high-level data structures.

Python is a general-purpose programming language. This means it can be used to write programs
for many kinds of tasks, such as games, data science, and websites.

Python is a high-level programming language. A high-level language lets you write computer
instructions in a way that is easy to understand and close to human language.

Python is portable. The same Python program can run on different operating systems such as
Windows, macOS, and Linux. Python is also free to use.

Python is an interpreted language. The code you write is run by an interpreter, which translates
it into instructions that your computer's processor can carry out.

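Python is a strongly typed language. Strongly typed languages do not automatically convert data
between unrelated types: Python will not silently turn the number 45 into the string '45'.
Numeric types are an exception, since an int is promoted to a float in mixed arithmetic.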
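The following is a minimal sketch of what strong typing means in practice. The variable names (total, message, explicit) are illustrative only and do not come from the original notes.

```python
# Mixed numeric arithmetic: the int is promoted to a float.
total = 15 + 1.3
print(type(total).__name__)

# Unrelated types are not converted implicitly, so this raises TypeError.
try:
    message = "How old " + 45
except TypeError as exc:
    message = f"TypeError: {exc}"
print(message)

# An explicit conversion with str() makes the intent clear.
explicit = "How old " + str(45)
print(explicit)
```

The explicit str() call is the usual way to combine text and numbers when plain concatenation is wanted.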

Python has a huge set of libraries. A Python library is a collection of reusable code that you can
incorporate into your own program without having to write it yourself.

Used for

Web development

Software development

Mathematics

System scripting

Expressions
An expression is anything that has a value. Examples: 3, 4+5, "Hello".

An expression can contain operators and operands.

Example: In 4+5, 4 and 5 are operands and + is an operator.

4+5

9
An expression is a combination of operators and operands that is interpreted to produce some
other value.

1) Constant Expressions: These are the expressions that have constant values only.

x = 15 + 1.3

print(x)

16.3

2) Arithmetic Expressions: An arithmetic expression is a combination of numeric values,
operators, and sometimes parentheses. The result of this type of expression is also a numeric
value. The operators used in these expressions are arithmetic operators like addition,
subtraction, etc.

# Arithmetic Expressions
x = 40
y = 12

add = x + y
sub = x - y
pro = x * y
div = x / y

print(add)
print(sub)
print(pro)
print(div)

52
28
480
3.3333333333333335

3) Integral Expressions: These are the kind of expressions that produce only integer results after
all computations and type conversions.

# Integral Expressions
a = 13
b = 12.0

c = a + int(b)
print(c)

25

4) Floating Expressions: These are the kind of expressions which produce floating point numbers
as result after all computations and type conversions.
# Floating Expressions
a = 13
b = 5

c = a / b
print(c)

2.6

5) Relational Expressions: In these types of expressions, arithmetic expressions are written on
both sides of a relational operator (>, <, >=, <=, ==, !=). Those arithmetic expressions are
evaluated first and then compared according to the relational operator, producing a Boolean
result. These expressions are also called Boolean expressions.

# Relational Expressions
a = 21
b = 13
c = 40
d = 37

p = (a + b) >= (c - d)
print(p)

True

6) Logical Expressions: These expressions combine one or more conditions using the logical
operators and, or, and not, and they result in either True or False. For example, (10 == 9) is a
condition that checks whether 10 is equal to 9. Since it is not, the condition evaluates to False.

P = (10 == 9)
Q = (7 > 5)

# Logical Expressions
R = P and Q
S = P or Q
T = not P

print(R)
print(S)
print(T)

False
True
True

7) Bitwise Expressions: These are the kind of expressions in which computations are performed
at bit level.
# Bitwise Expressions
a = 12

x = a >> 2
y = a << 1

print(x, y)

3 24

8) Combinational Expressions: We can also use different types of expressions in a single
expression; the result is termed a combinational expression.

# Combinational Expressions
a = 16
b = 12

c = a + (b >> 1)
print(c)

22

Statements
A statement is an instruction that the Python interpreter can execute, such as an assignment or a
print() call. A program is a series of statements.

Statements can contain one or more expressions and can span one or more lines.
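As a rough illustration of the difference between expressions and statements, the sketch below uses made-up names (value, total, label) that are not part of the original notes.

```python
# An assignment statement; the right-hand side 4 + 5 is an expression.
value = 4 + 5

# One statement can span several lines when wrapped in parentheses.
total = (value
         + 10
         + 20)

# A compound statement (if/else) contains other statements.
if total > 30:
    label = "large"
else:
    label = "small"

print(value, total, label)
```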

Comments
Comments can be used to explain Python code.

Comments can be used to prevent code execution.

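2 types of comments

Single line (#)

Multi line (""" """). Python has no dedicated multi-line comment syntax; a triple-quoted string
that is not assigned to any name is commonly used for this purpose.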
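A small sketch of both comment styles follows. The variable price is a hypothetical example value, not something defined in the original notes.

```python
# This is a single-line comment.
price = 40  # A comment can also follow code on the same line.

# print("This line is commented out, so it does not run.")

"""
This triple-quoted string is not assigned to any name.
Python evaluates it and then discards it, so it behaves
like a multi-line comment.
"""

print(price)
```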

# Single line strings


Str1 = "sdfgrgerf \"rere\" "
print(Str1)

sdfgrgerf "rere"
# Multiple line strings
Str1 = """Hello World,
This is an era of Computer's life"""
print(Str1)

Hello World,
This is an era of Computer's life

Data Types
1) Numbers

Integers (int)

Floating points (float)

Complex numbers (complex)

2) String

Text and Sequence of characters (str)

Strings have to be enclosed in double or single quotes

3) Booleans

True or False (bool)

Getting the Data Type


type(object)

The object can be any value: a variable, a data structure, an array, etc.

Examples (Try)

# Getting the datatype


type(2.3)     # float
type(2)       # int
type('Ram')   # str (a notebook cell displays only the last value)

str
Verifying object data type
Syntax: type(object) is datatype

Assigning values to multiple variables, Getting the datatype, Verifying object data type, Convert
the datatype of an object to another

# Verifying object data type


type(2.3) is int   # False, because 2.3 is a float
type('R') is str   # True (a notebook cell displays only the last value)

True

Casting Data Types


Casting is the process of converting a value from one data type to another.

int()

float()

str()

Examples (Try)

# Convert the datatype of an object to another


a=2.2
type(a)
b=int(a)
type(b)

int

a="1"
type(a)
b=int(a)
type(b)

int

a="R"
type(a)
b=int(a)
type(b)

----------------------------------------------------------------------
-----
ValueError Traceback (most recent call
last)
Input In [16], in <cell line: 3>()
1 a="R"
2 type(a)
----> 3 b=int(a)
4 type(b)

ValueError: invalid literal for int() with base 10: 'R'
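Since int() raises ValueError for text that is not a number, a conversion can be wrapped in try/except. The helper to_int below is a hypothetical name used only for this sketch.

```python
def to_int(text, default=None):
    """Return int(text), or default when text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(to_int("1"))      # 1
print(to_int("R"))      # None
print(to_int("R", 0))   # 0

# int() on a float drops the fractional part (it truncates toward zero).
print(int(2.9))         # 2
print(int(-2.9))        # -2

print(float("2.5"))     # 2.5
print(str(45))          # 45
```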

Variables
Variables are named containers for storing data values.

Variables in Python are created when you assign a value to a name.

Variable names must start with a letter or the underscore character.

Variable names can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, _).

Variable names cannot start with a digit.

Variable names are case sensitive. Example: Name and name are different variables.

You cannot use Python's reserved keywords as variable names.
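The naming rules above can be checked programmatically. This sketch combines the built-in str.isidentifier() with keyword.iskeyword() from the standard library; the function name is_valid_variable_name is an assumption made for illustration.

```python
import keyword

def is_valid_variable_name(name):
    """True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["name", "_total", "age2", "2age", "my-var", "class"]:
    print(candidate, is_valid_variable_name(candidate))
```

Here "2age" fails because it starts with a digit, "my-var" fails because of the hyphen, and "class" fails because it is a keyword.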

Removing or deleting the variables


a=2
b=3
c=a+b
print(a,b,c)

del a
del b,c

print(a)

----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [18], in <cell line: 1>()
----> 1 print(a)

NameError: name 'a' is not defined


Reserved words
Reserved words (also called keywords) are defined with predefined meaning and syntax in the
language. These keywords have to be used to develop programming instructions. Reserved
words can’t be used as identifiers for other programming elements like name of variable,
function, etc.

The following is the list of reserved keywords in Python 3 (async and await have been keywords
since Python 3.7):

False, None, True, and, as, assert, async, await, break, class, continue, def, del, elif, else,
except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or, pass, raise,
return, try, while, with, yield.
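Because the exact keyword list depends on the Python version, it may be safer to ask the interpreter than to memorize it. A short sketch using the standard keyword module:

```python
import keyword

# The full keyword list for the Python version that is running this code.
print(keyword.kwlist)
print(len(keyword.kwlist))

# Keywords are case sensitive: False is reserved, false is not.
print(keyword.iskeyword("False"))
print(keyword.iskeyword("false"))
```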

# Assigning values to multiple variables


x,y,z=1,2,3
print(x+y*z)   # y*z is evaluated first: 1 + 2*3

7

Sequence data types


1) Sequences allow you to store values in an organized and efficient fashion

2) There are several sequence types: strings, lists, tuples, arrays and range objects. In
Python 3, every string is a Unicode string.

3) Dictionaries and sets are containers for non-sequential data. Sets are unordered; dictionaries
map keys to values and, since Python 3.7, preserve insertion order.

(a) Strings - Sequence of characters - "" or ''

(b) Tuples - Immutable sequence of objects of any data type - ()

(c) Lists - Mutable sequence of objects of any data type - []

(d) Arrays - Sequence of objects that all have the same data type, created using array() from the
array module

(e) Dictionary - Collection of key-value pairs (a mapping, not a sequence) - {}

(f) Sets - Unordered collection of unique data (not a sequence) - {} with values, or set()

(g) Range - Sequence of integers, commonly used for looping - using built-in range()
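To make the distinction between sequences and other containers concrete, the sketch below creates one value of each kind and checks whether it supports positional indexing. The helper supports_indexing and the sample values are assumptions for illustration.

```python
from array import array

samples = {
    "string": "learning",
    "tuple": (1, "a", 2.5),
    "list": [1, "a", 2.5],
    "array": array("i", [1, 2, 3]),      # "i" means signed integers only
    "range": range(3),
    "dictionary": {"name": "A1", "age": 4000},
    "set": {1, 2, 2, 3},                 # the duplicate 2 is dropped
}

def supports_indexing(value):
    """True if value[0] works, which is the case for sequence types."""
    try:
        value[0]
    except (TypeError, KeyError):
        return False
    return True

for label, value in samples.items():
    print(label, type(value).__name__, supports_indexing(value))
```

The dictionary fails with KeyError because 0 is not one of its keys, and the set fails with TypeError because sets cannot be indexed at all.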

Strings
Working with Strings
spam='That is Alice's cat'
spam

Input In [20]
spam='That is Alice's cat'
^
SyntaxError: invalid syntax

spam="That is Alice's cat"


spam

"That is Alice's cat"

spam='That is Alice\'s cat'


spam

"That is Alice's cat"

Escape Characters

\' Single quote

\" Double quote

\t Tab

\n Newline (line break)

\\ Backslash

print("Hello there!\nHow are you?\nI \'m doing fine.\tThank you.\nWhat are you \\doing?")

Hello there!
How are you?
I 'm doing fine. Thank you.
What are you \doing?

print(r'That is Carol\'s cat.') # r is used for raw strings

That is Carol\'s cat.

print('That is Carol\'s cat.')

That is Carol's cat.

# Multi-line strings with Triple Quotes


print('''Dear Alice\nEve's cat has been arrested for catnapping, cat burglary and extortion.

Sincerely,
Bob
''')

Dear Alice
Eve's cat has been arrested for catnapping, cat burglary and
extortion.

Sincerely,
Bob

print('''Dear Alice\nEve's cat has been arrested for catnapping, cat burglary and extortion.\n\nSincerely,\nBob''')

Dear Alice
Eve's cat has been arrested for catnapping, cat burglary and
extortion.

Sincerely,
Bob

Indexing

strSample = 'learning' # string

strSample.index('l')      # index of the substring 'l' in the string 'learning'

0

strSample.index('ning')   # index of the substring 'ning' in the string 'learning'

4

strSample[7]              # character at the 8th position (index 7)

'g'

strSample[-2]             # character at the 2nd-last position

'n'

strSample[-9]             # IndexError: string index out of range

----------------------------------------------------------------------
-----
IndexError Traceback (most recent call
last)
Input In [33], in <cell line: 1>()
----> 1 strSample[-9]

IndexError: string index out of range

strSample[2]

'a'

strSample = 'learning' # string


print(strSample)

learning

strSample[:] # learning

'learning'

strSample[[:]]

Input In [37]
strSample[[:]]
^
SyntaxError: invalid syntax

strSample[1:]

'earning'

strSample[1::2]

'erig'

strSample[::-1]

'gninrael'

strSample[1::]

'earning'

strSample[1:-1:2]

'eri'

strSample[1:-1]

'earnin'
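The slices above all follow the pattern [start:stop:step], where stop is excluded and any omitted part takes a default. As a sketch, the helper show (a made-up name) builds the same slices from explicit parts using the built-in slice() object:

```python
text = "learning"

def show(start=None, stop=None, step=None):
    """Return text[start:stop:step]; None means 'use the default'."""
    return text[slice(start, stop, step)]

print(show())                 # learning   (same as text[:])
print(show(1))                # earning    (same as text[1:])
print(show(1, None, 2))       # erig       (same as text[1::2])
print(show(1, -1))            # earnin     (same as text[1:-1])
print(show(1, -1, 2))         # eri        (same as text[1:-1:2])
print(show(None, None, -1))   # gninrael   (same as text[::-1])
print(show(5, 100))           # ing        (an out-of-range stop is clipped)
```

Unlike indexing with a single out-of-range number, slicing never raises IndexError.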

Adding two strings


name='A1'
age=4000
'Hello, my name is '+name+'. I am '+str(age)+' years old.'

'Hello, my name is A1. I am 4000 years old.'

name='A1'
age=4000
'Hello, my name is %s. I am %s years old.' %(name,age)

'Hello, my name is A1. I am 4000 years old.'

name='A1'
age=4000
f'Hello, my name is {name}. Next year, I will be {age+1} years old.'

'Hello, my name is A1. Next year, I will be 4001 years old.'

'Hello, my name is {name}. Next year, I will be {age+1} years old.'

'Hello, my name is {name}. Next year, I will be {age+1} years old.'

print(strSample+" ",'python')   # learning  python (print() adds its own space between arguments)
print(strSample)

learning  python
learning

print(strSample+" "+'python') # learning python


print(strSample)

learning python
learning

newString = strSample+" ",'python'   # the comma creates a tuple of two strings


newString
# print(newString)

('learning ', 'python')

newString = strSample+" "+'python'


newString
# print(newString)

'learning python'

# Add two strings


str1 = "How old"
str2 = " are you?"
print(str1+str2)

How old are you?


# Can't add string and integer
str1 = "How old"
str2 = 45
print(str1+str2)

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [53], in <cell line: 4>()
2 str1 = "How old"
3 str2 = 45
----> 4 print(str1+str2)

TypeError: can only concatenate str (not "int") to str

# But we can add a string and an integer after converting the integer into a string
str1 = "How old "
str2 = 45
str3 = str(str2)
print(str1+str3)

How old 45

str1 = "How are you "


str2 = "Jose"
print(str1+str2)

How are you Jose

str1 = "How are you "


str2 = "Jose"
print(f"How are you {str2}")
str3 = "Bob"
print(f"How are you {str3}")

How are you Jose


How are you Bob

str1 = "How are you {}"


name = "Jose"
print(str1.format(name))
name = "Bob"
print(str1.format(name))

How are you Jose


How are you Bob

name = "Jose"
str1 = "How are you {name}"
print(str1.format(name="Jose"))
str1 = "How are you {name}"
print(str1.format(name="Bob"))

How are you Jose


How are you Bob

name = "Jose"
str1 = "How are you {name}"
print(str1.format(name=name))

How are you Jose

frd_name = "Jose"
str1 = "How are you {name}"
print(str1.format(name=frd_name))

How are you Jose

description = "{} is {} years old."


print(description.format("Bob", 30))

Bob is 30 years old.

description = "{} is {age} years old."


print(description.format("Bob", age=30))

Bob is 30 years old.

name1 = 'GITAA'
name2 = 'Pvt'
name3 = 'Ltd'

name = '{} {}. {}.'.format(name1,name2,name3)


print(name)

GITAA Pvt. Ltd.

Multiplication

strSample = 'learning' # string


print(strSample)

learning

strSample*=3 # Concatenate thrice


print(strSample)

learninglearninglearning

The in and not in Operators with Strings

'Hello' in "Hello, World"


True

'Hello' in 'Hello'

True

'HELLO' in "Hello, World"

False

'' in 'spam'

True

'cats' not in 'cats and dogs'

False

String methods

strSample = 'learning is fun !'


print(strSample)

learning is fun !

strSample.capitalize()   # returns the string with its first character capitalized and the rest lowercased

'Learning is fun !'

strSample.title()        # capitalizes the first character of each word

'Learning Is Fun !'

strSample = "Learning is Fun !"


strSample.swapcase()     # to swap the case of the string
# The swapcase() method returns a string where all the upper case letters are lower case and vice versa.

'lEARNING IS fUN !'

strSample.upper() # all characters are in upper case

'LEARNING IS FUN !'

strSample.lower() # all characters are in lower case

'learning is fun !'

strSample

'Learning is Fun !'


strSample.find('n')      # index of the first occurrence of the given letter

4

strSample = "Learning is Fun !"

strSample.find('z')      # find() returns -1 when the substring is not present

-1

strSample.count('A')     # number of occurrences of 'A' in the string

0

strSample.count('a')     # number of occurrences of 'a' in the string

1

strSample.replace('fun','joyful')   # no change: replace() is case sensitive and 'fun' does not occur

'Learning is Fun !'

strSample.replace('Fun','joyful')   # replaces the word 'Fun'

'Learning is joyful !'

The isalnum() method returns True if the string is not empty and every character in it is
alphanumeric, that is, either a letter or a number. Letters are not limited to a-z: accented and
other Unicode letters also count. If the string is empty, isalnum() returns False.

Example of characters that are not alphanumeric: (space)!#%&? etc.

strSample.isalnum()
# Returns True if all characters in the string are letters or digits and the string is not empty, False otherwise

False

s = 'HelloWorld2019'
print(s.isalnum())

True

s = 'Hello World 2019'


print(s.isalnum())   # False because whitespace is not an alphanumeric character.

False

s = ''
print(s.isalnum())

False

s='A.B'
print(s.isalnum())

s = '10.50'
print(s.isalnum())   # The string contains a period (.), which is not an alphanumeric character.

False
False

s = 'çåøÉ'
print(s.isalnum())   # True because all these are alphabetic characters.

True

strSample = 'learning is fun !' # String


print(strSample)

learning is fun !

spam='Hello, world'
spam.islower()

False

spam.isupper()

False

'world'.islower()

True

'HELLO'.isupper()

True

'abc123'.islower()

True

'123'.islower()
False

'123'.isupper()

False

Try for the exercise

'Hello'.upper()

'HELLO'

'Hello'.upper().lower()

'hello'

'Hello'.upper().lower().upper()

'HELLO'

'HELLO'.lower()

'hello'

'HELLO'.lower().islower()

True

isX() Methods

isalpha()

isalnum()

isdecimal()

isspace()

istitle()

'hello'.isalpha()

True

'hello123'.isalpha()

False

'hello123'.isalnum()

True

'hello'.isalnum()
True

'123'.isdecimal()

True

' '.isspace()

True

'This Is Title Case'.istitle()

True

'This Is Title Case 123'.istitle()

True

'This Is not Title Case'.istitle()

False

'This Is NOT Title Case'.istitle()

False

The startswith() and endswith()

'Hello, world'.startswith('Hello')

True

'Hello, world'.endswith('world')

True

'abc123'.startswith('abcdef')

False

'abc123'.endswith('12')

False

'Hello, world'.startswith('Hello, world')

True

'Hello, world'.endswith('Hello, world')

True

Partition
'Hello, world'.partition('w')

('Hello, ', 'w', 'orld')

'Hello, world'.partition('world')

('Hello, ', 'world', '')

'Hello, world'.partition('o')

('Hell', 'o', ', world')
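partition() always returns a tuple of three strings, including when the separator is missing. The sketch below also shows rpartition(), which splits at the last occurrence; the variable names are illustrative.

```python
text = "Hello, world"

# Separator present: (before, separator, after)
print(text.partition(", "))     # ('Hello', ', ', 'world')

# Separator absent: the whole string comes first, followed by two empty strings
print(text.partition("xyz"))    # ('Hello, world', '', '')

# rpartition() splits at the last occurrence instead of the first
print(text.rpartition("o"))     # ('Hello, w', 'o', 'rld')

# Tuple unpacking gives each part a name
before, separator, after = text.partition(", ")
print(before, after)
```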

Reverse

strSample.reverse()      # AttributeError: 'str' object has no attribute 'reverse'
strSample

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [123], in <cell line: 1>()
----> 1 strSample.reverse()
2 strSample

AttributeError: 'str' object has no attribute 'reverse'

Length

len(strSample)

17

Clear

strSample.clear()        # AttributeError: 'str' object has no attribute 'clear'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [125], in <cell line: 1>()
----> 1 strSample.clear()

AttributeError: 'str' object has no attribute 'clear'

Append or Add
strSample.append('sas')  # AttributeError: 'str' object has no attribute 'append'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [126], in <cell line: 1>()
----> 1 strSample.append('sas')

AttributeError: 'str' object has no attribute 'append'

strSample.add('sas')     # AttributeError: 'str' object has no attribute 'add'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [127], in <cell line: 1>()
----> 1 strSample.add('sas')

AttributeError: 'str' object has no attribute 'add'

Update

strSample.update('a')

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [128], in <cell line: 1>()
----> 1 strSample.update('a')

AttributeError: 'str' object has no attribute 'update'

Insert

strSample.insert(3,5)    # AttributeError: 'str' object has no attribute 'insert'
print(strSample)

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [129], in <cell line: 1>()
----> 1 strSample.insert(3,5)
2 print(strSample)
AttributeError: 'str' object has no attribute 'insert'

pop(): on a list, removes the element at the given index and returns it

the default index is -1, which removes and returns the last item; strings have no pop() method

strSample = 'learning is fun !' # String


print(strSample)

learning is fun !

strSample.pop()          # AttributeError: 'str' object has no attribute 'pop'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [131], in <cell line: 1>()
----> 1 strSample.pop()

AttributeError: 'str' object has no attribute 'pop'

Remove

strSample.remove(0)      # AttributeError: 'str' object has no attribute 'remove'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [132], in <cell line: 1>()
----> 1 strSample.remove(0)

AttributeError: 'str' object has no attribute 'remove'

del

strSample = 'learning is fun !' # String


print(strSample)

learning is fun !

del strSample

print(strSample)

----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [135], in <cell line: 1>()
----> 1 print(strSample)

NameError: name 'strSample' is not defined

Extend

strSample = 'learning is fun !' # String


print(strSample)

learning is fun !

strSample.extend(('hello'))   # AttributeError: 'str' object has no attribute 'extend'
print(strSample)

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [137], in <cell line: 1>()
----> 1 strSample.extend(('hello'))
2 print(strSample)

AttributeError: 'str' object has no attribute 'extend'
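All of the AttributeError results above have the same cause: strings are immutable, so they have none of the list methods that change an object in place. A common workaround is to build a new string instead. The sketch below shows one possible equivalent for each failed call; the variable names are made up for illustration.

```python
text = "learning is fun !"

reversed_text = text[::-1]                  # instead of reverse()
appended = text + " sas"                    # instead of append(), add() or extend()
inserted = text[:3] + "5" + text[3:]        # instead of insert(3, 5)
without_last = text[:-1]                    # instead of pop()
without_first_l = text.replace("l", "", 1)  # instead of remove('l')
cleared = ""                                # instead of clear()

print(reversed_text)
print(appended)
print(inserted)
print(without_last)
print(without_first_l)
print(text)   # the original string is unchanged
```

Each line creates a new string object; text itself still holds the original value at the end.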

Lists
A list is a value that contains multiple values in an ordered sequence.

[1, 2, 3]

[1, 2, 3]

['cat', 'bat', 'rat', 'elephant']

['cat', 'bat', 'rat', 'elephant']

['hello', 3.1415, True, None, 42]   # list with mixed data types (string, float, Boolean, None, integer)

['hello', 3.1415, True, None, 42]

spam = ['cat', 'bat', 'rat', 'elephant']


spam

['cat', 'bat', 'rat', 'elephant']


lstNumbers = [1,2,3,3,3,4,5]   # list with numbers (having duplicate values)
print(lstNumbers)

[1, 2, 3, 3, 3, 4, 5]

Indexing

lstSample = [1,2,"a",'sam',2] # list

lstSample.index('sam')   # index of the element 'sam'

3

lstSample[2]             # element at the 3rd position (index 2)

'a'

lstSample[-1]            # last element in the list

2

lstSample = [1,2,"a",'sam',2] # list


print(lstSample)
print(lstSample[:3])     # slicing starts at the beginning of the list and stops before index 3
print(lstSample[2:])

[1, 2, 'a', 'sam', 2]


[1, 2, 'a']
['a', 'sam', 2]

print(lstSample[1,-1,2])

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [148], in <cell line: 1>()
----> 1 print(lstSample[1,-1,2])

TypeError: list indices must be integers or slices, not tuple

lstSample = [1,2,"a",'sam',2]
print(lstSample[1:-1:2])

[2, 'sam']

print(lstSample[1::2])
[2, 'sam']

print(lstSample[2:4])

['a', 'sam']

print(lstSample[::-1])

[2, 'sam', 'a', 2, 1]

Concatenation

lstSample = [1,2,"a",'sam',2] # list


print(lstSample)
lstSample+['py']

[1, 2, 'a', 'sam', 2]

[1, 2, 'a', 'sam', 2, 'py']

new_lstSample = lstSample + ['py']


print(new_lstSample)

[1, 2, 'a', 'sam', 2, 'py']

Multiplication

lstSample = [1,2,"a",'sam',2] # list


print(lstSample)

[1, 2, 'a', 'sam', 2]

lstSample*2

[1, 2, 'a', 'sam', 2, 1, 2, 'a', 'sam', 2]

lstSample*=2
lstSample

[1, 2, 'a', 'sam', 2, 1, 2, 'a', 'sam', 2]

lstSample[2]*=2
lstSample

[1, 2, 'aa', 'sam', 2, 1, 2, 'a', 'sam', 2]

lstSample[1]*=2
lstSample

[1, 4, 'aa', 'sam', 2, 1, 2, 'a', 'sam', 2]

lstSample[1]+=2
lstSample
[1, 6, 'aa', 'sam', 2, 1, 2, 'a', 'sam', 2]

lstSample[2]+=2          # TypeError: can only concatenate str (not "int") to str
lstSample

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [161], in <cell line: 1>()
----> 1 lstSample[2]+=2
2 lstSample

TypeError: can only concatenate str (not "int") to str

lstSample

[1, 6, 'aa', 'sam', 2, 1, 2, 'a', 'sam', 2]

lstSample[2:4]*=2
lstSample

[1, 6, 'aa', 'sam', 'aa', 'sam', 2, 1, 2, 'a', 'sam', 2]

len

lstSample = [1,2,'a','sam',2] # list

len(lstSample)

5

lstSample = [1,2,'a','sam',2]
lstSample

[1, 2, 'a', 'sam', 2]

lstSample['a']='b'
lstSample

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [167], in <cell line: 1>()
----> 1 lstSample['a']='b'
2 lstSample

TypeError: list indices must be integers or slices, not str


lstSample[2]='b'
lstSample

[1, 2, 'b', 'sam', 2]

reverse

lstSample.reverse() # Reverse the order of the list


print(lstSample)

[2, 'sam', 'b', 2, 1]

clear

lstSample = [1,2,'a','sam',2] # list


print(lstSample)

[1, 2, 'a', 'sam', 2]

lstSample.clear()
print(lstSample)

[]

append or add

lstSample = [1,2,'a','sam',2] # list

lstSample.append([4,5,6])
print(lstSample)

[1, 2, 'a', 'sam', 2, [4, 5, 6]]

lstSample.add([4,5,6])   # AttributeError: 'list' object has no attribute 'add'
print(lstSample)

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [174], in <cell line: 1>()
----> 1 lstSample.add([4,5,6])
2 print(lstSample)

AttributeError: 'list' object has no attribute 'add'

update

lstSample = [1,2,'a','sam',2] # list


lstSample.update(['a','3'])   # AttributeError: 'list' object has no attribute 'update'
print(lstSample)

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [176], in <cell line: 1>()
----> 1 lstSample.update(['a','3'])
2 print(lstSample)

AttributeError: 'list' object has no attribute 'update'

# update() is a set method, so convert the lists to sets first (this drops duplicates and ordering)
l=[4,'3']
s = set(l)
l1 = set(lstSample)
l1.update(s)
print(l1)

{1, 2, 4, '3', 'sam', 'a'}

insert

lstSample = [1,2,'a','sam',2] # list


print(lstSample)

[1, 2, 'a', 'sam', 2]

lstSample.insert(3,4)    # inserting the element 4 at index 3 (the 4th position)
print(lstSample)         # printing list

[1, 2, 'a', 4, 'sam', 2]

pop: removes the element at the given index from the list and returns it

the default index is -1, which removes and returns the last item

lstSample = [1,2,'a','sam',2] # list


print(lstSample)

[1, 2, 'a', 'sam', 2]

lstSample.pop()

2

print(lstSample)
[1, 2, 'a', 'sam']

lstSample = [1,2,'a','sam',2] # list


print(lstSample)

[1, 2, 'a', 'sam', 2]

lstSample.pop(2)

'a'

print(lstSample)

[1, 2, 'sam', 2]

The remove() method removes the first occurrence of the element with the specified value

lstSample = [1,2,'a','sam',2] # list


print(lstSample)

[1, 2, 'a', 'sam', 2]

lstSample.remove(1)      # removes the first occurrence of the value 1 from the list
lstSample

[2, 'a', 'sam', 2]

lstSample.remove(2)      # removes the first occurrence of the value 2 from the list
lstSample

['a', 'sam', 2]

lstSample.remove('sam')  # removes the element 'sam' from the list
lstSample

['a', 2]

del: deletes a variable (name) of any data type, or a single item or slice of a list

lstSample = [1,2,'a','sam',2] # list


print(lstSample)

[1, 2, 'a', 'sam', 2]

del lstSample # deleting the list, lstSample

lstSample # NameError: name 'lstSample' is not defined


----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [192], in <cell line: 1>()
----> 1 lstSample

NameError: name 'lstSample' is not defined

lstSample = [1,2,'a','sam',2] # list


print(lstSample)

[1, 2, 'a', 'sam', 2]

del lstSample[2]         # deleting the 3rd item in the list
lstSample

[1, 2, 'sam', 2]

The extend() method adds the specified list elements (or any iterable - list, set, tuple, etc.) to the
end of the current list

lstSample = [1,2,'a','sam',2] # list


print(lstSample)

[1, 2, 'a', 'sam', 2]

lstSample.extend([4,5,3,5])
print(lstSample)

[1, 2, 'a', 'sam', 2, 4, 5, 3, 5]
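The difference between append() and extend() is easy to miss, so here is a side-by-side sketch. It works on copies of a sample list so that each result starts from the same data; the variable names are illustrative.

```python
base = [1, 2, 'a', 'sam', 2]

appended = base.copy()
appended.append([4, 5, 6])        # the whole list becomes ONE new element

extended = base.copy()
extended.extend([4, 5, 6])        # each item becomes its own element

extended_with_text = base.copy()
extended_with_text.extend("py")   # a string is iterated character by character

print(len(appended), appended)
print(len(extended), extended)
print(len(extended_with_text), extended_with_text)
```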

Try for the exercise

spam = ['cat', 'bat', 'rat', 'elephant']

spam[0]

'cat'

spam[2]

'rat'

['cat', 'bat', 'rat', 'elephant'][3]

'elephant'

'Hello ' + spam[0]

'Hello cat'
'The ' + spam[1] + ' ate the ' + spam[0] + '.'

'The bat ate the cat.'

spam = ['cat', 'bat', 'rat', 'elephant']

spam[10000]

----------------------------------------------------------------------
-----
IndexError Traceback (most recent call
last)
Input In [204], in <cell line: 1>()
----> 1 spam[10000]

IndexError: list index out of range

spam = ['cat', 'bat', 'rat', 'elephant']

spam[1]

'bat'

spam[1.0]

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [207], in <cell line: 1>()
----> 1 spam[1.0]

TypeError: list indices must be integers or slices, not float

spam[int(1.0)]

'bat'

spam = [['cat', 'bat'], [10, 20, 30, 40, 50]]

spam[0]

['cat', 'bat']

spam[0][1]

'bat'

spam[1][4]

50

spam = ['cat', 'bat', 'rat', 'elephant']


spam[-1]

'elephant'

'The ' + spam[-1] + ' is afraid of the ' + spam[-3] + '.'

'The elephant is afraid of the bat.'

spam = ['cat', 'bat', 'rat', 'elephant']

spam[0:4]

['cat', 'bat', 'rat', 'elephant']

spam[1:3]

['bat', 'rat']

spam[0:-1]

['cat', 'bat', 'rat']

spam = ['cat', 'bat', 'rat', 'elephant']

spam[:2]

['cat', 'bat']

spam[1:]

['bat', 'rat', 'elephant']

spam[:]

['cat', 'bat', 'rat', 'elephant']

spam = ['cat', 'bat', 'rat', 'elephant']


spam[1] = 'aardvark'
spam

['cat', 'aardvark', 'rat', 'elephant']

spam[2] = spam[1]
spam

['cat', 'aardvark', 'aardvark', 'elephant']

spam[-1] = 12345
spam

['cat', 'aardvark', 'aardvark', 12345]

[1, 2, 3] + ['A', 'B', 'C']

[1, 2, 3, 'A', 'B', 'C']


['X', 'Y', 'Z'] * 3

['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z']

spam = [1, 2, 3]
spam = spam + ['A', 'B', 'C']
spam

[1, 2, 3, 'A', 'B', 'C']

spam = ['cat', 'bat', 'rat', 'elephant']


del spam[2]
spam

['cat', 'bat', 'elephant']

del spam[2]
spam

['cat', 'bat']

'howdy' in ['hello', 'hi', 'howdy', 'heyas']

True

spam = ['hello', 'hi', 'howdy', 'heyas']

'cat' in spam

False

'howdy' not in spam

False

'cat' not in spam

True

bacon = ['Zophie']
bacon *= 3
bacon

['Zophie', 'Zophie', 'Zophie']

spam = ['hello', 'hi', 'howdy', 'heyas']


spam.index('hello')

0

spam = ['Zophie', 'Pooka', 'Fat-tail', 'Pooka']
spam.index('Pooka')      # index() returns the position of the first match

1

spam = ['cat', 'dog', 'bat']


spam.append('moose')
spam

['cat', 'dog', 'bat', 'moose']

spam = ['cat', 'dog', 'bat']


spam.insert(1, 'chicken')
spam

['cat', 'chicken', 'dog', 'bat']

spam = ['cat', 'bat', 'rat', 'elephant']


spam.remove('bat')
spam

['cat', 'rat', 'elephant']

spam = ['cat', 'bat', 'rat', 'elephant']


spam.remove('mouse')
spam

----------------------------------------------------------------------
-----
ValueError Traceback (most recent call
last)
Input In [244], in <cell line: 2>()
1 spam = ['cat', 'bat', 'rat', 'elephant']
----> 2 spam.remove('mouse')
3 spam

ValueError: list.remove(x): x not in list

spam = [2, 5, 3.14, 1, -7]


spam.sort()
spam

[-7, 1, 2, 3.14, 5]

spam = ['ants', 'cats', 'dogs', 'badgers', 'elephants']


spam.sort()
spam

['ants', 'badgers', 'cats', 'dogs', 'elephants']


spam = ['ants', 'cats', 'dogs', 'badgers', 'elephants']
spam.sort(reverse=True)
spam

['elephants', 'dogs', 'cats', 'badgers', 'ants']

spam = [1, 3, 2, 4, 'Alice', 'Bob']


spam.sort()

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [248], in <cell line: 2>()
1 spam = [1, 3, 2, 4, 'Alice', 'Bob']
----> 2 spam.sort()

TypeError: '<' not supported between instances of 'str' and 'int'

spam = ['Alice', 'ants', 'Bob', 'badgers', 'Carol', 'cats']


spam.sort()
spam

['Alice', 'Bob', 'Carol', 'ants', 'badgers', 'cats']

spam = ['a', 'z', 'A', 'Z']


spam.sort(key=str.lower)
spam

['a', 'A', 'z', 'Z']
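The sort() method changes the list in place and returns None. When the original order must be kept, the built-in sorted() function returns a new list instead. The sketch below also uses key to sort a list of mixed types, which is one possible way around the TypeError shown earlier; the variable names are assumptions.

```python
names = ['Alice', 'ants', 'Bob', 'badgers', 'Carol', 'cats']

# sorted() returns a new list and leaves the original unchanged
case_insensitive = sorted(names, key=str.lower)
longest_first = sorted(names, key=len, reverse=True)
print(case_insensitive)
print(longest_first)
print(names)

# Mixed types can be sorted if the key maps every item to a comparable value
mixed = [1, 3, 2, 4, 'Alice', 'Bob']
as_text = sorted(mixed, key=str)
print(as_text)

# list.sort() sorts in place and returns None
result = names.sort()
print(result, names)
```

Sorting by key=str compares the text form of each item, so numbers with several digits would be ordered as text ('10' before '2') rather than by value.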

Tuples
# Create a Python Tuple called animals
animals = ("bear", "dog", "giraffe", "goat", "leopard", "lion",
"penguin", "tiger")
animals

('bear', 'dog', 'giraffe', 'goat', 'leopard', 'lion', 'penguin',


'tiger')

type(animals)

tuple

# Accessing items in a tuple via their index position


animals[3]

'goat'
# Items can also be accessed by negative index
print(animals[-2])

penguin

print(animals[2:5])

('giraffe', 'goat', 'leopard')

animals[2:-2:2]

('giraffe', 'leopard')

# Tuples are immutable


animals[0]='kangaroo'

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [257], in <cell line: 2>()
1 # Tuples are immutable
----> 2 animals[0]='kangaroo'

TypeError: 'tuple' object does not support item assignment

# Looping through tuple


for x in animals:
    print(x)

bear
dog
giraffe
goat
leopard
lion
penguin
tiger

print(len(animals))

# Joining tuples using the + operator


letters = ("a","b","c")
numbers = (1,2,3)
letters_numbers = letters + numbers
print(letters_numbers)

('a', 'b', 'c', 1, 2, 3)

animals
('bear', 'dog', 'giraffe', 'goat', 'leopard', 'lion', 'penguin',
'tiger')

# Tuples are immutable, so none of the list-style removal calls below will work.
animals.pop(3)
# animals.remove("goat")
# del animals[3]

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [262], in <cell line: 2>()
1 # Tuples are immutable, so none of the list-style removal calls
below will work.
----> 2 animals.pop(3)

AttributeError: 'tuple' object has no attribute 'pop'

animals.remove("goat")

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [263], in <cell line: 1>()
----> 1 animals.remove("goat")

AttributeError: 'tuple' object has no attribute 'remove'

del animals[3]

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [264], in <cell line: 1>()
----> 1 del animals[3]

TypeError: 'tuple' object doesn't support item deletion

animals

('bear', 'dog', 'giraffe', 'goat', 'leopard', 'lion', 'penguin',


'tiger')

animals[:2]+animals[3:] # Build a new tuple without the item at index 2 ('giraffe')

('bear', 'dog', 'goat', 'leopard', 'lion', 'penguin', 'tiger')

animals[:4]+animals[6:] # Build a new tuple without 'leopard' and 'lion'


('bear', 'dog', 'giraffe', 'goat', 'penguin', 'tiger')

Animal_1 = list(animals)
Animal_1

['bear', 'dog', 'giraffe', 'goat', 'leopard', 'lion', 'penguin',


'tiger']

Animal_1.pop(3)

'goat'

Animal_1

['bear', 'dog', 'giraffe', 'leopard', 'lion', 'penguin', 'tiger']

Animal_2 = tuple(Animal_1)
Animal_2

('bear', 'dog', 'giraffe', 'leopard', 'lion', 'penguin', 'tiger')

tupSample = (1,2,3,4,3,'py') # tuple


print(tupSample)

(1, 2, 3, 4, 3, 'py')

tupleSample = 1,2,'sample' # a tuple can be created without parentheses; this is called tuple packing
print(tupleSample)

(1, 2, 'sample')
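
Tuple packing has a counterpart, tuple unpacking, which assigns the items of a tuple to separate names. The sketch below is an addition to these notes; the variable names are illustrative only.

```python
point = 1, 2, 'sample'        # packing: three values go into one tuple

first, second, label = point  # unpacking: one name per item
head, *rest = point           # a starred name collects the remaining items in a list

print(first, second, label)
print(head, rest)
```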

tupSample = (1,2,3,4,3,'py') # tuple


print(tupSample)

(1, 2, 3, 4, 3, 'py')

Indexing

tupSample.index('py') # find the position of the element 'py'

tupSample[2] # get the 3rd element of 'tupSample'

Slicing
tupSample = (1,2,3,4,3,'py') # tuple
tupSample

(1, 2, 3, 4, 3, 'py')

print(tupSample[:-1:2])

(1, 3, 3)

Addition

tupSample = (1,2,3,4,3,'py') # tuple


tupSample

(1, 2, 3, 4, 3, 'py')

tupSample+=('th','on')
print(tupSample)

(1, 2, 3, 4, 3, 'py', 'th', 'on')

tupSample+('th','on')
print(tupSample)

(1, 2, 3, 4, 3, 'py', 'th', 'on')

Multiplication

tupSample = (1,2,3,4,3,'py') # tuple


tupSample

(1, 2, 3, 4, 3, 'py')

tupSample*2

(1, 2, 3, 4, 3, 'py', 1, 2, 3, 4, 3, 'py')

tupSample[2:4]*2

(3, 4, 3, 4)

len(object)

tupSample = (1,2,3,4,3,'py') # tuple

len(tupSample)

reverse
tupSample = (1,2,3,4,3,'py') # tuple
print(tupSample)

(1, 2, 3, 4, 3, 'py')

tupSample.reverse()
print(tupSample) # AttributeError: 'tuple' object has no attribute 'reverse'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [288], in <cell line: 1>()
----> 1 tupSample.reverse()
2 print(tupSample)

AttributeError: 'tuple' object has no attribute 'reverse'
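
Although a tuple cannot be reversed in place, a reversed copy can be built. The sketch below is an addition to these notes and shows two common ways of doing so; the original tuple is left unchanged in both cases.

```python
tupSample = (1, 2, 3, 4, 3, 'py')

reversed_by_slice = tupSample[::-1]               # a slice with step -1 returns a new tuple
reversed_by_builtin = tuple(reversed(tupSample))  # reversed() gives an iterator; tuple() collects it

print(reversed_by_slice)
print(reversed_by_builtin)
print(tupSample)                                  # the original tuple is unchanged
```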

clear

tupSample.clear()
print(tupSample) # AttributeError: 'tuple' object has no attribute 'clear'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [289], in <cell line: 1>()
----> 1 tupSample.clear()
2 print(tupSample)

AttributeError: 'tuple' object has no attribute 'clear'

append() or add()

tupSample.append((1,2))
print(tupSample) # AttributeError: 'tuple' object has no attribute 'append'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [290], in <cell line: 1>()
----> 1 tupSample.append((1,2))
2 print(tupSample)

AttributeError: 'tuple' object has no attribute 'append'


tupSample.add((1,2))
print(tupSample) # AttributeError: 'tuple' object has no attribute 'add'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [291], in <cell line: 1>()
----> 1 tupSample.add((1,2))
2 print(tupSample)

AttributeError: 'tuple' object has no attribute 'add'

update()

tupSample = (1,2,3,4,3,'py') # tuple


print(tupSample)

(1, 2, 3, 4, 3, 'py')

tupSample.update((7,'8')) # AttributeError: 'tuple' object has no attribute 'update'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [293], in <cell line: 1>()
----> 1 tupSample.update((7,'8'))

AttributeError: 'tuple' object has no attribute 'update'

s1 = set(tupSample)
print(s1)
s2 = set((7,'8'))
print(s2)
s1.update(s2)
print(s1)

{1, 2, 3, 4, 'py'}
{'8', 7}
{1, 2, 3, 4, 'py', 7, '8'}

insert()

tupSample = (1,2,3,4,3,'py') # tuple


print(tupSample)

(1, 2, 3, 4, 3, 'py')

tupSample.insert(3,5)
print(tupSample) # AttributeError: 'tuple' object has no attribute 'insert'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [296], in <cell line: 1>()
----> 1 tupSample.insert(3,5)
2 print(tupSample)

AttributeError: 'tuple' object has no attribute 'insert'

pop()

tupSample = (1,2,3,4,3,'py') # tuple


print(tupSample)

(1, 2, 3, 4, 3, 'py')

tupSample.pop() # AttributeError: 'tuple' object has no attribute 'pop'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [298], in <cell line: 1>()
----> 1 tupSample.pop()

AttributeError: 'tuple' object has no attribute 'pop'

tupSample.pop(2) # AttributeError: 'tuple' object has no attribute 'pop'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [299], in <cell line: 1>()
----> 1 tupSample.pop(2)

AttributeError: 'tuple' object has no attribute 'pop'

remove()

tupSample = (1,2,3,4,3,'py') # tuple


print(tupSample)
(1, 2, 3, 4, 3, 'py')

tupSample.remove(2) # AttributeError: 'tuple' object has no attribute 'remove'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [301], in <cell line: 1>()
----> 1 tupSample.remove(2)

AttributeError: 'tuple' object has no attribute 'remove'

del

del tupSample # deleting the tuple, tupSample
tupSample # NameError: name 'tupSample' is not defined

----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [302], in <cell line: 2>()
1 del tupSample # deleting the tuple,
tupSample
----> 2 tupSample

NameError: name 'tupSample' is not defined

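del tupSample[2] # attempt to delete the 3rd item in the tuple
tupSample # on an existing tuple this raises TypeError: 'tuple' object doesn't support item deletion;
          # here tupSample was already deleted above, so a NameError is raised instead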

----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [303], in <cell line: 1>()
----> 1 del tupSample[2] # deleting the 3rd item
in the tuple
2 tupSample

NameError: name 'tupSample' is not defined

extend()
# Practical question
import calendar as c
Year=int(input("Enter the year: "))
Month=int(input("Enter the month: "))
print(c.month(Year,Month))

Enter the year:

----------------------------------------------------------------------
-----
ValueError Traceback (most recent call
last)
Input In [304], in <cell line: 3>()
1 # Practical question
2 import calendar as c
----> 3 Year=int(input("Enter the year: "))
4 Month=int(input("Enter the month: "))
5 print(c.month(Year,Month))

ValueError: invalid literal for int() with base 10: ''
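
The ValueError above occurs because an empty input cannot be converted with int(). The sketch below is an addition to these notes and shows one possible way to validate the text before building the calendar. The function name render_month and the wording of its messages are hypothetical. In a notebook, the two arguments would come from input().

```python
import calendar


def render_month(year_text, month_text):
    """Return the calendar for the given year and month, or an error message."""
    try:
        year = int(year_text)
        month = int(month_text)
    except ValueError:
        return "Please enter whole numbers for the year and the month."
    if not 1 <= month <= 12:
        return "The month must be between 1 and 12."
    return calendar.month(year, month)


print(render_month("", "5"))      # empty input is reported instead of raising an error
print(render_month("2024", "2"))  # prints the calendar for February 2024
```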

Dictionary
Dictionary - a collection of key-value pairs (a mapping, not a sequence) - {}

dictSample = {1:'first','second':3, 3:3, 'four':'4'}


# dictionary
dictSample

{1: 'first', 'second': 3, 3: 3, 'four': '4'}

dictSample[2] # KeyError: 2 - lookup is by key, not by position, and 2 is not a key in this dictionary

----------------------------------------------------------------------
-----
KeyError Traceback (most recent call
last)
Input In [306], in <cell line: 1>()
----> 1 dictSample[2]

KeyError: 2

dictSample[1]

'first'

dictSample['second'] # find the value corresponding to the key 'second'

3

dictSample.index(1) # AttributeError: 'dict' object has no attribute 'index'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [309], in <cell line: 1>()
----> 1 dictSample.index(1)

AttributeError: 'dict' object has no attribute 'index'

for x in dictSample:
    print(x)

1
second
3
four

for x in dictSample.keys():
    print(x)

1
second
3
four

for x in dictSample.values():
    print(x)

first
3
3
4

for x in dictSample.items():
    print(x)

(1, 'first')
('second', 3)
(3, 3)
('four', '4')
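
Each item produced by items() is a (key, value) tuple, so it can be unpacked directly in the for statement. The sketch below is an addition to these notes; the list named pairs exists only so that the result can be inspected afterwards.

```python
dictSample = {1: 'first', 'second': 3, 3: 3, 'four': '4'}

pairs = []
for key, value in dictSample.items():   # unpack each (key, value) tuple into two names
    pairs.append(f"{key} -> {value}")
    print(key, '->', value)
```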

dictSample = {1:'first','second':3, 3:3, 'four':'4'}


# dictionary
# dictSample = {1:'first',2:'second', 3:3, 'four':'4'}
dictSample

{1: 'first', 'second': 3, 3: 3, 'four': '4'}


dictSample[1]

'first'

dictSample[1::2]

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [316], in <cell line: 1>()
----> 1 dictSample[1::2]

TypeError: unhashable type: 'slice'

dictSample[1:'second'] # TypeError: unhashable type: 'slice'

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [317], in <cell line: 1>()
----> 1 dictSample[1:'second']

TypeError: unhashable type: 'slice'

dictSample[1:] # TypeError: unhashable type: 'slice'

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [318], in <cell line: 1>()
----> 1 dictSample[1:]

TypeError: unhashable type: 'slice'

dictSample['four']

'4'

dictSample+{5:3} # TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [320], in <cell line: 1>()
----> 1 dictSample+{5:3}
TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
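
Dictionaries cannot be joined with +, but they can be merged in other ways. The sketch below is an addition to these notes and shows two options that leave the original dictionary untouched; the names extra, merged and combined are illustrative.

```python
dictSample = {1: 'first', 'second': 3, 3: 3, 'four': '4'}
extra = {5: 3}

merged = {**dictSample, **extra}  # unpack both dictionaries into a new one

combined = dictSample.copy()      # or copy first, then update the copy in place
combined.update(extra)

print(merged)
print(combined)
```

If both dictionaries share a key, the value from the right-hand dictionary wins in both approaches.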

dictSample,{5:3},{3:4}

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, {5: 3}, {3: 4})

dictSample,2

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, 2)

dictSample,(1)

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, 1)

dictSample,[2]

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, [2])

dictSample,range(2,7,2)

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, range(2, 7, 2))

dictSample,{1,2}

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, {1, 2})

dictSample*2 # TypeError: unsupported operand type(s) for *: 'dict' and 'int'

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [327], in <cell line: 1>()
----> 1 dictSample*2

TypeError: unsupported operand type(s) for *: 'dict' and 'int'

reverse

dictSample.reverse() # AttributeError: 'dict' object has no attribute 'reverse'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [328], in <cell line: 1>()
----> 1 dictSample.reverse()

AttributeError: 'dict' object has no attribute 'reverse'


clear

dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4}

dictSample.clear()
print(dictSample)

{}

append/add

dictSample.append({5:5}) # AttributeError: 'dict' object has no attribute 'append'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [331], in <cell line: 1>()
----> 1 dictSample.append({5:5})

AttributeError: 'dict' object has no attribute 'append'

dictSample.add({5:5}) # AttributeError: 'dict' object has no attribute 'add'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [332], in <cell line: 1>()
----> 1 dictSample.add({5:5})

AttributeError: 'dict' object has no attribute 'add'

update

dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4}

dictSample.update({7:"John"})
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4, 7: 'John'}


Dictionary Methods

dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4}

dictSample['second']=9
dictSample

{1: 'first', 'second': 9, 3: 3, 'four': 4}

dictSample["five"] = 5
print(dictSample)

{1: 'first', 'second': 9, 3: 3, 'four': 4, 'five': 5}

dictSample.update(five=5) # update() adds the given key/value pairs to the dictionary,
print(dictSample) # overwriting the values of any keys that already exist

{1: 'first', 'second': 9, 3: 3, 'four': 4, 'five': 5}

dictSample.update(five=6) # the key 'five' already exists,
print(dictSample) # so its value is overwritten

{1: 'first', 'second': 9, 3: 3, 'four': 4, 'five': 6}

dictSample.update(second='hi') # the key 'second' already exists,
print(dictSample) # so its value is overwritten

{1: 'first', 'second': 'hi', 3: 3, 'four': 4, 'five': 6}
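
The update() calls above pass keys as keyword arguments, which only works when the key is a valid identifier such as five or second. For other keys, for example the integer 1, update() also accepts a dictionary or a sequence of (key, value) pairs. The sketch below is an addition to these notes.

```python
dictSample = {1: 'first', 'second': 'hi', 3: 3, 'four': 4, 'five': 6}

dictSample.update({1: 5})          # pass a dictionary when the key is not an identifier
dictSample.update([(3, 'three')])  # a list of (key, value) pairs is accepted as well

print(dictSample)
```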

dictSample.update(1=5)
print(dictSample) # SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

Input In [341]
dictSample.update(1=5)
^
SyntaxError: expression cannot contain assignment, perhaps you meant
"=="?

list(dictSample) # returns a list of all the keys used in the dictionary dictSample

[1, 'second', 3, 'four', 'five']

set(dictSample) # returns a set of all the keys used in the dictionary dictSample

{1, 3, 'five', 'four', 'second'}

str(dictSample) # returns the string representation of the whole dictionary (keys and values)

"{1: 'first', 'second': 'hi', 3: 3, 'four': 4, 'five': 6}"

tuple(dictSample) # returns a tuple of all the keys used in the dictionary dictSample

(1, 'second', 3, 'four', 'five')

len

dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4}

len(dictSample) # returns the number of items in the dictionary

dictSample.get("five") # get() returns the value for a key, or None when the key is missing (as 'five' is here)

dictSample.get(3)

dictSample.keys() # returns a view of the keys in the dictionary

dict_keys([1, 'second', 3, 'four'])

dictSample.values() # returns a view of the values in the dictionary

dict_values(['first', 2, 3, 4])

dictSample.items() # returns a view of the (key, value) tuple pairs

dict_items([(1, 'first'), ('second', 2), (3, 3), ('four', 4)])

insert
dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4}

dictSample.insert(3,5)
print(dictSample) # AttributeError: 'dict' object has no attribute 'insert'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [354], in <cell line: 1>()
----> 1 dictSample.insert(3,5)
2 print(dictSample)

AttributeError: 'dict' object has no attribute 'insert'

pop

dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4}

dictSample.pop() # TypeError: pop expected at least 1 argument, got 0

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [356], in <cell line: 1>()
----> 1 dictSample.pop()

TypeError: pop expected at least 1 argument, got 0

dictSample.pop('second')

print(dictSample)

{1: 'first', 3: 3, 'four': 4}

dictSample.pop(3)

3
print(dictSample)

{1: 'first', 'four': 4}

remove

dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4}

dictSample.remove(1) # AttributeError: 'dict' object has no attribute 'remove'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [362], in <cell line: 1>()
----> 1 dictSample.remove(1)

AttributeError: 'dict' object has no attribute 'remove'

del

dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4}

del dictSample # deleting the dictionary, dictSample
dictSample # NameError: name 'dictSample' is not defined

----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [364], in <cell line: 2>()
1 del dictSample # deleting the dictionary,
dictSample
----> 2 dictSample

NameError: name 'dictSample' is not defined

dictSample = {1:'first','second':2, 3:3, 'four':4} # dictionary
print(dictSample)
{1: 'first', 'second': 2, 3: 3, 'four': 4}

del dictSample[3] # deleting the item with key 3
dictSample

{1: 'first', 'second': 2, 'four': 4}

Try for the exercise

# Create a dictionary called countries_cities


countries_cities = {
"United Kingdom" : "London",
"United States" : "New York",
"Belgium" : "Brussels",
"Croatia" : "Zagreb",
"Egypt" : "Cairo",
"Italy" : "Rome",

}
# Keys : Values

countries_cities

{'United Kingdom': 'London',


'United States': 'New York',
'Belgium': 'Brussels',
'Croatia': 'Zagreb',
'Egypt': 'Cairo',
'Italy': 'Rome'}

# Return the value of specified key


countries_cities["Italy"]

'Rome'

countries_cities["Rome"]

----------------------------------------------------------------------
-----
KeyError Traceback (most recent call
last)
Input In [370], in <cell line: 1>()
----> 1 countries_cities["Rome"]

KeyError: 'Rome'
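
The KeyError above arises because "Rome" is a value, not a key. The sketch below is an addition to these notes. It shows how to test for a key before indexing, how get() can supply a default, and how a reverse mapping (here given the hypothetical name cities_countries) allows lookup by city. The reverse mapping assumes that each city appears only once.

```python
countries_cities = {
    "United Kingdom": "London",
    "United States": "New York",
    "Belgium": "Brussels",
    "Croatia": "Zagreb",
    "Egypt": "Cairo",
    "Italy": "Rome",
}

# "Rome" is a value, so check for the key first, or use get() with a default.
has_rome_key = "Rome" in countries_cities
rome_lookup = countries_cities.get("Rome", "not a country in this dictionary")

# Swap keys and values to look up a country by its city.
cities_countries = {city: country for country, city in countries_cities.items()}

print(has_rome_key)
print(rome_lookup)
print(cities_countries["Rome"])
```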

# Returns the value of specified key


countries_cities.get("Egypt")

'Cairo'
# Change value in a dictionary
countries_cities["United States"]="Atlanta"

countries_cities

{'United Kingdom': 'London',


'United States': 'Atlanta',
'Belgium': 'Brussels',
'Croatia': 'Zagreb',
'Egypt': 'Cairo',
'Italy': 'Rome'}

# Add item to dictionary


countries_cities["Japan"]="Tokyo"

countries_cities

{'United Kingdom': 'London',


'United States': 'Atlanta',
'Belgium': 'Brussels',
'Croatia': 'Zagreb',
'Egypt': 'Cairo',
'Italy': 'Rome',
'Japan': 'Tokyo'}

print(countries_cities)

{'United Kingdom': 'London', 'United States': 'Atlanta', 'Belgium':


'Brussels', 'Croatia': 'Zagreb', 'Egypt': 'Cairo', 'Italy': 'Rome',
'Japan': 'Tokyo'}

# Remove items from dictionary: There are several ways to do this


countries_cities.pop("Japan") # remove items with specified key name
print(countries_cities)

{'United Kingdom': 'London', 'United States': 'Atlanta', 'Belgium':


'Brussels', 'Croatia': 'Zagreb', 'Egypt': 'Cairo', 'Italy': 'Rome'}

del countries_cities["Egypt"]

countries_cities

{'United Kingdom': 'London',


'United States': 'Atlanta',
'Belgium': 'Brussels',
'Croatia': 'Zagreb',
'Italy': 'Rome'}
# Remove the last inserted item
countries_cities.popitem()

('Italy', 'Rome')

countries_cities

{'United Kingdom': 'London',


'United States': 'Atlanta',
'Belgium': 'Brussels',
'Croatia': 'Zagreb'}

countries_cities = {
"United Kingdom" : "London",
"United States" : "New York",
"Belgium" : "Brussels",
"Croatia" : "Zagreb",
"Egypt" : "Cairo",
"Italy" : "Rome",
}

countries_cities

{'United Kingdom': 'London',


'United States': 'New York',
'Belgium': 'Brussels',
'Croatia': 'Zagreb',
'Egypt': 'Cairo',
'Italy': 'Rome'}

# Looping through dictionaries keys


for x in countries_cities: # print just keys
    print(x)

United Kingdom
United States
Belgium
Croatia
Egypt
Italy

for x in countries_cities.values(): # print just values
    print(x)

London
New York
Brussels
Zagreb
Cairo
Rome

for x in countries_cities.items(): # print keys and values
    print(x)

('United Kingdom', 'London')


('United States', 'New York')
('Belgium', 'Brussels')
('Croatia', 'Zagreb')
('Egypt', 'Cairo')
('Italy', 'Rome')

# Print length of dictionary


print(len(countries_cities))

# Nested dictionaries: a dictionary can contain other dictionaries as values; these are called nested dictionaries.
family={
"firstchild" : {
"name" : "Jack",
"year" : 2007
},
"secondchild" : {
"name" : "Lucy",
"year" : 2010
},
"thirdchild" : {
"name" : "Angela",
"year" : 2014
}
}

family

{'firstchild': {'name': 'Jack', 'year': 2007},


'secondchild': {'name': 'Lucy', 'year': 2010},
'thirdchild': {'name': 'Angela', 'year': 2014}}

print(family)

{'firstchild': {'name': 'Jack', 'year': 2007}, 'secondchild': {'name':


'Lucy', 'year': 2010}, 'thirdchild': {'name': 'Angela', 'year': 2014}}

family["secondchild"]

{'name': 'Lucy', 'year': 2010}
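
Values inside a nested dictionary are reached by chaining keys, one pair of square brackets per level. The sketch below is an addition to these notes and reuses the family dictionary from above; the names lucy_year and names are illustrative.

```python
family = {
    "firstchild": {"name": "Jack", "year": 2007},
    "secondchild": {"name": "Lucy", "year": 2010},
    "thirdchild": {"name": "Angela", "year": 2014},
}

lucy_year = family["secondchild"]["year"]             # outer key first, then inner key
names = [child["name"] for child in family.values()]  # collect one field from every inner dictionary

print(lucy_year)
print(names)
```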

# Clear or empty a dictionary


family.clear()

family
{}

This is for the assignment

Question: Create a tic-tac-toe board with its corresponding keys using a dictionary

theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
'mid-L': ' ', 'mid-M': ' ', 'mid-R': ' ',
'low-L': ' ', 'low-M': ' ', 'low-R': ' '}
theBoard

{'top-L': ' ',


'top-M': ' ',
'top-R': ' ',
'mid-L': ' ',
'mid-M': ' ',
'mid-R': ' ',
'low-L': ' ',
'low-M': ' ',
'low-R': ' '}

theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
'mid-L': ' ', 'mid-M': 'X', 'mid-R': ' ',
'low-L': ' ', 'low-M': ' ', 'low-R': ' '}
theBoard

{'top-L': ' ',


'top-M': ' ',
'top-R': ' ',
'mid-L': ' ',
'mid-M': 'X',
'mid-R': ' ',
'low-L': ' ',
'low-M': ' ',
'low-R': ' '}

theBoard = {'top-L': 'O', 'top-M': 'O', 'top-R': 'O',


'mid-L': 'X', 'mid-M': 'X', 'mid-R': ' ',
'low-L': ' ', 'low-M': ' ', 'low-R': 'X'}
theBoard

{'top-L': 'O',
'top-M': 'O',
'top-R': 'O',
'mid-L': 'X',
'mid-M': 'X',
'mid-R': ' ',
'low-L': ' ',
'low-M': ' ',
'low-R': 'X'}
theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
'mid-L': ' ', 'mid-M': ' ', 'mid-R': ' ',
'low-L': ' ', 'low-M': ' ', 'low-R': ' '}
def printBoard(board):
    print(board['top-L'] + '|' + board['top-M'] + '|' + board['top-R'])
    print('-+-+-')
    print(board['mid-L'] + '|' + board['mid-M'] + '|' + board['mid-R'])
    print('-+-+-')
    print(board['low-L'] + '|' + board['low-M'] + '|' + board['low-R'])
printBoard(theBoard)

| |
-+-+-
| |
-+-+-
| |

theBoard = {'top-L': 'O', 'top-M': 'O', 'top-R': 'O',
            'mid-L': 'X', 'mid-M': 'X', 'mid-R': ' ',
            'low-L': ' ', 'low-M': ' ', 'low-R': 'X'}
def printBoard(board):
    print(board['top-L'] + '|' + board['top-M'] + '|' + board['top-R'])
    print('-+-+-')
    print(board['mid-L'] + '|' + board['mid-M'] + '|' + board['mid-R'])
    print('-+-+-')
    print(board['low-L'] + '|' + board['low-M'] + '|' + board['low-R'])
printBoard(theBoard)

O|O|O
-+-+-
X|X|
-+-+-
| |X
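
The board keys follow a regular row-column pattern, so the three print statements per row can also be generated with a loop. The sketch below is an addition to these notes and is only one possible design; the function name format_board is hypothetical, and it returns the board as a string instead of printing it directly.

```python
def format_board(board):
    """Build the printable board from a dictionary keyed by 'row-column'."""
    rows = []
    for row in ('top', 'mid', 'low'):
        # Join the left, middle and right cells of this row with '|'.
        rows.append('|'.join(board[row + '-' + col] for col in 'LMR'))
    # Separate the three rows with the horizontal divider line.
    return '\n-+-+-\n'.join(rows)


theBoard = {'top-L': 'O', 'top-M': 'O', 'top-R': 'O',
            'mid-L': 'X', 'mid-M': 'X', 'mid-R': ' ',
            'low-L': ' ', 'low-M': ' ', 'low-R': 'X'}
board_text = format_board(theBoard)
print(board_text)
```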

Set
Sets - an unordered collection of unique elements

setSample = {'example',24,87.5,'data',24, 'data'} # set


setSample

{24, 87.5, 'data', 'example'}


set('example')

{'a', 'e', 'l', 'm', 'p', 'x'}

setSample[1]

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [402], in <cell line: 1>()
----> 1 setSample[1]

TypeError: 'set' object is not subscriptable

setSample.index("example")
# AttributeError: 'set' object has no attribute 'index' (sets are unordered, so items have no positions)

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [403], in <cell line: 1>()
----> 1 setSample.index("example")

AttributeError: 'set' object has no attribute 'index'

setSample[1:2] # TypeError: 'set' object is not subscriptable

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [404], in <cell line: 1>()
----> 1 setSample[1:2]

TypeError: 'set' object is not subscriptable

setSample=setSample,24 # the comma builds a tuple (setSample, 24), so setSample now refers to a tuple, not a set
print(setSample)

({24, 'data', 'example', 87.5}, 24)

setSample=setSample,{1:2} # the comma again builds a tuple, this time pairing the previous value with a dict
print(setSample)

(({24, 'data', 'example', 87.5}, 24), {1: 2})


setSample=setSample,[1] # the comma again builds a tuple, this time pairing the previous value with a list
print(setSample)

((({24, 'data', 'example', 87.5}, 24), {1: 2}), [1])

setSample,{33}

(((({24, 87.5, 'data', 'example'}, 24), {1: 2}), [1]), {33})

setSample,range(1,10,2)

(((({24, 87.5, 'data', 'example'}, 24), {1: 2}), [1]), range(1, 10,
2))

dictSample = {1:'first','second':3, 3:3, 'four':'4'}


dictSample

{1: 'first', 'second': 3, 3: 3, 'four': '4'}

dictSample+{5:3} # TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [411], in <cell line: 1>()
----> 1 dictSample+{5:3}

TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

dictSample,{5:3}

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, {5: 3})

dictSample,2

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, 2)

dictSample,(1)

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, 1)

dictSample,[2]

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, [2])

dictSample,range(2,7,2)

({1: 'first', 'second': 3, 3: 3, 'four': '4'}, range(2, 7, 2))

Multiplication
setSample = {'example',24,87.5,'data',24,'data'}
# sets
setSample

{24, 87.5, 'data', 'example'}

setSample*2 # TypeError: unsupported operand type(s) for *: 'set' and 'int'

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [418], in <cell line: 1>()
----> 1 setSample*2

TypeError: unsupported operand type(s) for *: 'set' and 'int'

len

setSample = {'example',24,87.5,'data',24,'data'} # set

len(setSample)

reverse

setSample.reverse() # AttributeError: 'set' object has no attribute 'reverse'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [421], in <cell line: 1>()
----> 1 setSample.reverse()

AttributeError: 'set' object has no attribute 'reverse'

clear

setSample = {'example',24,87.5,'data',24,'data'} # set

setSample.clear()
print(setSample)

set()

append() or add()

setSample.append(20)
print(setSample) # AttributeError: 'set' object has no attribute 'append'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [424], in <cell line: 1>()
----> 1 setSample.append(20)
2 print(setSample)

AttributeError: 'set' object has no attribute 'append'

setSample = {'example',24,87.5,'data',24,'data'} # set


setSample

{24, 87.5, 'data', 'example'}

setSample.add(20)
print(setSample)

{20, 'example', 87.5, 24, 'data'}

update

strSample = 'learning is fun !'

ss = set(strSample)
print(ss)
l = set('John')
print(l)
ss.update(l)
print(ss)

{'!', 'e', 'r', 'u', 'n', 's', ' ', 'g', 'f', 'i', 'l', 'a'}
{'n', 'o', 'J', 'h'}
{'!', 'e', 'r', 'J', 'h', 'u', 'n', 's', ' ', 'g', 'f', 'i', 'l', 'o',
'a'}

ss = set(strSample)
print(ss)
l = set('john')
print(l)
ss.update(l)
print(ss)

{'!', 'e', 'r', 'u', 'n', 's', ' ', 'g', 'f', 'i', 'l', 'a'}
{'n', 'j', 'o', 'h'}
{'!', 'e', 'r', 'h', 'u', 'n', 's', ' ', 'j', 'g', 'f', 'i', 'l', 'o',
'a'}
lstSample = [1,2,'a','sam',2]

# So we need to convert list into set and then pass update function
l=[4,'3']
s = set(l)
l1 = set(lstSample)
l1.update(s)
print(l1)

{1, 2, 4, '3', 'sam', 'a'}

tupSample = (1,2,3,4,3,'py')

s1 = set(tupSample)
print(s1)
s2 = set((7,'8'))
print(s2)
s1.update(s2)
print(s1)

{1, 2, 3, 4, 'py'}
{'8', 7}
{1, 2, 3, 4, 'py', 7, '8'}

dictSample = {1:'first','second':2, 3:3, 'four':4}

dictSample.update({7:"John"})
print(dictSample)

{1: 'first', 'second': 2, 3: 3, 'four': 4, 7: 'John'}

setSample = {'example',24,87.5,'data',24,'data'}

setSample.update({7,"John"})
print(setSample)

{7, 'example', 87.5, 24, 'John', 'data'}

insert

setSample = {'example',24,87.5,'data',24,'data'} # set


print(setSample)

{24, 'data', 'example', 87.5}

setSample.insert(3,5)
print(setSample) # AttributeError: 'set' object has no attribute 'insert'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [439], in <cell line: 1>()
----> 1 setSample.insert(3,5)
2 print(setSample)

AttributeError: 'set' object has no attribute 'insert'

pop

setSample = {'example',24,87.5,'data',24,'data'} # set


print(setSample)

{24, 'data', 'example', 87.5}

setSample.pop()

24

print(setSample)

{'data', 'example', 87.5}

setSample.pop(2) # a set is unordered, so pop() takes no index and removes an arbitrary element
print(setSample) # TypeError: set.pop() takes no arguments (1 given)

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [443], in <cell line: 1>()
----> 1 setSample.pop(2) # Set is an unordered
sequence and hence pop is not usually used
2 print(setSample)

TypeError: set.pop() takes no arguments (1 given)

remove

setSample = {'example',24,87.5,'data',24,'data'} # set


print(setSample)

{24, 'data', 'example', 87.5}

setSample.remove('example')

setSample

{24, 87.5, 'data'}
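
remove() raises a KeyError when the element is not in the set, while discard() silently does nothing in that case. The sketch below is an addition to these notes and contrasts the two methods; the flag remove_failed exists only to record what happened.

```python
setSample = {'example', 24, 87.5, 'data'}

setSample.discard('missing')     # no error, the set is left unchanged

try:
    setSample.remove('missing')  # remove() insists that the element exists
except KeyError:
    remove_failed = True
else:
    remove_failed = False

setSample.discard('example')     # discard() removes an element that is present
print(remove_failed, setSample)
```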


del

setSample = {'example',24,87.5,'data',24,'data'} # set


print(setSample)

{24, 'data', 'example', 87.5}

del setSample # deleting the set, setSample
setSample # NameError: name 'setSample' is not defined

----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [448], in <cell line: 2>()
1 del setSample # deleting the set,
setSample
----> 2 setSample

NameError: name 'setSample' is not defined

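del setSample[2] # attempt to delete an item by index
setSample # on an existing set this raises TypeError: 'set' object doesn't support item deletion;
          # here setSample was already deleted above, so a NameError is raised instead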

----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [449], in <cell line: 1>()
----> 1 del setSample[2] # deleting the set,
setSample
2 setSample

NameError: name 'setSample' is not defined

Set operations

A = {"example",24,87.5,'data',24,'data'} # set of mixed data types
print(A)

{24, 'data', 'example', 87.5}

B = {24, 87.5} # set of numbers (an int and a float)
print(B)

{24, 87.5}
print(A|B) # the union of A and B is the set of all elements from both sets
A.union(B) # the same result using the union() method

{'example', 87.5, 24, 'data'}

{24, 87.5, 'data', 'example'}

print(A&B) # the intersection of A and B is the set of elements common to both sets
A.intersection(B) # the same result using the intersection() method

{24, 87.5}

{24, 87.5}
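
Besides union and intersection, sets support difference, symmetric difference and subset tests. The sketch below is an addition to these notes and applies them to the same sets A and B; the result names are illustrative.

```python
A = {"example", 24, 87.5, 'data'}
B = {24, 87.5}

only_in_a = A - B          # difference: elements of A that are not in B
in_exactly_one = A ^ B     # symmetric difference: elements in exactly one of the two sets
b_is_subset = B <= A       # subset test: True when every element of B is also in A

print(only_in_a)
print(in_exactly_one)
print(b_is_subset)
```

The method forms A.difference(B), A.symmetric_difference(B) and B.issubset(A) give the same results.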

Array, Range Function and Matrices


from array import * # importing array module

arrSample = array('i',[1,2,3,4]) # array


print(arrSample)

for x in arrSample:
    print(x) # printing values of array

array('i', [1, 2, 3, 4])


1
2
3
4

rangeSample = range(1,12,4) # built-in sequence type used for looping
print(rangeSample)

for x in rangeSample:
    print(x) # print the values of 'rangeSample'

range(1, 12, 4)
1
5
9

Indexing

arrSample
array('i', [1, 2, 3, 4])

arrSample.index(4)

arrSample[-3] # get the 3rd-last element of 'arrSample'

rangeSample

range(1, 12, 4)

rangeSample.index(0) # ValueError: 0 is not in range

----------------------------------------------------------------------
-----
ValueError Traceback (most recent call
last)
Input In [460], in <cell line: 1>()
----> 1 rangeSample.index(0)

ValueError: 0 is not in range

rangeSample.index(9) # find the index of the element 9

rangeSample[1] # given an index, returns the element at that position

rangeSample[9] # IndexError: range object


index out of range

----------------------------------------------------------------------
-----
IndexError Traceback (most recent call
last)
Input In [463], in <cell line: 1>()
----> 1 rangeSample[9]

IndexError: range object index out of range

Slicing

arrSample[1:]
array('i', [2, 3, 4])

arrSample[1:-1]

array('i', [2, 3])

arrSample[1:-1:2]

array('i', [2])

for x in rangeSample[:-1]:
print(x)

1
5

for x in rangeSample[1:-1]:
print(x)

Concatenation

arrSample+[50,60]   # TypeError: can only append array (not "list") to array

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [469], in <cell line: 1>()
----> 1 arrSample+[50,60]

TypeError: can only append array (not "list") to array

arrSample+array('i',[50,60])

array('i', [1, 2, 3, 4, 50, 60])

range(2,7,2)+range(2,7,2)

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [471], in <cell line: 1>()
----> 1 range(2,7,2)+range(2,7,2)

TypeError: unsupported operand type(s) for +: 'range' and 'range'

Multiplication
from array import *
arrSample = array('i',[1,2,3,4]) # array with integer type
print(arrSample)

array('i', [1, 2, 3, 4])

arrSample*2

array('i', [1, 2, 3, 4, 1, 2, 3, 4])

rangeSample = range(1,12,4)
rangeSample

range(1, 12, 4)

rangeSample*2   # TypeError: unsupported operand type(s) for *: 'range' and 'int'

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [475], in <cell line: 1>()
----> 1 rangeSample*2

TypeError: unsupported operand type(s) for *: 'range' and 'int'

len

from array import *


arrSample = array('i',[1,2,3,4]) # array with integer type
print(arrSample)

array('i', [1, 2, 3, 4])

len(arrSample)

rangeSample = range(1,12,4)
rangeSample

range(1, 12, 4)

len(rangeSample)

clear

arrSample.clear()   # AttributeError: 'array.array' object has no attribute 'clear'
----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [480], in <cell line: 1>()
----> 1 arrSample.clear()

AttributeError: 'array.array' object has no attribute 'clear'

rangeSample.clear()
print(rangeSample) # AttributeError: 'range'
object has no attribute 'clear'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [481], in <cell line: 1>()
----> 1 rangeSample.clear()
2 print(rangeSample)

AttributeError: 'range' object has no attribute 'clear'

append() or add()

arrSample.add(3)
print(arrSample) # AttributeError: 'array.array'
object has no attribute 'add'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [482], in <cell line: 1>()
----> 1 arrSample.add(3)
2 print(arrSample)

AttributeError: 'array.array' object has no attribute 'add'

arrSample.append(3)   # append() works: arrays support append(), not add()
print(arrSample)

array('i', [1, 2, 3, 4, 3])

rangeSample.append(22) # AttributeError:
'range' object has no attribute 'append'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [484], in <cell line: 1>()
----> 1 rangeSample.append(22)

AttributeError: 'range' object has no attribute 'append'

rangeSample.add(22) # AttributeError: 'range'


object has no attribute 'add'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [485], in <cell line: 1>()
----> 1 rangeSample.add(22)

AttributeError: 'range' object has no attribute 'add'

Can we change a list into a tuple, a tuple into a string, a list into a dictionary, a list into a set, a tuple
into a dictionary, a tuple into a set, a list into a string, and vice versa?

l1=[1,2,3,]
l1
# type(l1)

[1, 2, 3]

t1=(1,2,3,)
t1
# type(t1)

(1, 2, 3)
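The conversion question above can be answered with the built-in constructors. The following is a minimal sketch; the names as_tuple, as_list, as_set, as_string, pairs, as_dict and back_to_list are made up for this illustration.

```python
l1 = [1, 2, 3]
t1 = (1, 2, 3)

as_tuple = tuple(l1)                       # list -> tuple
as_list = list(t1)                         # tuple -> list
as_set = set(l1)                           # list -> set (duplicates would be dropped)
as_string = ",".join(str(x) for x in l1)   # list -> string; items are converted to str first

pairs = [("a", 1), ("b", 2)]               # hypothetical key/value pairs
as_dict = dict(pairs)                      # list of 2-item tuples -> dictionary

back_to_list = list("abc")                 # string -> list of characters

print(as_tuple, as_list, as_set, as_string, as_dict, back_to_list)
```

A plain list such as [1, 2, 3] cannot be turned into a dictionary directly, because dict() needs key/value pairs.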

Can dictionary keys and values be strings, tuples, lists, sets, or dictionaries?

dict1={'Hi':[1,2,3],(1,2):(1,2),5:{'End'},2:{1:2,3:3}}
dict1

{'Hi': [1, 2, 3], (1, 2): (1, 2), 5: {'End'}, 2: {1: 2, 3: 3}}
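In dict1 every kind of object appears as a value, but only strings, tuples and numbers appear as keys. That is because keys must be hashable, which rules out mutable types such as lists, sets and dictionaries. A short sketch (the names d and error_message are only for this example):

```python
d = {}
d["Hi"] = [1, 2, 3]     # string key, list value
d[(1, 2)] = {"End"}     # tuple key, set value

error_message = ""
try:
    d[[1, 2]] = "list as key"   # a list is mutable, so it cannot be a key
except TypeError as err:
    error_message = str(err)

print(error_message)
```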

update

arrSample = array('i',[1,2,3,4])

arrSample.update(7) # AttributeError: 'array.array'


object has no attribute 'update'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [490], in <cell line: 1>()
----> 1 arrSample.update(7)

AttributeError: 'array.array' object has no attribute 'update'

s1 = set((7,6))
print(s1)
l2 = set(arrSample)
print(l2)
l2.update(s1)   # update() works here because l2 is a set, not an array
print(l2)

{6, 7}
{1, 2, 3, 4}
{1, 2, 3, 4, 6, 7}

rangeSample.update(range(1,10,2))   # AttributeError: 'range' object has no attribute 'update'
print(rangeSample)

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [492], in <cell line: 1>()
----> 1 rangeSample.update(range(1,10,2))
2 print(rangeSample)

AttributeError: 'range' object has no attribute 'update'

s1 = set(range(1,12,4))
print(s1)
s2 = set(range(1,20,3))
print(s2)
s1.update(s2)
print(s1)

{1, 5, 9}
{1, 4, 7, 10, 13, 16, 19}
{1, 4, 5, 7, 9, 10, 13, 16, 19}

insert

arrSample = array('i',[1,2,3,4]) # array


print(arrSample)

array('i', [1, 2, 3, 4])


arrSample.insert(1,100)   # inserting the element 100 at index 1 (the 2nd position)
print(arrSample)

array('i', [1, 100, 2, 3, 4])

arrSample.insert(4,20)   # inserting the element 20 at index 4 (the 5th position)
print(arrSample)

array('i', [1, 100, 2, 3, 20, 4])

rangeSample = range(1,12,4) # range function


print(rangeSample)

range(1, 12, 4)

rangeSample.insert(3,5)   # AttributeError: 'range' object has no attribute 'insert'
print(rangeSample)

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [498], in <cell line: 1>()
----> 1 rangeSample.insert(3,5)
2 print(rangeSample)

AttributeError: 'range' object has no attribute 'insert'

pop

arrSample = array('i',[1,2,3,4]) # array


print(arrSample)

array('i', [1, 2, 3, 4])

arrSample.pop()   # removes the last element and returns it

arrSample

array('i', [1, 2, 3])

arrSample = array('i',[1,2,3,4]) # array


print(arrSample)

array('i', [1, 2, 3, 4])


arrSample.pop(2)   # removes the element at index 2 (the 3rd element) and returns it

print(arrSample)

array('i', [1, 2, 4])

rangeSample = range(1,12,4) # range function


print(rangeSample)

range(1, 12, 4)

rangeSample.pop() # AttributeError: 'range' object has


no attribute 'pop'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [506], in <cell line: 1>()
----> 1 rangeSample.pop()

AttributeError: 'range' object has no attribute 'pop'

rangeSample.pop(2) # AttributeError: 'range' object


has no attribute 'pop'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [507], in <cell line: 1>()
----> 1 rangeSample.pop(2)

AttributeError: 'range' object has no attribute 'pop'

remove

arrSample = array('i',[1,2,3,4]) # array


print(arrSample)

array('i', [1, 2, 3, 4])

arrSample.remove(2)   # removes the first occurrence of the value 2 from the array

arrSample

array('i', [1, 3, 4])


rangeSample = range(1,12,4) # range function
print(rangeSample)

range(1, 12, 4)

rangeSample.remove(4) # AttributeError: 'range'


object has no attribute 'remove'

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [512], in <cell line: 1>()
----> 1 rangeSample.remove(4)

AttributeError: 'range' object has no attribute 'remove'

del

arrSample = array('i',[1,2,3,4])
arrSample

array('i', [1, 2, 3, 4])

del arrSample # deleting the array, arrSample


arrSample # NameError: name 'arrSample' is not
defined

----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [514], in <cell line: 2>()
1 del arrSample # deleting the array,
arrSample
----> 2 arrSample

NameError: name 'arrSample' is not defined

arrSample = array('i',[1,2,3,4])
arrSample

array('i', [1, 2, 3, 4])

del arrSample[2] # deleting the 3rd item in the


array
arrSample

array('i', [1, 2, 4])

rangeSample = range(1,12,4)
rangeSample
range(1, 12, 4)

del rangeSample # deleting the range, rangeSample


rangeSample # NameError: name 'rangeSample' is
not defined

----------------------------------------------------------------------
-----
NameError Traceback (most recent call
last)
Input In [518], in <cell line: 2>()
1 del rangeSample # deleting the range,
rangeSample
----> 2 rangeSample

NameError: name 'rangeSample' is not defined

rangeSample = range(1,12,4)
rangeSample

range(1, 12, 4)

del rangeSample[2]   # attempting to delete one item from the range
rangeSample          # TypeError: 'range' object doesn't support item deletion

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [520], in <cell line: 1>()
----> 1 del rangeSample[2] # deleting the range,
rangeSample
2 rangeSample

TypeError: 'range' object doesn't support item deletion

extend

from array import *


arrSample = array('i',[1,2,3,4])

arrSample.extend((4,5,3,5))   # append the items of a tuple to the arrSample array
print(arrSample)

array('i', [1, 2, 3, 4, 4, 5, 3, 5])


arrSample.extend([4,5,3,5])   # append the items of a list to the arrSample array
print(arrSample)

array('i', [1, 2, 3, 4, 4, 5, 3, 5, 4, 5, 3, 5])

arrSample.extend(('hello'))
print(arrSample) # TypeError: an integer is
required (got type str)

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [524], in <cell line: 1>()
----> 1 arrSample.extend(('hello'))
2 print(arrSample)

TypeError: an integer is required (got type str)

arrSample.extend(['john'])
print(arrSample) # TypeError: an integer is
required (got type str)

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [525], in <cell line: 1>()
----> 1 arrSample.extend(['john'])
2 print(arrSample)

TypeError: an integer is required (got type str)

arrSample = array('i',[1,2,3,4]) # array


print(arrSample)

array('i', [1, 2, 3, 4])

arrSample.fromlist([3,4]) # add values from a


list to an array
print(arrSample)

array('i', [1, 2, 3, 4, 3, 4])

arrSample.tolist()   # returns an ordinary list with the same items; arrSample itself is unchanged
print(arrSample)

array('i', [1, 2, 3, 4, 3, 4])


rangeSample = range(1,12,4)
rangeSample

range(1, 12, 4)

rangeSample.extend([4,5,3,5])
print(rangeSample)

----------------------------------------------------------------------
-----
AttributeError Traceback (most recent call
last)
Input In [530], in <cell line: 1>()
----> 1 rangeSample.extend([4,5,3,5])
2 print(rangeSample)

AttributeError: 'range' object has no attribute 'extend'

Matrices
m = [[4,1,2],[7,5,3],[9,6,9]]

[[4, 1, 2], [7, 5, 3], [9, 6, 9]]

for i in m:
print(i)

[4, 1, 2]
[7, 5, 3]
[9, 6, 9]

m = [[4,1,2],[7,5,3],[9,6,9]]
for i in m:
    print(' '.join(str(i)))   # str(i) is the text "[4, 1, 2]"; join() puts a space between every character

[ 4 , 1 , 2 ]
[ 7 , 5 , 3 ]
[ 9 , 6 , 9 ]

m = [[4,8,9] for i in range(3)]
for i in m:
    print(' '.join(str(i)))

[ 4 , 8 , 9 ]
[ 4 , 8 , 9 ]
[ 4 , 8 , 9 ]
lst = []
m = []
for i in range(0,3):
    for j in range(0,3):
        lst.append(0)
    m.append(lst)
    lst = []        # start a fresh row so the rows do not share one list
for i in m:
    print(' '.join(str(i)))

[ 0 , 0 , 0 ]
[ 0 , 0 , 0 ]
[ 0 , 0 , 0 ]

for i in range(3):
print(i)

0
1
2

for i in range(1,3):
print(i)

1
2

matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]

print("Matrix =", matrix)

Matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

Row = int(input("Enter the number of rows:"))
Column = int(input("Enter the number of columns:"))

# Initialize matrix
matrix = []
print("Enter the entries row wise:")

# For user input
# A for loop for row entries
for row in range(Row):
    a = []
    # A for loop for column entries
    for column in range(Column):
        a.append(int(input()))
    matrix.append(a)

# For printing the matrix
for row in range(Row):
    for column in range(Column):
        print(matrix[row][column], end=" ")
    print()

Enter the number of rows:3


Enter the number of columns:4
Enter the entries row wise:
1
2
3
4
5
6
7
8
9
10
11
12
1 2 3 4
5 6 7 8
9 10 11 12

matrix = [[column for column in range(4)] for row in range(4)]

print(matrix)

[[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]

X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

row = column = 1

X[row][column] = 11

print(X)

[[1, 2, 3], [4, 11, 6], [7, 8, 9]]

row = -2
column = -1

X[row][column] = 21

print(X)

[[1, 2, 3], [4, 11, 21], [7, 8, 9]]


print("Element at row 1, column 3 =", X[0][2])
print("Element at row 3, column 3 =", X[2][2])

Element at row 1, column 3 = 3
Element at row 3, column 3 = 9

# Program to add two matrices using nested loop
X = [[1, 2, 3],[4, 5, 6], [7, 8, 9]]
Y = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]

result = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# iterate through rows
for row in range(len(X)):
    # iterate through columns
    for column in range(len(X[0])):
        result[row][column] = X[row][column] + Y[row][column]

for r in result:
    print(r)

[10, 10, 10]


[10, 10, 10]
[10, 10, 10]

Add_result = [[X[row][column] + Y[row][column]
               for column in range(len(X[0]))]
              for row in range(len(X))]
Sub_result = [[X[row][column] - Y[row][column]
               for column in range(len(X[0]))]
              for row in range(len(X))]

print("Matrix Addition")
for r in Add_result:
    print(r)

print("\nMatrix Subtraction")
for r in Sub_result:
    print(r)

Matrix Addition
[10, 10, 10]
[10, 10, 10]
[10, 10, 10]

Matrix Subtraction
[-8, -6, -4]
[-2, 0, 2]
[4, 6, 8]
rmatrix = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# element-wise product (not the linear-algebra matrix product)
for row in range(len(X)):
    for column in range(len(X[0])):
        rmatrix[row][column] = X[row][column] * Y[row][column]

print("Element-wise Multiplication")
for r in rmatrix:
    print(r)

# element-wise floor division; the loop variables must be the ones used as indices
for row in range(len(X)):
    for column in range(len(X[0])):
        rmatrix[row][column] = X[row][column] // Y[row][column]

print("\nElement-wise Floor Division")
for r in rmatrix:
    print(r)

Element-wise Multiplication
[9, 16, 21]
[24, 25, 24]
[21, 16, 9]

Element-wise Floor Division
[0, 0, 0]
[0, 1, 1]
[2, 4, 9]
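The loops above combine the two matrices entry by entry. The matrix product in the linear-algebra sense is a different operation: entry (row, column) of the result is the sum of products of a row of X with a column of Y. The following is a sketch with plain nested lists, using the same X and Y as above; the name product is chosen only for this example.

```python
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Y = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]

# result has as many rows as X and as many columns as Y
product = [[0 for _ in range(len(Y[0]))] for _ in range(len(X))]

for row in range(len(X)):
    for column in range(len(Y[0])):
        for k in range(len(Y)):
            # accumulate X[row][k] * Y[k][column] over the shared dimension k
            product[row][column] += X[row][k] * Y[k][column]

for r in product:
    print(r)
```

This needs the number of columns of X to equal the number of rows of Y, which holds for the 3x3 matrices used here.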

# Transpose
X = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]

result = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# iterate through rows
for row in range(len(X)):
    # iterate through columns
    for column in range(len(X[0])):
        result[column][row] = X[row][column]

for r in result:
    print(r)

[9, 6, 3]
[8, 5, 2]
[7, 4, 1]
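The same transpose can also be written without index bookkeeping. As a sketch, zip(*X) pairs up the first items of every row, then the second items, and so on, which yields the columns of X. The name transposed is chosen only for this example.

```python
X = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]

# zip(*X) yields the columns of X as tuples; list() turns each one back into a row
transposed = [list(column) for column in zip(*X)]

for r in transposed:
    print(r)
```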
Python Operators
# Python Arithmetic Operators
x = 7
y = 5
print(x+y) # Addition Operator
print(x-y) # Subtraction Operator
print(x*y) # Multiplication Operator
print(x/y) # Division Operator
print(x%y) # Modulo Operator
print(x**y) # Exponent Operator
print(x//y) # Floor division
# For integers a and b (b != 0): a = (a//b)*b + a%b, where a//b is the quotient and a%b is the remainder.

12
2
35
1.4
2
16807
1

# Python Assignment Operators


city="London"
print(city)

x = 7
x += 3
print(x)

x = 7
x -= 3
print(x)

x = 7
x -= 3
print(x)

x = 7
x *= 3
print(x)

x = 7
x /= 3
print(x)

x = 7
x %= 3
print(x)
London
10
4
4
21
2.3333333333333335
1

# Python Comparison Operators

x = 7
y = 3
print(x==y)

x = 7
y = 3
print(x!=y)

x = 7
y = 3
print(x>y)

x = 7
y = 3
print(x<y)

x = 7
y = 3
print(x>=y)

x = 7
y = 3
print(x<=y)

False
True
True
False
True
False

# Python Logical Operators

x = 7
print(x>3 and x<10)

x = 7
print(x>3 or x<10)

x = 7
print(not(x>3 and x<10))
True
True
False

Conditional Statements
# if statement
a = 40
b = 80
if b>a:
    print("b is greater than a")

b is greater than a

# elif statement
a = 40
b = 40
if b>a:
    print("b is greater than a")
elif a==b:
    print("a and b are equal")

a and b are equal

# else statement
a = 20
b = 10
if b>a:
    print("b is greater than a")
elif a==b:
    print("a and b are equal")
else:
    print("a is greater than b")

a is greater than b

# Nested if statements
x=51
if x>10:
    print("Above ten,")
    if x>20:
        print("Above 20!")
    else:
        print("but not above 20.")

Above ten,
Above 20!
Python Loops
# While loop
i=1
while i<7:
    print(i)
    i+=1

1
2
3
4
5
6

# The break statement: we can stop the loop even if the while condition is true:
i=1
while i<7:
    print(i)
    if (i==4):
        break
    i+=1

1
2
3
4

# The continue statement: we can stop the current iteration, and continue with the next:
i=0
while i<7:
    i+=1
    if i==4:
        continue
    print(i)

# Note that the number 4 is missing in the result

1
2
3
5
6
7
# The pass statement: a placeholder that does nothing, so the loop carries on as usual:
i=0
while i<7:
    i+=1
    if i==4:
        pass
    print(i)

# Note that the number 4 is still printed, because pass does not skip the iteration

1
2
3
4
5
6
7

# For loops
fruits=["apple","banana","cherry","kiwi","oranges"]
for x in fruits:
    print(x)

apple
banana
cherry
kiwi
oranges

for x in "strawberry":
print(x)

s
t
r
a
w
b
e
r
r
y

adj=["red","big","tasty"]
fruits=["apple","banana","cherry"]

for x in adj:
    for y in fruits:
        print(x,y)
red apple
red banana
red cherry
big apple
big banana
big cherry
tasty apple
tasty banana
tasty cherry

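num = int(input('Enter a number: '))
if num > 1:
    for i in range(2, num):
        if (num % i) == 0:
            print('Number is not Prime')
            break
    else:
        # the else of a for loop runs only when the loop finishes without break
        print('Number is Prime')
else:
    print('Number is not Prime')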

Enter a number: 47
Number is Prime

sentence="the quick brown fox jumps over the lazy dog"
word_freq = {}
for word in sentence.split():
    word_freq[word]=word_freq.get(word,0)+1
print(word_freq)

{'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1,


'lazy': 1, 'dog': 1}

ss=sentence.split()
print(ss)

['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy',


'dog']

sentence="the quick brown fox jumps over the lazy dog"
word_freq = dict.fromkeys(sentence.split(),0)
print(word_freq)
for word in sentence.split():
    word_freq[word]+=1
print(word_freq)

{'the': 0, 'quick': 0, 'brown': 0, 'fox': 0, 'jumps': 0, 'over': 0,


'lazy': 0, 'dog': 0}
{'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1,
'lazy': 1, 'dog': 1}

sentence="the quick brown fox jumps over the lazy dog"
char_freq = {}
for ch in sentence:          # iterating over a string yields characters, not words
    if ch not in char_freq:
        char_freq[ch]=1      # the first occurrence counts as 1, not 0
    else:
        char_freq[ch]+=1
print(char_freq)

{'t': 2, 'h': 2, 'e': 3, ' ': 8, 'q': 1, 'u': 2, 'i': 1, 'c': 1, 'k':
1, 'b': 1, 'r': 2, 'o': 4, 'w': 1, 'n': 1, 'f': 1, 'x': 1, 'j': 1,
'm': 1, 'p': 1, 's': 1, 'v': 1, 'l': 1, 'a': 1, 'z': 1, 'y': 1, 'd':
1, 'g': 1}

sentence="the quick brown fox jumps over the lazy dog"
word_freq = {}
for word in sentence:
    if word not in word_freq:
        word_freq[word]=sentence.count(word)
print(word_freq)

{'t': 2, 'h': 2, 'e': 3, ' ': 8, 'q': 1, 'u': 2, 'i': 1, 'c': 1, 'k':
1, 'b': 1, 'r': 2, 'o': 4, 'w': 1, 'n': 1, 'f': 1, 'x': 1, 'j': 1,
'm': 1, 'p': 1, 's': 1, 'v': 1, 'l': 1, 'a': 1, 'z': 1, 'y': 1, 'd':
1, 'g': 1}
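The counting loops above can also be replaced by collections.Counter from the standard library, which builds the same kind of frequency table in one call. A brief sketch on the same sentence; the names word_counts and char_counts are chosen for this example.

```python
from collections import Counter

sentence = "the quick brown fox jumps over the lazy dog"

word_counts = Counter(sentence.split())   # frequency of each word
char_counts = Counter(sentence)           # frequency of each character, spaces included

print(word_counts["the"])
print(char_counts["o"])
print(word_counts.most_common(1))         # the most frequent word with its count
```

A Counter behaves like a dictionary, so word_counts["the"] and word_counts.items() work as in the hand-written versions.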

5%2

n=int(input("Enter a number: "))
l=[]
s=0
for i in range(1,n+1):
    if n%i==0:
        l.append(i)
        s+=i
print("Divisors of n: ",l)
print("Count of divisors: ",len(l))
print("Sum of divisors: ",s)

Enter a number: 100


Divisors of n: [1, 2, 4, 5, 10, 20, 25, 50, 100]
Count of divisors: 9
Sum of divisors: 217

s=0
for i in range(1,11):
    s=s+i
print(s)

55
s=0
l=[]
for i in range(1,11):
    s=s+i
    l.append(s)
print(l)

[1, 3, 6, 10, 15, 21, 28, 36, 45, 55]

## The frequency of characters in the sentence
# arr = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
arr = "The Sun Shines"
# arr = (1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3)
# arr = {1,1,1,2,2,2,2,3,3,3,3,3}
# arr = {1:3,2:4,3:5}
freq = {}
for i in arr:
    if i in freq:
        freq[i] += 1
    else:
        freq[i] = 1
print(freq)
for key, value in freq.items():
    print(f"{key}: {value}")
print('The frequency of S in the sentence: ',freq['S'])

{'T': 1, 'h': 2, 'e': 2, ' ': 2, 'S': 2, 'u': 1, 'n': 2, 'i': 1, 's':
1}
T: 1
h: 2
e: 2
: 2
S: 2
u: 1
n: 2
i: 1
s: 1
The frequency of S in the sentence: 2

# GCD of 2 numbers
num1 = 36
num2 = 60
gcd = 1

for i in range(1, min(num1, num2) + 1):    # + 1 so that the smaller number itself is also tested
    if num1 % i == 0 and num2 % i == 0:
        gcd = i
print("GCD of", num1, "and", num2, "is", gcd)

GCD of 36 and 60 is 12
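Trying every candidate divisor works, but it is slow for large numbers. A common alternative is Euclid's algorithm, which repeatedly replaces the pair (a, b) with (b, a % b) until the remainder is zero. The function name gcd_euclid below is made up for this sketch; math.gcd is the standard-library equivalent.

```python
import math

def gcd_euclid(a, b):
    """Greatest common divisor by Euclid's algorithm."""
    while b:
        a, b = b, a % b   # the gcd of (a, b) equals the gcd of (b, a % b)
    return a

print(gcd_euclid(36, 60))
print(math.gcd(36, 60))
```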
# Palindrome
str1 = input('Enter your number: ')
str2 = str1[::-1]
l1 = list(str1)
l2 = list(str2)
if l1==l2:
    print("Palindrome number")
else:
    print("Not a Palindrome number")

Enter your number: strw


Not a Palindrome number

string=input("Enter a string: ")
if string==string[::-1]:
    print("The string is a palindrome")
else:
    print("The string is not a palindrome")

Enter a string: 12321
The string is a palindrome

# using While Loop
number = int(input("Please Enter any Value: "))

reverse = 0
temp = number

while(temp > 0):
    remainder = temp % 10                   # last digit of temp
    reverse = (reverse * 10) + remainder    # append that digit to the reversed number
    temp = temp //10                        # drop the last digit

print("Reverse of it is = %d" %reverse)

if(number == reverse):
    print("%d is a Palindrome" %number)
else:
    print("%d is not a Palindrome" %number)

Please Enter any Value: 12321


Reverse of it is = 12321
12321 is a Palindrome

Python Functions
A function accepts input arguments and produces an output by executing the statements in its body.

The function name and the file name need not be the same.

A file can contain one or more function definitions.

Functions are created with the def keyword followed by a colon; the statements to be executed are
indented as a block.

Since statement blocks are not demarcated explicitly (there are no braces), correct indentation is
essential.

Syntax:

def function_name(parameters):
    statements
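As an illustration of the syntax above, here is a minimal sketch of a function with two parameters and a return value. The function rectangle_area is a made-up example, not part of the original notes.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle."""
    area = width * height    # this indented block is the function body
    return area              # hands the result back to the caller

print(rectangle_area(3, 4))
```

For one-expression functions like this, Python also offers anonymous lambda expressions.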

print((lambda x:x*x)(12))

144

x=(lambda a,b:a*b)(4,5)
print(x)

20

n=4
print((lambda p:p+n)(7))

11

y=lambda p,q:(p*p)+(q*q)
print(y(5,6))

61

nums=[1,2,3]            # avoid naming a variable "list": it would shadow the built-in type
l=[]
cube=lambda i:i*i*i     # define the lambda once, outside the loop
for i in nums:
    l.append(cube(i))
print(l)

[1, 8, 27]

# Swap 2 numbers
# a=2
# b=3
def swap(a,b):
    temp=a
    a=b
    b=temp
    print("values after swapping:",a,b)

swap(23,25)
values after swapping: 25 23

a = 9
b = 10
print(a,b)
a,b = b,a
print(a,b)

9 10
10 9

# 5*1=5
# .
# .
# 5*10=50
def mult(num):
    for i in range(1,11):
        num1=num*i
        print(num,"*",i,"=",num1)

mult(5)    # mult() prints the table itself and returns None, so it is not wrapped in print()

5 * 1 = 5
5 * 2 = 10
5 * 3 = 15
5 * 4 = 20
5 * 5 = 25
5 * 6 = 30
5 * 7 = 35
5 * 8 = 40
5 * 9 = 45
5 * 10 = 50

# Prime number check

def prime(num):
    if num < 2:
        return False
    for x in range(2,num):
        if num%x==0:
            return False
    return True     # reached only when no divisor was found

print(prime(4))

False

num=int(input("Enter a number : "))
if num>1:
    for i in range(2,num):
        if (num%i)==0:
            print("No. is not prime")
            break
    else:
        # for-else: runs once, and only if the loop ended without break
        print("No. is prime")

Enter a number : 5
No. is prime

# Count of upper-case and lower-case letters in an input string
def ULCase(stg):
    count1=0
    count2=0
    for ch in stg:
        if ch.islower():        # islower()/isupper() ignore digits, spaces and punctuation
            count1=count1+1
        elif ch.isupper():
            count2=count2+1
    print("The count of lower-case letters", count1)
    print("The count of upper-case letters", count2)
ULCase("AbSc")

The count of lower-case letters 2
The count of upper-case letters 2

def uclc(a):
    uc=''
    lc=''
    for i in range(len(a)):
        if a[i].isupper():
            uc=uc+a[i]
        elif a[i].islower():
            lc=lc+a[i]
    print(len(uc))
    print(len(lc))

uclc("ABCdefGHIjk")
6
5

def UP_LO(s):
    d={"UPPER_CASE":0,"LOWER_CASE":0}
    for c in s:
        if c.isupper():
            d["UPPER_CASE"]+=1
        elif c.islower():
            d["LOWER_CASE"]+=1
        else:
            pass
    print("Original String:",s)
    print("No. of Upper case Letters:",d["UPPER_CASE"])
    print("No. of Lower case Letters:",d["LOWER_CASE"])

UP_LO('Merry Christmas and a Happy New Year 2023')

Original String: Merry Christmas and a Happy New Year 2023


No. of Upper case Letters: 5
No. of Lower case Letters: 25

# Write a program to reverse a user-defined string using a while loop
str1 = input("Enter a string : ")
str2 = ''
i = len(str1)
while i>0:
    j=str1[i-1]
    str2+=j
    i-=1
print(str2)

Enter a string : strange
egnarts

# Write a program to reverse a user-defined string using a for loop
str1 = input("Enter a string : ")
str2 = ''
l=len(str1)
for i in range(-1,-l-1,-1):
    j=str1[i]
    str2+=j
print(str2)

Enter a string : strange


egnarts

# Write a program that finds the sum of digits of a user-defined number using a while loop.
n=int(input("Enter your number: "))
b=n
s=0
while (b>0):
    a=b%10
    b=b//10         # integer division; int(b/10) can lose precision for very large numbers
    s+=a
print("Sum of digits is: ",s)

Enter your number: 123
Sum of digits is:  6

# Write a program that finds the sum of a user-defined list of numbers using a for loop.
n=input("Enter list of numbers separated by commas: ")
a=n.split(",")
a=[int(x) for x in a]
y=0
print("The list is ",a)
for x in a:
    y=x+y
print(y)

Enter list of numbers separated by commas: 1,2,3,4,5


The list is [1, 2, 3, 4, 5]
15

# Write a program that finds the sum of digits of a user-defined number by looping over its characters
# with a while loop.
a=input("Enter your number: ")
c=len(a)
d=0
while c>0:
    d=d+int(a[c-1])
    c=c-1
print(d)

Enter your number: 12345


15

# Write a program to check whether a user-defined list is a subset of another user-defined list.
l1=list(input("Enter"))
print(l1)
l2=list(input("Enter"))
print(l2)
a=1
for i in range(0,len(l1)):
    if l1[i] not in l2:     # one missing element is enough to rule out a subset
        a=0
        break
if(a==1):
    print("It is a subset")
else:
    print("It is not a subset")

Enter1234
['1', '2', '3', '4']
Enter3456
['3', '4', '5', '6']
It is not a subset

def checkSubset(list1, list2):
    # True when every item of list2 is also an item of list1
    for i in list2:
        if i not in list1:
            return False
    return True

# Driver Code
list1 = [[2, 3, 1], [4, 5], [6, 8]]
list2 = [[4, 5], [6, 8]]
print(checkSubset(list1, list2))

True

# The same check on two user-entered strings, compared character by character
list1=list(input("Enter"))
list2=list(input("Enter"))
a = True
for i in list2:
    if i not in list1:
        a = False
print(a)

Enter1234
Entersd3
False

list2[0][1]

a=list(input("Enter your string: "))
i=0
l=[]
while (i<len(a)):
    n=1
    for j in range(i+1,len(a)):
        if a[i]==a[j]:
            n+=1
    if a[i] not in l:
        l.append(a[i])
        print(a[i],'occurred',n,'times')
    i+=1

Enter your string: allahabad


a occurred 4 times
l occurred 2 times
h occurred 1 times
b occurred 1 times
d occurred 1 times

a=input("Enter your string: ")
f={}

for i in a:
    if i in f:
        f[i]+=1
    else:
        f[i]=1
print(str(f))

Enter your string: ashok


{'a': 1, 's': 1, 'h': 1, 'o': 1, 'k': 1}

n=int(input("Enter your number: "))
a0=1
a1=1
print(a0)
print(a1)
for i in range(1,n+1):
    a2=a0+a1
    a0=a1
    a1=a2
    print(a2)

Enter your number: 4


1
1
2
3
5
8

# Fibonacci Function
def fib(n):
    if n<0:
        return("Fibonacci sequence doesn't exist for negative numbers.\nPlease provide a non-negative number.")
    elif n==0:
        return [1]      # a list, for consistency with the other branches
    elif n==1:
        return [1,1]
    else:
        a0=1
        a1=1
        l=[a0,a1]
        for i in range(1,n):
            a2=a0+a1
            a0=a1
            a1=a2
            l.append(a2)
        return l

n=int(input("Enter your number: "))
print(fib(n))

Enter your number: 2


[1, 1, 2]

# Fibonacci Function (while-loop version)
def fib(n):
    if n<0:
        return("Fibonacci sequence doesn't exist for negative numbers.\nPlease provide a non-negative number.")
    elif n==0:
        return [1]      # a list, for consistency with the other branches
    elif n==1:
        return [1,1]
    else:
        a0=1
        a1=1
        l=[a0,a1]
        i=2
        while (i<=n):
            a2=a0+a1
            a0=a1
            a1=a2
            i+=1
            l.append(a2)
        return l

n=int(input("Enter your number: "))
print(fib(n))
Enter your number: 12
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]

def palindrome(str1):
    l=len(str1)
    str2=''
    while l>0:
        j=str1[l-1]
        str2+=j
        l-=1
    # compare only after the reversed text is complete
    if str1==str2:
        return("Palindrome")
    else:
        return("Not a palindrome")
n=input("Enter your text: ")
print(palindrome(n))

Enter your text: jiko
Not a palindrome

str1=input("Enter your text: ")
l=len(str1)
str2=''
while l>0:
    j=str1[l-1]
    str2+=j
    l-=1
print('Reversed text: ',str2)
if str1==str2:
    print("Palindrome number")
else:
    print("Not Palindrome number")

Enter your text: 12321


Reversed text: 12321
Palindrome number

n=int(input("Enter :"))
n1,n2=1,1
count=0
if n<=0:
    print("Enter a positive number.")
elif n==1:
    print("Fibonacci sequence upto",n,":")
    print(n1)
else:
    print("Fibonacci sequence :")
    while count<=n:
        print(n1)
        nth=n1+n2
        n1=n2
        n2=nth
        count+=1

Enter :5
Fibonacci sequence :
1
1
2
3
5
8

vcount = 0
ccount = 0
text = "This is a really simple sentence"    # not named "str", which would shadow the built-in type
# Converting the entire string to lower case to reduce the comparisons
text = text.lower()
for ch in text:
    # Checks whether a character is a vowel
    if ch in "aeiou":
        vcount = vcount + 1
    elif ch >= 'a' and ch <= 'z':
        ccount = ccount + 1
print("Total number of vowels and consonants are")
print(vcount)
print(ccount)

Total number of vowels and consonants are
10
17

text = "This is a really simple sentence"
vowel = set("aeiouAEIOU")
count = 0
ccount = 0
for alphabet in text:
    if alphabet in vowel:
        count = count + 1
    elif alphabet.isalpha():    # skip spaces, digits and punctuation
        ccount = ccount + 1
print("No. of vowels :", count)
print("No. of consonants :", ccount)

No. of vowels : 10
No. of consonants : 17
my_dict = {'apple': 10, 'banana': 5, 'cherry': 15}
def sorted_values(d):
    return sorted(d.values(), reverse=True)

sorted_value_list = sorted_values(my_dict)
sorted_value_list

[15, 10, 5]

my_dict = {'apple': 10, 'banana': 5, 'cherry': 15}
def sorted_keys(d):
    return sorted(d.keys())

sorted_key_list = sorted_keys(my_dict)
sorted_key_list

['apple', 'banana', 'cherry']

def string_lengths(strings):
    return [len(s) for s in strings]

words = ["apple", "banana", "cherry"]
lengths = string_lengths(words)
lengths

[5, 6, 6]

Functions
A function is a block of code which only runs when it is called.

You can pass data, known as parameters, into a function.

A function can return data as a result.

In Python a function is defined using the def keyword.

def my_function():
    print("Hello from a function")

my_function()

Hello from a function

Parameters

A parameter is the variable listed inside the parentheses in the function definition, for example
fname and lname in def my_function(fname, lname).

Arguments
An argument is the value that is sent to the function when it is called, for example 23 and 25 in swap(23,25).

Information can be passed into functions as arguments.

Arguments are specified after the function name, inside the parentheses. You can add as many
arguments as you want, just separate them with a comma.

The following example has a function with one parameter (fname). When the function is called,
we pass along a first name, which is used inside the function to print the full name:

def my_function(fname):
    print(fname + " Refsnes")

my_function("Emil")
my_function("Tobias")
my_function("Linus")

Emil Refsnes
Tobias Refsnes
Linus Refsnes

Number of Arguments

By default, a function must be called with the correct number of arguments: if your function
expects 2 arguments, you have to call it with exactly 2 arguments, no more and no fewer.

def my_function(fname, lname):
    print(fname + " " + lname)

my_function("Emil", "Refsnes")

Emil Refsnes

If you try to call the function with 1 or 3 arguments, you will get an error:

def my_function(fname, lname):
    print(fname + " " + lname)

my_function("Emil")

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [5], in <cell line: 4>()
1 def my_function(fname, lname):
2 print(fname + " " + lname)
----> 4 my_function("Emil")
TypeError: my_function() missing 1 required positional argument:
'lname'

def my_function(fname, lname):
    print(fname + " " + lname)

my_function("Emil","Ajay",'Anil')

----------------------------------------------------------------------
-----
TypeError Traceback (most recent call
last)
Input In [7], in <cell line: 4>()
1 def my_function(fname, lname):
2 print(fname + " " + lname)
----> 4 my_function("Emil","Ajay",'Anil')

TypeError: my_function() takes 2 positional arguments but 3 were given

Arbitrary Arguments, *args

If you do not know how many arguments will be passed into your function, add a * before
the parameter name in the function definition.

This way the function will receive a tuple of arguments, and can access the items accordingly:

def my_function(*kids):
    print("The youngest child is " + kids[2])

my_function("Emil", "Tobias", "Linus")

The youngest child is Linus
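Because the arguments arrive as a tuple, they can also be counted and looped over, which avoids relying on a fixed index such as kids[2]. The following is a minimal sketch; the function name describe_kids is made up for illustration.

```python
def describe_kids(*kids):
    # kids is a tuple, so len() and iteration work for any number of arguments
    print("Number of kids:", len(kids))
    for position, name in enumerate(kids, start=1):
        print(position, name)
    return kids

result = describe_kids("Emil", "Tobias", "Linus")
print(type(result))
```

Calling describe_kids() with no arguments is also valid; the tuple is then simply empty.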

Keyword Arguments

You can also send arguments with the key = value syntax.

This way the order of the arguments does not matter.

def my_function(child3, child2, child1):
    print("The youngest child is " + child3)

my_function(child1 = "Emil", child2 = "Tobias", child3 = "Linus")

The youngest child is Linus

Arbitrary Keyword Arguments, **kwargs

If you do not know how many keyword arguments will be passed into your function, add
two asterisks (**) before the parameter name in the function definition.

This way the function will receive a dictionary of arguments, and can access the items
by key:

def my_function(**kid):
    print("His last name is " + kid["lname"])

my_function(fname = "Tobias", lname = "Refsnes")

His last name is Refsnes
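Since the keyword arguments arrive as a dictionary, the usual dictionary tools apply. The sketch below (the name describe_person is hypothetical) loops over all supplied keywords and uses .get() so that a missing keyword does not raise a KeyError.

```python
def describe_person(**details):
    # details is a dict mapping each keyword name to its value
    for key, value in details.items():
        print(key, "=", value)
    # .get() returns a fallback instead of failing when "lname" was not passed
    return details.get("lname", "unknown")

last = describe_person(fname="Tobias", lname="Refsnes")
missing = describe_person(fname="Emil")
print(last, missing)
```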

Default Parameter Value

The following example shows how to use a default parameter value.

If we call the function without an argument, it uses the default value:

def my_function(country = "Norway"):
    print("I am from " + country)

my_function("Sweden")
my_function("India")
my_function()
my_function("Brazil")

I am from Sweden
I am from India
I am from Norway
I am from Brazil

Passing a List as an Argument

You can send an argument of any data type to a function (string, number, list, dictionary etc.), and
it will be treated as the same data type inside the function.

E.g. if you send a List as an argument, it will still be a List when it reaches the function:

def my_function(food):
    for x in food:
        print(x)

fruits = ["apple", "banana", "cherry"]

my_function(fruits)

apple
banana
cherry

Return Values

To let a function return a value, use the return statement:


def my_function(x):
    return 5 * x

print(my_function(3))
print(my_function(5))
print(my_function(9))

15
25
45

The pass Statement

Function definitions cannot be empty, but if you for some reason have a function definition with
no content, put in the pass statement to avoid getting an error.

def myfunction():
    pass

myfunction()
# myfunction('a')

PYTHON FUNCTIONS-TYPES

● Built-in Functions.

● Recursion Functions.

● Lambda Functions.

● User-defined Functions.

PYTHON Built-in Function

● type()

● print()

● abs()

● int()

● str()

● tuple()

● chr()
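As a quick illustration of the built-in functions listed above, the sketch below calls each of them once. The variable names are only for this example.

```python
kind = type(3.5)            # the class of the value: float
distance = abs(-7)          # absolute value
count = int("42") + 1       # text converted to an integer, then incremented
label = str(42) + "!"       # number converted to text
triple = tuple([1, 2, 3])   # list converted to a tuple
letter = chr(65)            # character with Unicode code point 65

print(kind, distance, count, label, triple, letter)
```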

PYTHON Recursion Function

Recursion is a common mathematical and programming concept. It means that a function calls
itself.

def factorial(x):
    # Base case: 0! and 1! are both 1; this also stops the recursion
    if x <= 1:
        return 1
    else:
        return x * factorial(x - 1)

num = int(input("Enter a number: "))
print("The factorial of", num, "is", factorial(num))

Enter a number: 4
The factorial of 4 is 24

PYTHON Lambda Function

A lambda function is a small anonymous function.

A lambda function can take any number of arguments, but can only have one expression.

Syntax:

lambda arguments : expression

x = lambda a : a + 10
print(x(5))

15

x = lambda a, b : a * b
print(x(5, 6))

30

x = lambda a, b, c : a + b + c
print(x(5, 6, 2))

13

def myfunc(n):
    return lambda a : a * n

mydoubler = myfunc(2)

print(mydoubler(11))

22

def myfunc(n):
    return lambda a : a * n

mydoubler = myfunc(2)
mytripler = myfunc(3)
print(mydoubler(11))
print(mytripler(11))

22
33
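A common place to meet lambda functions is as a short argument to another function. The sketch below, using made-up sample data, passes a lambda as the key for sorted() and as the function for map().

```python
people = [("Emil", 12), ("Tobias", 9), ("Linus", 5)]

# Sort by the second item of each tuple (the age)
by_age = sorted(people, key=lambda person: person[1])

# Extract just the names, in the sorted order
names = list(map(lambda person: person[0], by_age))

print(names)  # ['Linus', 'Tobias', 'Emil']
```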

Module & Library


A module is a file of Python code (functions, classes, variables) that uses the .py extension. For example: Popular
built-in Python modules include os, sys, math, random, and so on.

A Python library is a set of related modules or packages bundled together. It is used by
programmers and developers alike. For example: Popular third-party Python libraries, which are
installed separately rather than built in, include Pygame, PyTorch, matplotlib, and more.

Real-world programs are complicated. Even simple software contains thousands of lines of
code. Because of this, code written as one continuous flow is difficult for programmers and
developers to grasp. Developers use modular programming to keep code logically separated and
easier to understand. It is a method of breaking down huge coding tasks into shorter, more logical,
and more adaptable subtasks.

Ease of use is one of Python's primary goals, and its many modules and libraries support
that goal.

NumPy
NumPy is a Python library (package) used for working with arrays.

It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it
freely.

NumPy stands for Numerical Python.

Supports N-dimensional array objects that can be used for processing multidimensional data

Supports different data-types

Why Use NumPy?

In Python we have lists that serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.
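The speed difference can be checked on your own machine. The following is a rough sketch using the standard timeit module; the array size and repeat count are arbitrary choices, and the measured ratio will vary with hardware, NumPy version, and the operation being timed.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

def double_list():
    # Pure Python: one multiplication per loop iteration
    return [v * 2 for v in py_list]

def double_array():
    # NumPy: the whole array is multiplied in compiled code
    return np_arr * 2

list_time = timeit.timeit(double_list, number=20)
array_time = timeit.timeit(double_array, number=20)

print("list :", list_time, "seconds")
print("array:", array_time, "seconds")
```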

The array object in NumPy is called ndarray. It provides many supporting functions that make
working with ndarray very easy.

Arrays are very frequently used in data science, where speed and resources are very important.

Installation of NumPy

pip install numpy

Requirement already satisfied: numpy in c:\users\gargs\anaconda3\lib\site-packages (1.22.4)
Note: you may need to restart the kernel to use updated packages.

Import NumPy

import numpy as np

Create a NumPy ndarray Object

NumPy is used to work with arrays. The array object in NumPy is called ndarray.

We can create a NumPy ndarray object by using the array() function.

arr = np.array([1, 2, 3, 4, 5])

print(arr)

print(type(arr))

[1 2 3 4 5]
<class 'numpy.ndarray'>

Checking NumPy Version

print(np.__version__)

1.22.4

arr = np.array((1, 2, 3, 4, 5))

print(arr)

print(type(arr))

[1 2 3 4 5]
<class 'numpy.ndarray'>

time

%time for i in range(1,100000): i*1000

CPU times: total: 15.6 ms
Wall time: 15.5 ms

for i in range(1,100000):
    j=i*1000
j

import time
a = time.time()
for i in range(1,100000): i*1000
b = time.time()
c=b-a
print(a)
print(b)
print(c)

1704103059.5630674
1704103059.5816789
0.018611431121826172

import time

l = []
a = time.time()
for i in range(0, 100):
    l.append(i)

b = time.time()

c = b - a

print("End time \n", b)
print("Start time \n", a)
print("Execution time \n", c)

End time
1703847721.5496702
Start time
1703847721.5496702
Execution time
0.0
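The execution time of 0.0 above most likely means the loop finished faster than the clock behind time.time() could tick on that machine. For short measurements, time.perf_counter() from the same module is usually a better fit because it uses the highest-resolution clock available. A minimal sketch of the same measurement:

```python
import time

values = []
start = time.perf_counter()      # high-resolution start mark
for i in range(100):
    values.append(i)
elapsed = time.perf_counter() - start

print("Execution time (seconds):", elapsed)
```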

Dimensions in Arrays

0-D Arrays

0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.

arr = np.array(42)

print(arr)

42

1-D Arrays

An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array.

These are the most common and basic arrays.

arr = np.array([1, 2, 3, 4, 5])

print(arr)

[1 2 3 4 5]

2-D Arrays

An array that has 1-D arrays as its elements is called a 2-D array.

These are often used to represent matrices or 2nd-order tensors.

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)

[[1 2 3]
[4 5 6]]

3-D arrays

An array that has 2-D arrays (matrices) as its elements is called a 3-D array.

These are often used to represent a 3rd order tensor.

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr)

print(type(arr))
# print(len(arr))
# print(arr.ndim)
# print(arr.shape)

[[[1 2 3]
[4 5 6]]

[[1 2 3]
[4 5 6]]]
<class 'numpy.ndarray'>

Check Number of Dimensions

NumPy arrays provide the ndim attribute, an integer that tells us how many dimensions the array
has.

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

0
1
2
3

Higher Dimensional Arrays

An array can have any number of dimensions.

arr = np.array([1, 2, 3, 4], ndmin=10)

print(arr)
print('number of dimensions :', arr.ndim)

[[[[[[[[[[1 2 3 4]]]]]]]]]]
number of dimensions : 10

NumPy Array Shape

Shape of an Array

The shape of an array is the number of elements in each dimension.

Get the Shape of an Array

NumPy arrays have an attribute called shape that returns a tuple in which each entry is the
number of elements along the corresponding dimension.

arr = np.array([[1, 2, 3, 4], [5, 6, 7,8], [1,1,1,1]])

print(arr.shape)

(3, 4)

arr = np.array([1, 2, 3, 4], ndmin=0)

print(arr)
print('shape of array :', arr.shape)

[1 2 3 4]
shape of array : (4,)

arr = np.array([1, 2, 3, 4], ndmin=1)

print(arr)
print('shape of array :', arr.shape)

[1 2 3 4]
shape of array : (4,)

arr = np.array([1, 2, 3, 4], ndmin=2)

print(arr)
print('shape of array :', arr.shape)

[[1 2 3 4]]
shape of array : (1, 4)

arr = np.array([1, 2, 3, 4], ndmin=3)

print(arr)
print('shape of array :', arr.shape)

[[[1 2 3 4]]]
shape of array : (1, 1, 4)

arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('shape of array :', arr.shape)

[[[[[1 2 3 4]]]]]
shape of array : (1, 1, 1, 1, 4)

NumPy Array Indexing

Access Array Elements

Array indexing is the same as accessing an array element.

You can access an array element by referring to its index number.

The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the
second has index 1 etc.

arr = np.array([1, 2, 3, 4])

print(arr[0])
print(arr[1])
print(arr[2] + arr[3])

1
2
7

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('2nd element on 1st row: ', arr[0, 1],arr[0][1])
print('5th element on 2nd row: ', arr[1, 4])
print('Last element from 2nd dim: ', arr[1, -1])

2nd element on 1st row:  2 2
5th element on 2nd row:  10
Last element from 2nd dim:  10

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr[0, 1, 2])
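The index arr[0, 1, 2] can be read one dimension at a time. The sketch below unpacks it into three steps; the intermediate names block and row are only for illustration.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

block = arr[0]     # first 2-D array: [[1 2 3], [4 5 6]]
row = block[1]     # second row of that block: [4 5 6]
value = row[2]     # third element of that row: 6

print(value)
print(arr[0, 1, 2])   # same element, selected in one step
```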

NumPy Array Slicing

Slicing arrays

Slicing in Python means taking elements from one given index up to, but not including, another given index.

We pass a slice instead of an index, like this: [start:end].

We can also define the step, like this: [start:end:step].

If we don't pass start, it is considered 0.

If we don't pass end, it is considered the length of the array in that dimension.

If we don't pass step, it is considered 1.

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5])
print(arr[4:])
print(arr[:4])
print(arr[-3:-1])
print(arr[1:5:2])
print(arr[::2])

[2 3 4 5]
[5 6 7]
[1 2 3 4]
[5 6]
[2 4]
[1 3 5 7]

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[1, 1:4])
print(arr[0:2, 2])
print(arr[0:2, 1:4])

[7 8 9]
[3 8]
[[2 3 4]
[7 8 9]]

NumPy Data Types

Data Types in Python

By default Python has these data types:

strings - used to represent text data; the text is given in quote
marks. e.g. "ABCD"

integer - used to represent integer numbers. e.g. -1, -2, -3

float - used to represent real numbers. e.g. 1.2, 42.42

boolean - used to represent True or False.

complex - used to represent complex numbers. e.g. 1.0 + 2.0j, 1.5 + 2.5j

Data Types in NumPy

NumPy has some extra data types, and refers to data types with one character, like i for integers,
u for unsigned integers etc.

Below is a list of all data types in NumPy and the characters used to represent them.

i - integer

b - boolean

u - unsigned integer

f - float

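c - complex float (give a size when creating arrays, e.g. 'c8' or 'c16'; a bare 'c' is read as a one-character byte string)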

m - timedelta

M - datetime

O - object

S - byte string

U - unicode string

V - fixed chunk of memory for other type ( void )
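The characters above are what NumPy reports in the kind attribute of a dtype. As a small sketch, the loop below builds a few dtypes from string specifications (the particular specs are arbitrary examples) and prints each one's name, kind character, and size in bytes.

```python
import numpy as np

for spec in ("i4", "u2", "f8", "c16", "S3", "U5"):
    dt = np.dtype(spec)
    # name: readable type name, kind: one-character category, itemsize: bytes per element
    print(spec, dt.name, dt.kind, dt.itemsize)

# Python types map onto the same categories
print(np.dtype(bool).kind)     # b
print(np.dtype(object).kind)   # O
```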

Checking the Data Type of an Array

The NumPy array object has a property called dtype that returns the data type of the array:

arr = np.array([1, 2, 3, 4])

print(arr.dtype)

int32

arr = np.array(['apple', 'banana', 'cherry'])

print(arr.dtype)

<U6

arr = np.array([1, 2, 3, 4], dtype='S')

print(arr)
print(arr.dtype)

[b'1' b'2' b'3' b'4']


|S1

arr = np.array([1, 2, 3, 4], dtype='U')

print(arr)
print(arr.dtype)

['1' '2' '3' '4']


<U1

arr = np.array([-101, 2, 3, 4], dtype='i2')

print(arr)
print(arr.dtype)

[-101 2 3 4]
int16

arr = np.array([1, 2, 3, 4], dtype='i4')

print(arr)
print(arr.dtype)

[1 2 3 4]
int32

arr = np.array([1, 2, 3, 4], dtype='i8')

print(arr)
print(arr.dtype)

[1 2 3 4]
int64

arr = np.array([1, 2, 3, 4], dtype='i')

print(arr)
print(arr.dtype)

[1 2 3 4]
int32

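1 bit can store either a 0 or a 1

1 byte = 8 bits (enough to hold one ASCII character, where ASCII: American Standard Code for Information
Interchange)

1 word = 2 bytes (16 bits) in the convention used here; the actual word size depends on the processor architecture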

Unsigned Integer types

NumPy's unsigned integer types start with u and come in 8, 16, 32, and 64-bit sizes. An n-bit
unsigned integer holds values from 0 to 2ⁿ-1. In a dtype string the number counts bytes, not
bits, so 'u1' is uint8 and 'u8' is uint64.

DATATYPE        MIN    MAX      LENGTH

uint8 ('u1')    0      255      8-bit

uint16 ('u2')   0      65535    16-bit

& so on.

Signed Integer types

NumPy's signed integer types start with i and come in 8, 16, 32, and 64-bit sizes. An n-bit
signed integer holds values from -(2ⁿ⁻¹) to 2ⁿ⁻¹-1. In a dtype string the number counts bytes, not
bits, so 'i1' is int8 and 'i8' is int64.

DATATYPE        MIN       MAX      LENGTH

int8 ('i1')     -128      127      8-bit

int16 ('i2')    -32768    32767    16-bit

& so on.
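The remaining rows of these tables do not need to be memorised, because np.iinfo() reports the limits of any integer type. A short sketch that also checks the limits against the formulas above:

```python
import numpy as np

for int_type in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(int_type)
    n = info.bits
    # Signed range: -(2**(n-1)) to 2**(n-1) - 1
    print(n, info.min, info.max, info.min == -(2 ** (n - 1)))

for uint_type in (np.uint8, np.uint16, np.uint32, np.uint64):
    info = np.iinfo(uint_type)
    # Unsigned range: 0 to 2**n - 1
    print(info.bits, info.min, info.max, info.max == 2 ** info.bits - 1)
```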

Here are the 16 possible values of a four-bit unsigned int:

bits value

0000 0

0001 1

0010 2

0011 3

0100 4

0101 5

0110 6

0111 7

1000 8

1001 9

1010 10

1011 11

1100 12

1101 13

1110 14

1111 15

Here are the 16 possible values of a four-bit signed int:

bits value

0000 0

0001 1

0010 2

0011 3

0100 4

0101 5

0110 6

0111 7

1000 -8

1001 -7

1010 -6

1011 -5

1100 -4

1101 -3

1110 -2

1111 -1
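The signed table follows the two's-complement rule: a pattern whose leading bit is 1 stands for its unsigned value minus 2ⁿ. The helper below, to_signed, is a hypothetical function written only to illustrate that rule.

```python
def to_signed(bits):
    """Interpret a string of 0s and 1s as a two's-complement signed integer."""
    n = len(bits)
    value = int(bits, 2)      # unsigned interpretation of the pattern
    if bits[0] == "1":        # leading bit set means the number is negative
        value -= 2 ** n
    return value

for pattern in ("0111", "1000", "1011", "1111"):
    print(pattern, int(pattern, 2), to_signed(pattern))
```

For the four patterns shown, this prints unsigned values 7, 8, 11, 15 next to signed values 7, -8, -5, -1, matching the two tables.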

To understand the differences between byte string and Unicode string, we first need to know
what “Encoding” and “Decoding” are.

To store human-readable characters on computers, we need to encode them into bytes. In
contrast, we need to decode the bytes into human-readable characters for display. A byte,
in computer science, is a group of bits (0/1 values), commonly 8 of them. So the characters “Hi” are
actually stored as “01001000 01101001” on the computer, which consumes 2 bytes (16 bits).

The rule that defines the encoding process is called an encoding scheme; commonly used ones
include “ASCII”, “UTF-8”, etc.

“ASCII” stores each character in one byte. Strictly, ASCII is a 7-bit code that defines 2⁷=128
characters; 8-bit extensions of ASCII use the full byte and raise the limit to 2⁸=256.

However, 256 characters are obviously not enough for storing all the characters in the world. In
light of that, people designed Unicode in which each character will be encoded as a “code point”.
For instance, “H” will be represented as code point “U+0048”.
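Encoding and decoding can be tried directly in Python with str.encode() and bytes.decode(). The sketch below reproduces the bit pattern for “Hi” and the code point of “H”; the character “é” is an extra example chosen to show a character that needs more than one byte in UTF-8.

```python
text = "Hi"

encoded = text.encode("ascii")                        # str -> bytes
bits = " ".join(format(b, "08b") for b in encoded)    # each byte as 8 binary digits
print(encoded)            # b'Hi'
print(bits)               # 01001000 01101001

print(hex(ord("H")))      # 0x48, i.e. code point U+0048

decoded = encoded.decode("ascii")                     # bytes -> str
print(decoded)            # Hi

utf8_len = len("é".encode("utf-8"))                   # one character, two bytes
print(utf8_len)           # 2
```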

arr = np.array([True,False,1, 2, 3, 4], dtype='u1')

print(arr)
print(arr.dtype)

[1 0 1 2 3 4]
uint8

arr = np.array([1, 2, 3, 4], dtype='u8')

print(arr)
print(arr.dtype)

[1 2 3 4]
uint64

arr = np.array([1, 2, 3, 4], dtype='f')

print(arr)
print(arr.dtype)

[1. 2. 3. 4.]
float32

arr = np.array([1, 2, 3, 4], dtype='c')


print(arr)
print(arr.dtype)

[b'1' b'2' b'3' b'4']


|S1

arr = np.array([1, 2, 3, 4], dtype='m')

print(arr)
print(arr.dtype)

[1 2 3 4]
timedelta64

arr = np.array([1, 2, 3, 4], dtype='O')

print(arr)
print(arr.dtype)

[1 2 3 4]
object

arr = np.array(["1"," 2", "3", "4"], dtype='S')

print(arr)
print(arr.dtype)

[b'1' b' 2' b'3' b'4']


|S2

The Unicode standard describes how characters are represented by code points. A code point
value is an integer in the range 0 to 0x10FFFF

arr = np.array(["1"," 2", "3", "4"], dtype='U')

print(arr)
print(arr.dtype)

['1' ' 2' '3' '4']


<U2

arr = np.array(['a', '2', '3'], dtype='i')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [127], in <cell line: 1>()
----> 1 arr = np.array(['a', '2', '3'], dtype='i')

ValueError: invalid literal for int() with base 10: 'a'


Converting Data Type on Existing Arrays

The best way to change the data type of an existing array is to make a copy of the array with the
astype() method.

The astype() method creates a copy of the array and lets you specify the data type as a
parameter.

The data type can be specified using a string, like 'f' for float or 'i' for integer, or you can use the
data type directly, like float for float and int for integer.

arr = np.array([1.1, 2.1, 3.1])

newarr = arr.astype('i')

print(arr)
print(arr.dtype)
print(newarr)
print(newarr.dtype)

[1.1 2.1 3.1]


float64
[1 2 3]
int32

arr = np.array([1.1, 2.1, 3.1])

newarr = arr.astype(int)

print(newarr)
print(newarr.dtype)

[1 2 3]
int32

arr = np.array([1, 0, 3])

newarr = arr.astype(bool)

print(newarr)
print(newarr.dtype)

[ True False True]


bool

NumPy Array Copy vs View

The Difference Between Copy and View

The main difference between a copy and a view of an array is that the copy is a new array, and
the view is just a view of the original array.

The copy owns the data and any changes made to the copy will not affect the original array, and any
changes made to the original array will not affect the copy.

The view does not own the data and any changes made to the view will affect the original array,
and any changes made to the original array will affect the view.

arr = np.array([1, 2, 3, 4, 5])


x = arr.copy()
arr[0] = 42

print(arr)
print(x)

[42 2 3 4 5]
[1 2 3 4 5]

arr = np.array([1, 2, 3, 4, 5])


x = arr.view()
arr[0] = 42

print(arr)
print(x)

[42 2 3 4 5]
[42 2 3 4 5]

import numpy as np
import copy

x = np.array([10, 11, 12, 13])


y = x  # y is a second name for the same array object, not a copy

# Create views of x (new array objects sharing x's data) in 2 different ways
x_view1 = x.view()
x_view2 = x[:] # Creates a view using a slice

# Create full copies of x (not sharing data) in 3 different ways
x_copy1 = x.copy()
x_copy2 = copy.copy(x) # Calls x.__copy__() which creates a full copy of x
x_copy3 = copy.deepcopy(x)

# Change some array elements to see what happens
x[0] = 555 # Affects x, y, x_view1, and x_view2
x_view1[1] = 666 # Affects x, y, x_view1, and x_view2
x_view2[2] = 777 # Affects x, y, x_view1, and x_view2
x_copy1[0] = 888 # Affects only x_copy1
x_copy2[0] = 999 # Affects only x_copy2

print("x_value: ",x) # [555 666 777 13]


print("y_value: ",y)
print('View_1: ',x_view1) # [555 666 777 13]
print('View_2: ',x_view2) # [555 666 777 13]
print('Copy_1: ',x_copy1) # [888 11 12 13]
print('Copy_2: ',x_copy2) # [999 11 12 13]
print('Copy_3: ',x_copy3)
print("id of x_value: ", id(x))
print("id of y_value: ", id(y))
print("id of View_1: ", id(x_view1))
print("id of View_2: ", id(x_view2))
print("id of Copy_1: ", id(x_copy1))
print("id of Copy_2: ", id(x_copy2))
print("id of Copy_3: ", id(x_copy3))

x_value: [555 666 777 13]


y_value: [555 666 777 13]
View_1: [555 666 777 13]
View_2: [555 666 777 13]
Copy_1: [888 11 12 13]
Copy_2: [999 11 12 13]
Copy_3: [10 11 12 13]
id of x_value: 2460775794704
id of y_value: 2460775794704
id of View_1: 2460775485584
id of View_2: 2460775913200
id of Copy_1: 2460535984176
id of Copy_2: 2460774842896
id of Copy_3: 2460530812272
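The id values show that views and copies are all separate Python objects, so id() alone cannot tell a view from a copy. Two checks that can: the base attribute, which is None for an array that owns its data, and np.shares_memory(). A minimal sketch:

```python
import numpy as np

x = np.array([10, 11, 12, 13])
view = x.view()
sliced = x[1:3]
copied = x.copy()

print(view.base is x)          # True: the view borrows x's data
print(sliced.base is x)        # True: a slice is also a view
print(copied.base is None)     # True: the copy owns its own data

print(np.shares_memory(x, sliced))   # True
print(np.shares_memory(x, copied))   # False
```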

NumPy Array Reshaping

Reshaping arrays

Reshaping means changing the shape of an array.

The shape of an array is the number of elements in each dimension.

By reshaping we can add or remove dimensions or change the number of elements in each dimension.

# Reshape From 1-D to 2-D


arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)
print(arr)
print(newarr)

[ 1 2 3 4 5 6 7 8 9 10 11 12]
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(2, 3, 2)
print(arr)
print(newarr)

[ 1 2 3 4 5 6 7 8 9 10 11 12]
[[[ 1 2]
[ 3 4]
[ 5 6]]

[[ 7 8]
[ 9 10]
[11 12]]]

# Can We Reshape Into any Shape?


arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

newarr = arr.reshape(3, 3)

print(newarr)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [149], in <cell line: 4>()
      1 # Can We Reshape Into any Shape?
      2 arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
----> 4 newarr = arr.reshape(3, 3)
      6 print(newarr)

ValueError: cannot reshape array of size 8 into shape (3,3)

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

newarr = arr.reshape(2, 2, -1)

print(newarr)

[[[1 2]
[3 4]]

[[5 6]
[7 8]]]
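In the call above, -1 stands for an unknown dimension: NumPy works out its size from the total number of elements and the other dimensions. Only one dimension may be -1, and the remaining sizes must divide the element count evenly. A short sketch under those rules:

```python
import numpy as np

arr = np.arange(1, 9)            # 8 elements: 1..8

a = arr.reshape(2, 2, -1)        # 8 / (2*2) = 2, so the shape is (2, 2, 2)
b = arr.reshape(-1, 4)           # 8 / 4 = 2, so the shape is (2, 4)
print(a.shape, b.shape)

try:
    arr.reshape(3, -1)           # 8 is not divisible by 3
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```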

# Flattening an array means converting a multidimensional array into a 1-D array.
arr = np.array([[1, 2, 3], [4, 5, 6]])

newarr = arr.reshape(-1)
print(newarr)

[1 2 3 4 5 6]

arr = np.array([1, 2, 3, 4], ndmin=5)

newarr = arr.reshape(-1)

print(newarr)

[1 2 3 4]

NumPy Array Iterating

Iterating Arrays

Iterating means going through elements one by one.

When we deal with multi-dimensional arrays in NumPy, we can do this using Python's basic for
loop.

If we iterate on a 1-D array it will go through each element one by one.

arr = np.array([1, 2, 3])

for x in arr:
    print(x)

1
2
3

arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    print(x)

[1 2 3]
[4 5 6]

arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    for y in x:
        print(y)

1
2
3
4
5
6

arr = np.array([1, 2, 3, 4], ndmin=3)


for x in arr:
    print(x)

[[1 2 3 4]]

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
    print(x)

[[1 2 3]
[4 5 6]]
[[ 7 8 9]
[10 11 12]]

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
    for y in x:
        for z in y:
            print(z)

1
2
3
4
5
6
7
8
9
10
11
12
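Nested loops need one for statement per dimension. As an alternative sketch, np.nditer() and the flat attribute both visit every element of an array of any dimension with a single loop.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# np.nditer yields each element in turn, regardless of the number of dimensions
values = [int(x) for x in np.nditer(arr)]

# arr.flat is a 1-D iterator over the same elements
flat_values = [int(x) for x in arr.flat]

print(values)
print(flat_values)
```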

Enumerated Iteration Using ndenumerate()

arr = np.array([1, 2, 3])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0,) 1
(1,) 2
(2,) 3

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(0, 3) 4
(1, 0) 5
(1, 1) 6
(1, 2) 7
(1, 3) 8

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])


for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0, 0, 0) 1
(0, 0, 1) 2
(0, 0, 2) 3
(0, 1, 0) 4
(0, 1, 1) 5
(0, 1, 2) 6
(1, 0, 0) 1
(1, 0, 1) 2
(1, 0, 2) 3
(1, 1, 0) 4
(1, 1, 1) 5
(1, 1, 2) 6

NumPy Joining Array

Joining NumPy Arrays

Joining means putting contents of two or more arrays in a single array.

In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.

We pass a sequence of arrays that we want to join to the concatenate() function, along with the
axis. If axis is not explicitly passed, it is taken as 0.

import numpy as np
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr3 = np.concatenate((arr1, arr2),axis=0)


arr4 = np.concatenate((arr1, arr2))
print('arr3: ',arr3)
print('arr4: ',arr4)

arr3: [1 2 3 4 5 6]
arr4: [1 2 3 4 5 6]

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr3 = np.concatenate((arr1, arr2), axis=1) # Concatenate horizontally
# arr3 = np.concatenate((arr1, arr2), axis=-1) # Concatenate horizontally
arr4 = np.concatenate((arr1, arr2)) # Concatenate vertically

print(arr3)
print(arr4)
print(arr3.ndim)
print(arr4.ndim)
print(arr3.shape)
print(arr4.shape)

[[1 2 5 6]
[3 4 7 8]]
[[1 2]
[3 4]
[5 6]
[7 8]]
2
2
(2, 4)
(4, 2)

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=0) # Concatenate vertically

print(arr)

[[1 2]
[3 4]
[5 6]
[7 8]]

Joining Arrays Using Stack Functions

Stacking is similar to concatenation; the only difference is that stacking is done along a new axis.

We can concatenate two 1-D arrays along the second axis, which would result in putting them
one over the other, i.e. stacking.

We pass a sequence of arrays that we want to join to the stack() function along with the axis. If
axis is not explicitly passed, it is taken as 0.

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr3 = np.stack((arr1, arr2), axis=1)


arr4 = np.stack((arr1, arr2), axis=0)
arr5 = np.stack((arr1, arr2))
print(arr3)
print(arr4)
print(arr5)

[[1 4]
[2 5]
[3 6]]
[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr3 = np.stack((arr1, arr2), axis=1)


arr4 = np.stack((arr1, arr2), axis=0)

print(arr3)
print(arr4)

[[[1 2]
[5 6]]

[[3 4]
[7 8]]]
[[[1 2]
[3 4]]

[[5 6]
[7 8]]]

# stacking more than two 2d arrays


x=np.array([[1,2,3],
[4,5,6]])
y=np.array([[7,8,9],
[10,11,12]])
z=np.array([[13,14,15],
[16,17,18]])
print(np.stack((x,y,z),axis=0)) # with axis=0 : Just stacking.
print(np.stack((x,y,z),axis=1)) # with axis =1 (row-wise stacking)
print(np.stack((x,y,z),axis=2)) # with axis =2 (column-wise stacking)

[[[ 1 2 3]
[ 4 5 6]]

[[ 7 8 9]
[10 11 12]]

[[13 14 15]
[16 17 18]]]
[[[ 1 2 3]
[ 7 8 9]
[13 14 15]]

[[ 4 5 6]
[10 11 12]
[16 17 18]]]
[[[ 1 7 13]
[ 2 8 14]
[ 3 9 15]]

[[ 4 10 16]
[ 5 11 17]
[ 6 12 18]]]

Stacking Along Rows

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.hstack((arr1, arr2))

print(arr)

[1 2 3 4 5 6]

Stacking Along Columns

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.vstack((arr1, arr2))

print(arr)

[[1 2 3]
[4 5 6]]

Stacking Along Height (depth)

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.dstack((arr1, arr2))

print(arr)

[[[1 4]
[2 5]
[3 6]]]

NumPy Splitting Array

Splitting NumPy Arrays

Splitting is the reverse operation of joining.

Joining merges multiple arrays into one, and splitting breaks one array into multiple.

We use array_split() for splitting arrays. We pass it the array we want to split and the number of
splits.

# Split the array in 3 parts:


arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr)

[array([1, 2]), array([3, 4]), array([5, 6])]

# Split the array in 4 parts:


arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 4)

print(newarr)

[array([1, 2]), array([3, 4]), array([5]), array([6])]

Note: We also have the method split() available but it will not adjust the elements when
elements are less in source array for splitting like in example above, array_split() worked
properly but split() would fail.
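The difference described in the note can be seen by running both functions on the same six-element array. In this sketch, np.split() raises a ValueError for 4 parts because 6 cannot be divided equally by 4, while it works for 3 parts.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# array_split() accepts an uneven division and makes the later parts shorter
parts = np.array_split(arr, 4)
print(parts)

# split() insists on equal parts
try:
    np.split(arr, 4)
    split_failed = False
except ValueError as err:
    split_failed = True
    print("np.split failed:", err)

# With an exact division both functions give the same result
equal_parts = np.split(arr, 3)
print(equal_parts)
```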

Split Into Arrays


The return value of the array_split() function is a list containing each of the splits as an array.

If you split an array into 3 arrays, you can access them from the result just like any list element:

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)
print(newarr)
print(newarr[0])
print(newarr[1])
print(newarr[2])

[array([1, 2]), array([3, 4]), array([5, 6])]


[1 2]
[3 4]
[5 6]

# Splitting 2-D Arrays into three 2-D arrays


arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])

newarr = np.array_split(arr, 3)

print(newarr)

[array([[1, 2],
[3, 4]]), array([[5, 6],
[7, 8]]), array([[ 9, 10],
[11, 12]])]

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13,
14, 15], [16, 17, 18]])

newarr = np.array_split(arr, 3)

print(newarr)

[array([[1, 2, 3],
[4, 5, 6]]), array([[ 7, 8, 9],
[10, 11, 12]]), array([[13, 14, 15],
[16, 17, 18]])]

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13,
14, 15], [16, 17, 18]])

newarr = np.array_split(arr, 3, axis=1)

print(newarr)

[array([[ 1],
[ 4],
[ 7],
[10],
[13],
[16]]), array([[ 2],
[ 5],
[ 8],
[11],
[14],
[17]]), array([[ 3],
[ 6],
[ 9],
[12],
[15],
[18]])]

NumPy Searching Arrays

Searching Arrays

You can search an array for a certain value, and return the indexes that get a match.

To search an array, use the where() function.

arr = np.array([1, 2, 3, 4, 5, 4, 4])

x = np.where(arr == 4)

print(x)
# Which means that the value 4 is present at index 3, 5, and 6.

(array([3, 5, 6], dtype=int64),)

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

x = np.where(arr%2 == 0)

print(x)

(array([1, 3, 5, 7], dtype=int64),)

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

x = np.where(arr%2 == 1)

print(x)

(array([0, 2, 4, 6], dtype=int64),)

NumPy Sorting Arrays

Sorting Arrays

Sorting means putting elements in an ordered sequence.


An ordered sequence is any sequence that has an order corresponding to its elements, like numeric or
alphabetical, ascending or descending.

NumPy has a function called sort() that returns a sorted copy of a specified array; the original array is left unchanged.

arr = np.array([3, 2, 0, 1])

print(np.sort(arr))

[0 1 2 3]

arr = np.array(['banana', 'cherry', 'apple'])

print(np.sort(arr))

['apple' 'banana' 'cherry']

import numpy as np
arr = np.array([2,1,'banana', 'cherry', 'apple',True,1.2,0.9])

print(np.sort(arr))

['0.9' '1' '1.2' '2' 'True' 'apple' 'banana' 'cherry']

arr = np.array([True, False, True])

print(np.sort(arr))

[False True True]

arr = np.array([[3, 2, 4], [5, 0, 1]])

print(np.sort(arr))

[[2 3 4]
[0 1 5]]
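A few variations on sorting are worth knowing. The sketch below shows that np.sort() leaves its input alone, that the ndarray method sort() changes the array in place, that reversing a sorted result gives descending order, and that the axis argument controls the direction of sorting in a 2-D array.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

sorted_copy = np.sort(arr)        # new array
unchanged = arr.tolist()          # arr itself is still [3, 2, 0, 1]
descending = np.sort(arr)[::-1]   # reverse the ascending result

arr2d = np.array([[3, 2, 4], [5, 0, 1]])
by_column = np.sort(arr2d, axis=0)   # sort down each column instead of along each row

arr.sort()                        # in place: arr is now sorted

print(sorted_copy, unchanged, descending)
print(by_column)
print(arr)
```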

NumPy Filter Array

Filtering Arrays

Getting some elements out of an existing array and creating a new array out of them is called
filtering.

In NumPy, you filter an array using a boolean index list.

If the value at an index is True, that element is included in the filtered array; if the value at that
index is False, that element is excluded from the filtered array.

arr = np.array([41, 42, 43, 44])

x = [True, False, True, False]


newarr = arr[x]

print(newarr)

[41 43]

# Create a filter array that will return only values higher than 42:
import numpy as np

arr = np.array([41, 42, 43, 44])

# Create an empty list
filter_arr = []

# go through each element in arr
for element in arr:
    # if the element is higher than 42, set the value to True, otherwise False:
    if element > 42:
        filter_arr.append(True)
    else:
        filter_arr.append(False)

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False, False, True, True]


[43 44]

# Create a filter array that will return only even elements from the original array:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# Create an empty list
filter_arr = []

# go through each element in arr
for element in arr:
    # if the element is completely divisible by 2, set the value to True, otherwise False
    if element % 2 == 0:
        filter_arr.append(True)
    else:
        filter_arr.append(False)

newarr = arr[filter_arr]
print(filter_arr)
print(newarr)

[False, True, False, True, False, True, False]


[2 4 6]

Creating Filter Directly From Array

# Create a filter array that will return only values higher than 42:
import numpy as np

arr = np.array([41, 42, 43, 44])

filter_arr = arr > 42

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False False True True]


[43 44]

# Create a filter array that will return only even elements from the original array:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

filter_arr = arr % 2 == 0

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False True False True False True False]


[2 4 6]
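Conditions can also be combined into one filter. With NumPy arrays this is done with the element-wise operators & (and), | (or) and ~ (not), and each condition needs its own parentheses. A minimal sketch with arbitrary sample values:

```python
import numpy as np

arr = np.array([41, 42, 43, 44, 45, 46])

# Higher than 42 AND even
both = arr[(arr > 42) & (arr % 2 == 0)]

# Lower than 42 OR higher than 45
either = arr[(arr < 42) | (arr > 45)]

# NOT even
odd = arr[~(arr % 2 == 0)]

print(both)     # [44 46]
print(either)   # [41 46]
print(odd)      # [41 43 45]
```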

Random Numbers in NumPy

NumPy offers the random module to work with random numbers.

# Generate a random integer from 0 to 99 (the upper bound 100 is excluded):


import numpy as np
from numpy import random

x = random.randint(100)
# y = np.randint(1)
print(x)
# print(y)

98

# Generate a random float from 0 to 1:


from numpy import random
# import numpy as np

x = random.rand()
# x=np.random.rand()
print(x)

0.9675639543128254

# Generate 5 random floats from 0 to 1 and round them to three decimal places
from numpy import random
import numpy as np

x = random.rand(5)
y = np.round(x,3)
print(x)
print(y)

[0.88399159 0.62656546 0.48782214 0.75112813 0.59344029]


[0.884 0.627 0.488 0.751 0.593]

# Generate a 1-D array containing 5 random integers from 0 to 99:

from numpy import random

x=random.randint(100, size=(5))
y=random.randint(100, size=5)
z=random.randint(100)

print(x)
print(y)
print(z)

[41 74 9 2 5]
[37 48 37 76 43]
29
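Two details of randint() are easy to miss: the upper bound is never returned, and a lower bound can be given as well. The sketch below checks the bounds on a large sample and also uses random.seed() so that the same numbers come out on every run; the seed value 0 and the sample sizes are arbitrary choices.

```python
from numpy import random

random.seed(0)                                # make the sequence repeatable
samples = random.randint(100, size=10_000)    # values from 0 up to 99
print(samples.min(), samples.max())

dice = random.randint(1, 7, size=5)           # low=1, high=7: values 1 to 6
print(dice)

# Re-seeding reproduces exactly the same numbers
random.seed(0)
first = random.randint(100, size=3)
random.seed(0)
second = random.randint(100, size=3)
print(first, second)
```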

# Generate a 2-D array with 3 rows, each row containing 5 random integers from 0 to 99:

from numpy import random

x = random.randint(100, size=(3, 5))

print(x)

[[ 4 47 17 31 79]
[81 58 15 18 2]
[59 29 7 54 74]]

# Generate a 1-D array containing 5 random floats:

from numpy import random

x = random.rand(5)

print(x)

[0.13106716 0.20924771 0.53795267 0.20599453 0.05961915]

# Generate a 2-D array with 3 rows, each row containing 5 random numbers:

from numpy import random

x = random.rand(3, 5)

print(x)

[[0.86105522 0.02246511 0.2513794 0.51794864 0.80217106]


[0.77495296 0.35925065 0.70696627 0.28581528 0.86089321]
[0.88521754 0.02632593 0.99769161 0.25976094 0.97015082]]

The choice() method allows you to generate a random value based on an array of values.

The choice() method takes an array as a parameter and randomly returns one of the values.

# Return one of the values in an array:

from numpy import random

x = random.choice([3, 5, 7, 9])

print(x)

# The choice() method also allows you to return an array of values.

# Add a size parameter to specify the shape of the array.

# Generate a 2-D array that consists of the values in the array parameter (3, 0.1, 7, and 9):

from numpy import random

x = random.choice([3, 0.1, 7, 9], size=(3, 5))


print(x)

[[0.1 3. 3. 3. 0.1]
[3. 0.1 3. 0.1 0.1]
[7. 7. 3. 9. 3. ]]

x.astype(int)

array([[0, 3, 3, 3, 0],
[3, 0, 3, 0, 0],
[7, 7, 3, 9, 3]])

# Create a NumPy array of even integers from 20 to 40 using the functions (i) array (ii) linspace (iii) arange

ar1 = np.array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40])
print(ar1)

[20 22 24 26 28 30 32 34 36 38 40]

linspace lets you specify how many evenly spaced values to generate; both start and stop are included.

Syntax: np.linspace(start, stop, num, …)

Example: np.linspace(0, 20, 11)

Output: array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.])

arange lets you specify the size of each step; the stop value itself is excluded.

Syntax: np.arange(start, stop, step, …)

Example: np.arange(0, 20, 2)

Output: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
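The difference between the two functions can be checked side by side. The following is a minimal sketch (the variable names evens_linspace and evens_arange are chosen here for illustration only) that builds the same even numbers both ways and compares the results:

```python
import numpy as np

# linspace: request 11 values; both endpoints (20 and 40) are included.
evens_linspace = np.linspace(20, 40, 11)

# arange: request a step of 2; the stop value is excluded, so stop is 41.
evens_arange = np.arange(20, 41, 2)

print(evens_linspace.dtype)  # a floating-point type by default
print(evens_arange.dtype)    # an integer type, since all arguments are integers

# The values agree even though the dtypes differ.
print(np.array_equal(evens_linspace, evens_arange))
```

If integer output is needed from linspace, a dtype argument or a later astype(int) call can be used.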

np.linspace(0,20)

array([ 0.        ,  0.40816327,  0.81632653,  1.2244898 ,  1.63265306,
        2.04081633,  2.44897959,  2.85714286,  3.26530612,  3.67346939,
        4.08163265,  4.48979592,  4.89795918,  5.30612245,  5.71428571,
        6.12244898,  6.53061224,  6.93877551,  7.34693878,  7.75510204,
        8.16326531,  8.57142857,  8.97959184,  9.3877551 ,  9.79591837,
       10.20408163, 10.6122449 , 11.02040816, 11.42857143, 11.83673469,
       12.24489796, 12.65306122, 13.06122449, 13.46938776, 13.87755102,
       14.28571429, 14.69387755, 15.10204082, 15.51020408, 15.91836735,
       16.32653061, 16.73469388, 17.14285714, 17.55102041, 17.95918367,
       18.36734694, 18.7755102 , 19.18367347, 19.59183673, 20.        ])

np.linspace(0,20,4)

array([ 0. , 6.66666667, 13.33333333, 20. ])

np.arange(0, 20)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

np.arange(0, 20, 4)

array([ 0, 4, 8, 12, 16])

ar1 = np.linspace(20,40,11)
print(ar1)

[20. 22. 24. 26. 28. 30. 32. 34. 36. 38. 40.]

ar1 = np.arange(20,41,2)
print(ar1)

[20 22 24 26 28 30 32 34 36 38 40]

# Set the second element(index = 1) of ar1 to 0 and print the array.

ar1[1] = 0
print(ar1)

[20 0 24 26 28 30 32 34 36 38 40]

# Print the data type of the elements of ar1 using the ‘dtype’ attribute.
# All elements of a NumPy array share one dtype, so a single call is enough.

print(ar1.dtype)

int32

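# Use the ‘any’ and ‘all’ functions to check whether ar1 contains zeros.
# np.any(ar1) is True when at least one element is non-zero;
# np.all(ar1) is False when at least one element is zero.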

np.any(ar1)

True

np.all(ar1)

False
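Since np.any(ar1) only reports whether some element is non-zero, a more direct test for zeros compares the array with 0 first. A small sketch, rebuilding ar1 as in the cells above (the names has_zero and zero_positions are illustrative):

```python
import numpy as np

ar1 = np.arange(20, 41, 2)
ar1[1] = 0  # same modification as in the earlier cell

# ar1 == 0 gives a boolean array; np.any is True if any entry is True.
has_zero = np.any(ar1 == 0)

# np.where returns the indices at which the condition holds.
zero_positions = np.where(ar1 == 0)[0]

print(has_zero)
print(zero_positions)
```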

# Print the length and dimension of ar1 using built-in functions.

len(ar1)

11

np.ndim(ar1)

1

# Statistics using numpy:

# Print the largest and smallest value from ar1 using built-in
functions.

ar1.max()

40

ar1.min()

0

# Print mean and median of ar1 using ‘mean’ and ‘median’ functions
respectively.

np.mean(ar1)

28.0

np.median(ar1)

30.0

# Print standard deviation and variance of ar1 using built-in


functions.

np.std(ar1)

10.583005244258363
np.var(ar1)

112.0

# Print the sum and product of all elements of ar1 using the built-in
functions.

np.sum(ar1)

308

np.prod(ar1)  # np.product is a deprecated alias that was removed in NumPy 2.0

0

# Other operations:

# Sort ar1 in ascending order using the ‘sort’ function.

np.sort(ar1)

array([ 0, 20, 24, 26, 28, 30, 32, 34, 36, 38, 40])

# import numpy
import numpy as np

a = [1, 2, 2, 4, 3, 6, 4, 8]
# using np.unique() method
b = np.unique(a)

arr = np.array([1, 2, 2, 4, 3, 6, 4, 8], ndmin=3)


arr1 = np.unique(arr)
print(b)
print(arr1)

[1 2 3 4 6 8]
[1 2 3 4 6 8]

import numpy as np
arr = np.array([1, 2, 2, 3, 4], ndmin=3)
arr

array([[[1, 2, 2, 3, 4]]])

a = [[10.2, 21.4, 3.6, 14.8], [1.0, 5.0, 10.0, 15.0]]

# using np.unique() method


b = np.unique(a)

print(b)

[ 1. 3.6 5. 10. 10.2 14.8 15. 21.4]


# Reverse and print the ar1 using the ‘flip’ function.

np.flip(ar1)

array([40, 38, 36, 34, 32, 30, 28, 26, 24, 0, 20])

# Count the number of non-zero elements in ar1 using the ‘count_nonzero’ function.

np.count_nonzero(ar1)

10

# Vectorization

# Create and store a vector F = [32, 38, 40, 28, 56, 65, 70] which
contains temperatures(Fahrenheit). Print F.

F = np.array([32, 38, 40, 28, 56, 65, 70])


print(F)

[32 38 40 28 56 65 70]

# Use the formula C/5 = (F − 32)/9 to create an array C of temperatures (Centigrade) corresponding to F. Print C.
# Note: // is floor division, so each result is rounded down to a whole degree.

C = (5 * (F - 32)) // 9
print(C)

[ 0 3 4 -3 13 18 21]
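Because floor division discards the fractional part, the values above are only approximate. The sketch below applies the same formula with true division and then rounds to one decimal place (the names C_exact and C_rounded are chosen for this example):

```python
import numpy as np

F = np.array([32, 38, 40, 28, 56, 65, 70])

# True division (/) keeps the fractional part of each temperature.
C_exact = 5 * (F - 32) / 9

# Round to one decimal place for display.
C_rounded = np.round(C_exact, 1)

print(C_rounded)
```

The whole array is converted in one expression, without an explicit loop; this is what vectorization means here.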

# Square every element of F and print the array.

np.square(F)

array([1024, 1444, 1600,  784, 3136, 4225, 4900])

# Find Sin, Cos and Tan of each element in F. Print the results.

# Convert each of the values to radians using built-in function


c1 = np.sin(np.radians(F))
c2 = np.cos(np.radians(F))
c3 = np.tan(np.radians(F))

print(c1, "\n")
print(c2, "\n")
print(c3, "\n")

[0.52991926 0.61566148 0.64278761 0.46947156 0.82903757 0.90630779
 0.93969262]

[0.8480481  0.78801075 0.76604444 0.88294759 0.5591929  0.42261826
 0.34202014]

[0.62486935 0.78128563 0.83909963 0.53170943 1.48256097 2.14450692
 2.74747742]

# Operations on a vector using relational and logical operators:

# Print elements of array F which are greater than 50.

F>50

array([False, False, False, False, True, True, True])

F[F>50]

array([56, 65, 70])

# Print elements of array F which are divisible by both 8 and 4.

F[np.logical_and(F % 8 == 0, F % 4 == 0)]

array([32, 40, 56])

# Print elements of array F which are divisible by 8 or 4.

F[np.logical_or(F % 8 == 0, F % 4 == 0)]

array([32, 40, 28, 56])

# Print elements of array F which are not divisible by 4.

F[F % 4 != 0]

array([38, 65, 70])
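The same selections can also be written with the element-wise operators &, | and ~, which behave like np.logical_and, np.logical_or and np.logical_not on boolean arrays. A brief sketch (the mask names are illustrative):

```python
import numpy as np

F = np.array([32, 38, 40, 28, 56, 65, 70])

# Boolean masks: one True/False entry per element of F.
div_by_8 = F % 8 == 0
div_by_4 = F % 4 == 0

both = F[div_by_8 & div_by_4]    # divisible by 8 and by 4
either = F[div_by_8 | div_by_4]  # divisible by 8 or by 4
not_4 = F[~div_by_4]             # not divisible by 4

print(both, either, not_4)
```

When the conditions are written inline, each comparison needs its own parentheses, for example F[(F % 8 == 0) & (F % 4 == 0)], because & binds more tightly than ==.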

# Create the matrices

# Using matrix function

a1 = np.matrix([[5, 0, 4], [2, 3, 2], [1, 2, 1]])


print(a1)

[[5 0 4]
[2 3 2]
[1 2 1]]

b1 = np.matrix([[0, 1, 2], [1, 2, 3], [3, 1, 1]])


print(b1)

[[0 1 2]
[1 2 3]
[3 1 1]]
c1 = np.matrix([[1, 2], [3, 4], [5, 6]])
print(c1)

[[1 2]
[3 4]
[5 6]]

# Using array function

a = np.array([[5, 0, 4], [2, 3, 2], [1, 2, 1]])


print(a)

[[5 0 4]
[2 3 2]
[1 2 1]]

b = np.array([[0, 1, 2], [1, 2, 3], [3, 1, 1]])


print(b)

[[0 1 2]
[1 2 3]
[3 1 1]]

c = np.array([[1, 2], [3, 4], [5, 6]])


print(c)

[[1 2]
[3 4]
[5 6]]

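# Print A * B and A1 * B1.
# For arrays (a, b), * multiplies element by element; for np.matrix objects (a1, b1), * is the matrix product.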

print(a * b)

[[0 0 8]
[2 6 6]
[3 2 1]]

print(a1 * b1)

[[12 9 14]
[ 9 10 15]
[ 5 6 9]]

# Print the dimension of A, B and C using the ‘ndim’ attribute.

np.ndim(a)

2

np.ndim(b)

2

np.ndim(c)

2

# Print the number of rows and columns in A, B and C using the ‘shape’
attribute.

np.shape(a)

(3, 3)

type(np.shape(a))

tuple

print("Number of rows in a:", np.shape(a)[0])


print("Number of columns in a:", np.shape(a)[1])

Number of rows in a: 3
Number of columns in a: 3

print("Number of rows in b:", np.shape(b)[0])


print("Number of columns in b:", np.shape(b)[1])

Number of rows in b: 3
Number of columns in b: 3

print("Number of rows in c:", np.shape(c)[0])


print("Number of columns in c:", np.shape(c)[1])

Number of rows in c: 3
Number of columns in c: 2

# Print the transpose of A, B and C using the ‘transpose’ function.

np.transpose(a)

array([[5, 2, 1],
[0, 3, 2],
[4, 2, 1]])

np.transpose(b)

array([[0, 1, 3],
[1, 2, 1],
[2, 3, 1]])

np.transpose(c)

array([[1, 3, 5],
[2, 4, 6]])

# Print the diagonal of A and B using the ‘diagonal’ function.


np.diagonal(a)

array([5, 3, 1])

np.diagonal(b)

array([0, 2, 1])

# Print A + B and A − B.

print(a + b)

[[5 1 6]
[3 5 5]
[4 3 2]]

print(a - b)

[[ 5 -1 2]
[ 1 1 -1]
[-2 1 0]]

# Create an identity matrix of order 3 X 3 and verify that AI = A. Use


‘eye’ function.

i = np.eye(3, dtype=int)
print(i)

[[1 0 0]
[0 1 0]
[0 0 1]]

np.dot(a, i)

array([[5, 0, 4],
[2, 3, 2],
[1, 2, 1]])

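# Try the products AB, BA, AC and BC with the * operator.
# On arrays, * multiplies element by element, so a * b equals b * a and is not
# the matrix product; a * c and b * c fail because the shapes (3,3) and (3,2) differ.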

a * b

array([[0, 0, 8],
[2, 6, 6],
[3, 2, 1]])

b * a

array([[0, 0, 8],
[2, 6, 6],
[3, 2, 1]])

a * c

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [311], in <cell line: 1>()
----> 1 a * c

ValueError: operands could not be broadcast together with shapes (3,3) (3,2)

b * c

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [312], in <cell line: 1>()
----> 1 b * c

ValueError: operands could not be broadcast together with shapes (3,3) (3,2)

# Verify that matrix products are computed using ‘np.dot’ function.

np.dot(a, b)

array([[12, 9, 14],
[ 9, 10, 15],
[ 5, 6, 9]])

np.dot(b, a)

array([[ 4, 7, 4],
[12, 12, 11],
[18, 5, 15]])

np.dot(a, c)

array([[25, 34],
[21, 28],
[12, 16]])

np.dot(b, c)

array([[13, 16],
[22, 28],
[11, 16]])
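For two-dimensional arrays, the @ operator gives the same matrix product as np.dot. The sketch below re-creates a, b and c and contrasts @ with the element-wise * operator (ab, ac and elementwise are names introduced for this example):

```python
import numpy as np

a = np.array([[5, 0, 4], [2, 3, 2], [1, 2, 1]])
b = np.array([[0, 1, 2], [1, 2, 3], [3, 1, 1]])
c = np.array([[1, 2], [3, 4], [5, 6]])

ab = a @ b         # matrix product, same result as np.dot(a, b)
ac = a @ c         # (3,3) @ (3,2) gives a (3,2) matrix
elementwise = a * b  # element-by-element product, a different operation

print(np.array_equal(ab, np.dot(a, b)))
print(ac.shape)
print(np.array_equal(ab, elementwise))
```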

# Print the least and greatest elements of A, B and C using the ‘min’ and ‘max’ functions respectively.

print("The minimum value in a is:", np.min(a))
print("The maximum value in a is:", np.max(a))

The minimum value in a is: 0
The maximum value in a is: 5

print("The minimum value in b is:", np.min(b))


print("The maximum value in b is:", np.max(b))

The minimum value in b is: 0


The maximum value in b is: 3

print("The minimum value in c is:", np.min(c))


print("The maximum value in c is:", np.max(c))

The minimum value in c is: 1


The maximum value in c is: 6

# Print the sum of all elements in matrices A, B and C using the ‘sum’
function.

np.sum(a)

20

np.sum(b)

14

np.sum(c)

21

# Print the traces of matrices A, B and C using the ‘trace’ function.

np.trace(a)

9

np.trace(b)

3

np.trace(c)

5

# Print the flattened one-dimensional array for A, B and C using the


‘flatten’ function.

a.flatten()

array([5, 0, 4, 2, 3, 2, 1, 2, 1])
b.flatten()

array([0, 1, 2, 1, 2, 3, 3, 1, 1])

c.flatten()

array([1, 2, 3, 4, 5, 6])

# Print the sum of rows and sum of columns for all matrices A, B and C
using the ‘sum’ function.
# [Hint: Use axis = 0 and axis = 1]

print("The sum of rows of a:", np.sum(a, axis=1))


print("The sum of columns of a:", np.sum(a, axis=0))

The sum of rows of a: [9 7 4]
The sum of columns of a: [8 5 7]

print("The sum of rows of b:", np.sum(b, axis=1))


print("The sum of columns of b:", np.sum(b, axis=0))

The sum of rows of b: [3 6 5]
The sum of columns of b: [4 4 6]

print("The sum of rows of c:", np.sum(c, axis=1))
print("The sum of columns of c:", np.sum(c, axis=0))

The sum of rows of c: [ 3  7 11]
The sum of columns of c: [ 9 12]

Pandas
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was
created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze large data sets and draw conclusions based on statistical theory.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

Data science is a branch of computer science in which we study how to store, use and analyze
data in order to derive information from it.

What Can Pandas Do?


Pandas gives you answers about the data, such as:

Is there a correlation between two or more columns?

What is average value?

Max value?

Min value?

Pandas is also able to delete rows that are not relevant or that contain wrong values, such as empty or
NULL values. This is called cleaning the data.
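The questions listed above map onto a few DataFrame and Series methods. The following is a minimal sketch on a small, made-up data set (the column names and values are hypothetical and are not taken from data.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; one Calories value is missing on purpose.
df = pd.DataFrame({
    "Duration": [60, 60, 45, 30],
    "Calories": [409.1, 479.0, np.nan, 195.1],
})

# Correlation between two columns (rows with a missing value are ignored).
print(df["Duration"].corr(df["Calories"]))

# Average, maximum and minimum of a column (NaN is skipped by default).
print(df["Calories"].mean())
print(df["Calories"].max())
print(df["Calories"].min())

# Cleaning: drop every row that contains an empty (NaN) value.
cleaned = df.dropna()
print(len(cleaned))
```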

Installation of Pandas

pip install pandas

Requirement already satisfied: pandas in c:\users\gargs\anaconda3\lib\site-packages (1.4.2)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\gargs\anaconda3\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\users\gargs\anaconda3\lib\site-packages (from pandas) (2021.3)
Requirement already satisfied: numpy>=1.18.5 in c:\users\gargs\anaconda3\lib\site-packages (from pandas) (1.22.4)
Requirement already satisfied: six>=1.5 in c:\users\gargs\anaconda3\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.

import pandas as pd

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
# mydataset
myvar = pd.DataFrame(mydataset)

print(myvar)

cars passings
0 BMW 3
1 Volvo 7
2 Ford 2

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2],
'price':["x","10",20]
}
# mydataset
myvar = pd.DataFrame(mydataset)
print(myvar)

cars passings price


0 BMW 3 x
1 Volvo 7 10
2 Ford 2 20

Checking Pandas Version

print(pd.__version__)

1.4.2

What is a Pandas Series?

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

0 1
1 7
2 2
dtype: int64

a = [1, 7, 2]

myvar = pd.Series(a)

myvar1=pd.DataFrame(myvar)
print(myvar)
print(myvar1)

0 1
1 7
2 2
dtype: int64
0
0 1
1 7
2 2

# Return the first value of the Series:


print(myvar[0])
1

print(myvar[-1])

----------------------------------------------------------------------
-----
ValueError Traceback (most recent call
last)
File ~\anaconda3\lib\site-packages\pandas\core\indexes\range.py:385,
in RangeIndex.get_loc(self, key, method, tolerance)
384 try:
--> 385 return self._range.index(new_key)
386 except ValueError as err:

ValueError: -1 is not in range

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call


last)
Input In [9], in <cell line: 1>()
----> 1 print(myvar[-1])

File ~\anaconda3\lib\site-packages\pandas\core\series.py:958, in
Series.__getitem__(self, key)
955 return self._values[key]
957 elif key_is_scalar:
--> 958 return self._get_value(key)
960 if is_hashable(key):
961 # Otherwise index.get_value will raise InvalidIndexError
962 try:
963 # For labels that don't resolve as scalars like tuples
and frozensets

File ~\anaconda3\lib\site-packages\pandas\core\series.py:1069, in
Series._get_value(self, label, takeable)
1066 return self._values[label]
1068 # Similar to Index.get_value, but we do not fall back to
positional
-> 1069 loc = self.index.get_loc(label)
1070 return self.index._get_values_for_loc(self, loc, label)

File ~\anaconda3\lib\site-packages\pandas\core\indexes\range.py:387,
in RangeIndex.get_loc(self, key, method, tolerance)
385 return self._range.index(new_key)
386 except ValueError as err:
--> 387 raise KeyError(key) from err
388 self._check_indexing_error(key)
389 raise KeyError(key)
KeyError: -1

print(myvar[2])

2
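The KeyError for myvar[-1] arises because square brackets on a Series with the default index look up labels, and -1 is not one of the labels 0, 1, 2. For position-based access, including negative positions, the iloc indexer can be used. A short sketch (the variable names are illustrative):

```python
import pandas as pd

myvar = pd.Series([1, 7, 2])

last_by_position = myvar.iloc[-1]  # position-based: the last element
last_by_label = myvar.loc[2]       # label-based: the element labelled 2

# With custom labels, iloc still counts positions and loc uses the labels.
labelled = pd.Series([1, 7, 2], index=["x", "y", "z"])

print(last_by_position, last_by_label)
print(labelled.iloc[-1], labelled.loc["z"])
```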

# Create your own labels:


import pandas as pd
a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])


myvar1 = pd.DataFrame(a, index = ["x", "y", "z"])

print(myvar)
print(myvar1)
print(myvar["y"])

x 1
y 7
z 2
dtype: int64
0
x 1
y 7
z 2
7

# 7 is a value in the Series, not an index label, so this lookup fails:
print(myvar[7])

----------------------------------------------------------------------
-----
ValueError Traceback (most recent call
last)
File ~\anaconda3\lib\site-packages\pandas\core\indexes\range.py:385,
in RangeIndex.get_loc(self, key, method, tolerance)
384 try:
--> 385 return self._range.index(new_key)
386 except ValueError as err:

ValueError: 7 is not in range

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call


last)
Input In [17], in <cell line: 1>()
----> 1 print(myvar[7])

File ~\anaconda3\lib\site-packages\pandas\core\series.py:958, in
Series.__getitem__(self, key)
955 return self._values[key]
957 elif key_is_scalar:
--> 958 return self._get_value(key)
960 if is_hashable(key):
961 # Otherwise index.get_value will raise InvalidIndexError
962 try:
963 # For labels that don't resolve as scalars like tuples
and frozensets

File ~\anaconda3\lib\site-packages\pandas\core\series.py:1069, in
Series._get_value(self, label, takeable)
1066 return self._values[label]
1068 # Similar to Index.get_value, but we do not fall back to
positional
-> 1069 loc = self.index.get_loc(label)
1070 return self.index._get_values_for_loc(self, loc, label)

File ~\anaconda3\lib\site-packages\pandas\core\indexes\range.py:387,
in RangeIndex.get_loc(self, key, method, tolerance)
385 return self._range.index(new_key)
386 except ValueError as err:
--> 387 raise KeyError(key) from err
388 self._check_indexing_error(key)
389 raise KeyError(key)

KeyError: 7

# Create a Series and a DataFrame from a nested list:


import pandas as pd
import numpy as np
b = [[1, 7, 2],[1, 3, 9]]
# b=np.array([[1,2,3],[4,5,6]])
print(b)
myvar = pd.Series(b) #, index = ["x", "y", "z"])
myvar1 = pd.DataFrame(b) #, index = ["x", "y", "z"])

print(myvar)
print(myvar1)

[[1, 7, 2], [1, 3, 9]]


0 [1, 7, 2]
1 [1, 3, 9]
dtype: object
0 1 2
0 1 7 2
1 1 3 9

# Create a simple Pandas Series from a dictionary:


calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)

print(myvar)

day1 420
day2 380
day3 390
dtype: int64

# Create a Pandas Series from a dictionary whose values are lists (each list is stored as one object):


calories = {"day1": [420,220], "day2": [380,300], "day3": [390,3000]}

myvar = pd.Series(calories)

print(myvar)

day1 [420, 220]


day2 [380, 300]
day3 [390, 3000]
dtype: object

# Create a Series using only data from "day1" and "day2":


calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = ["day1", "day2"])

print(myvar)

day1 420
day2 380
dtype: int64

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = [1,2])

print(myvar)

1 NaN
2 NaN
dtype: float64
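The NaN values appear because, when a dictionary is passed together with an index argument, Pandas selects the dictionary entries whose keys match the index labels. The labels 1 and 2 match no key, so nothing is found. A minimal sketch of this behaviour (the names missing, partial and by_position are illustrative):

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# No key matches the labels 1 and 2, so every value is NaN.
missing = pd.Series(calories, index=[1, 2])

# A mix of matching and non-matching labels gives a mix of values and NaN.
partial = pd.Series(calories, index=["day1", "day4"])

# To attach new labels to the values, pass the values as a list instead.
by_position = pd.Series(list(calories.values()), index=[1, 2, 3])

print(missing)
print(partial)
print(by_position)
```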

What is a DataFrame?

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with
rows and columns.

# Create a simple Pandas DataFrame from a dictionary of two lists (each list becomes a column):


data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
# data = {"day1": 420, "day2": 380, "day3": 390}

#load data into a DataFrame object:


df = pd.DataFrame(data)

print(df)

calories duration
0 420 50
1 380 40
2 390 45

Locate Row

As you can see from the result above, the DataFrame is like a table with rows and columns.

Pandas uses the loc attribute to return one or more specified rows, selected by label.

# Return row 0:
print(df.loc[0])

calories 420
duration 50
Name: 0, dtype: int64

print(df.loc[-1])

----------------------------------------------------------------------
-----
ValueError Traceback (most recent call
last)
File ~\anaconda3\lib\site-packages\pandas\core\indexes\range.py:385,
in RangeIndex.get_loc(self, key, method, tolerance)
384 try:
--> 385 return self._range.index(new_key)
386 except ValueError as err:

ValueError: -1 is not in range

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call


last)
Input In [25], in <cell line: 1>()
----> 1 print(df.loc[-1])

File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:967, in
_LocationIndexer.__getitem__(self, key)
964 axis = self.axis or 0
966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)

File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:1202, in
_LocIndexer._getitem_axis(self, key, axis)
1200 # fall thru to straight lookup
1201 self._validate_key(key, axis)
-> 1202 return self._get_label(key, axis=axis)

File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:1153, in
_LocIndexer._get_label(self, label, axis)
1151 def _get_label(self, label, axis: int):
1152 # GH#5667 this will fail if the label is not present in
the axis.
-> 1153 return self.obj.xs(label, axis=axis)

File ~\anaconda3\lib\site-packages\pandas\core\generic.py:3864, in
NDFrame.xs(self, key, axis, level, drop_level)
3862 new_index = index[loc]
3863 else:
-> 3864 loc = index.get_loc(key)
3866 if isinstance(loc, np.ndarray):
3867 if loc.dtype == np.bool_:

File ~\anaconda3\lib\site-packages\pandas\core\indexes\range.py:387,
in RangeIndex.get_loc(self, key, method, tolerance)
385 return self._range.index(new_key)
386 except ValueError as err:
--> 387 raise KeyError(key) from err
388 self._check_indexing_error(key)
389 raise KeyError(key)

KeyError: -1

print(df.loc[2])

calories 390
duration 45
Name: 2, dtype: int64

df

calories duration
0 420 50
1 380 40
2 390 45

# Return row 0 and 1:


print(df.loc[[0, 1]])
calories duration
0 420 50
1 380 40

Note: When you pass a list of labels (double brackets), the result is a Pandas DataFrame; a single label returns a Series.

Named Indexes

With the index argument, you can name your own indexes.

# Add a list of names to give each row a name:


data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)
df

calories duration
day1 420 50
day2 380 40
day3 390 45

calories duration
day1 420 50
day2 380 40
day3 390 45

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

# Return "day2":
print(df.loc["day2"])

calories 380
duration 40
Name: day2, dtype: int64

# "calories" is a column name, not a row label, so loc raises a KeyError:
print(df.loc["calories"])

----------------------------------------------------------------------
-----
KeyError Traceback (most recent call
last)
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3621,
in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:

File ~\anaconda3\lib\site-packages\pandas\_libs\index.pyx:136, in
pandas._libs.index.IndexEngine.get_loc()

File ~\anaconda3\lib\site-packages\pandas\_libs\index.pyx:163, in
pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:5198, in
pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:5206, in
pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'calories'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call


last)
Input In [31], in <cell line: 2>()
1 # Return "calories":
----> 2 print(df.loc["calories"])

File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:967, in
_LocationIndexer.__getitem__(self, key)
964 axis = self.axis or 0
966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)

File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:1202, in
_LocIndexer._getitem_axis(self, key, axis)
1200 # fall thru to straight lookup
1201 self._validate_key(key, axis)
-> 1202 return self._get_label(key, axis=axis)

File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:1153, in
_LocIndexer._get_label(self, label, axis)
1151 def _get_label(self, label, axis: int):
1152 # GH#5667 this will fail if the label is not present in
the axis.
-> 1153 return self.obj.xs(label, axis=axis)

File ~\anaconda3\lib\site-packages\pandas\core\generic.py:3864, in
NDFrame.xs(self, key, axis, level, drop_level)
3862 new_index = index[loc]
3863 else:
-> 3864 loc = index.get_loc(key)
3866 if isinstance(loc, np.ndarray):
3867 if loc.dtype == np.bool_:

File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3623,
in Index.get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will
raise
3626 # InvalidIndexError. Otherwise we fall through and re-
raise
3627 # the TypeError.
3628 self._check_indexing_error(key)

KeyError: 'calories'
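The lookup fails because loc with a single label searches the row index, and "calories" is a column. Columns can be selected with square brackets, or with loc by giving the column as the second argument. A small sketch using the same data (col_a, col_b and cell are names introduced for this example):

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

col_a = df["calories"]             # select a column by name
col_b = df.loc[:, "calories"]      # all rows, one column
cell = df.loc["day2", "calories"]  # one row label and one column label

print(col_a)
print(cell)
```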

# Import os to set up the working directory. Here os means operating system.
import os

# Set the working directory
# os.chdir("C:/Users/gargs/Downloads/myPythonwork")
os.chdir("E:\\Study-work\\Galgotias_University\\2023\\Sep-Dec\\Python\\BSc_MSc_2023\\B Sc")

Load Files Into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame.

Read CSV Files

A simple way to store big data sets is to use CSV files (comma-separated values files).

CSV files contain plain text in a well-known format that can be read by almost any program, including
Pandas.

In our examples we will be using a CSV file called 'data.csv'.

# Load a comma separated file (CSV file) into a DataFrame:


import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

Duration Pulse Maxpulse Calories


0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
5 60 102 127 300.5
6 60 110 136 374.0
7 45 104 134 253.3
8 30 109 133 195.1
9 60 98 124 269.0
10 60 103 147 329.3
11 60 100 120 250.7
12 60 106 128 345.3
13 60 104 132 379.3
14 60 98 123 275.0
15 60 98 120 215.2
16 60 100 120 300.0
17 45 90 112 NaN
18 60 103 123 323.0
19 45 97 125 243.0
20 60 108 131 364.2
21 45 100 119 282.0
22 60 130 101 300.0
23 45 105 132 246.0
24 60 102 126 334.5
25 60 100 120 250.0
26 60 92 118 241.0
27 60 103 132 NaN
28 60 100 132 280.0
29 60 102 129 380.3
30 60 92 115 243.0
31 45 90 112 180.1
32 60 101 124 299.0
33 60 93 113 223.0
34 60 107 136 361.0
35 60 114 140 415.0
36 60 102 127 300.5
37 60 100 120 300.1
38 60 100 120 300.0
39 45 104 129 266.0
40 45 90 112 180.1
41 60 98 126 286.0
42 60 100 122 329.4
43 60 111 138 400.0
44 60 111 131 397.0
45 60 99 119 273.0
46 60 109 153 387.6
47 45 111 136 300.0
48 45 108 129 298.0
49 60 111 139 397.6
50 60 107 136 380.2
51 80 123 146 643.1
52 60 106 130 263.0
53 60 118 151 486.0
54 30 136 175 238.0
55 60 121 146 450.7
56 60 118 121 413.0
57 45 115 144 305.0
58 20 153 172 226.4
59 45 123 152 321.0
60 210 108 160 1376.0
61 160 110 137 1034.4
62 160 109 135 853.0
63 45 118 141 341.0
64 20 110 130 131.4
65 180 90 130 800.4
66 150 105 135 873.4
67 150 107 130 816.0
68 20 106 136 110.4
69 300 108 143 1500.2
70 150 97 129 1115.0
71 60 109 153 387.6
72 90 100 127 700.0
73 150 97 127 953.2
74 45 114 146 304.0
75 90 98 125 563.2
76 45 105 134 251.0
77 45 110 141 300.0
78 120 100 130 500.4
79 270 100 131 1729.0
80 30 159 182 319.2
81 45 149 169 344.0
82 30 103 139 151.1
83 120 100 130 500.0
84 45 100 120 225.3
85 30 151 170 300.1
86 45 102 136 234.0
87 120 100 157 1000.1
88 45 129 103 242.0
89 20 83 107 50.3
90 180 101 127 600.1
91 45 107 137 NaN
92 30 90 107 105.3
93 15 80 100 50.5
94 20 150 171 127.4
95 20 151 168 229.4
96 30 95 128 128.2
97 25 152 168 244.2
98 30 109 131 188.2
99 90 93 124 604.1
100 20 95 112 77.7
101 90 90 110 500.0
102 90 90 100 500.0
103 90 90 100 500.4
104 30 92 108 92.7
105 30 93 128 124.0
106 180 90 120 800.3
107 30 90 120 86.2
108 90 90 120 500.3
109 210 137 184 1860.4
110 60 102 124 325.2
111 45 107 124 275.0
112 15 124 139 124.2
113 45 100 120 225.3
114 60 108 131 367.6
115 60 108 151 351.7
116 60 116 141 443.0
117 60 97 122 277.4
118 60 105 125 NaN
119 60 103 124 332.7
120 30 112 137 193.9
121 45 100 120 100.7
122 60 119 169 336.7
123 60 107 127 344.9
124 60 111 151 368.5
125 60 98 122 271.0
126 60 97 124 275.3
127 60 109 127 382.0
128 90 99 125 466.4
129 60 114 151 384.0
130 60 104 134 342.5
131 60 107 138 357.5
132 60 103 133 335.0
133 60 106 132 327.5
134 60 103 136 339.0
135 20 136 156 189.0
136 45 117 143 317.7
137 45 115 137 318.0
138 45 113 138 308.0
139 20 141 162 222.4
140 60 108 135 390.0
141 60 97 127 NaN
142 45 100 120 250.4
143 45 122 149 335.4
144 60 136 170 470.2
145 45 106 126 270.8
146 60 107 136 400.0
147 60 112 146 361.9
148 30 103 127 185.0
149 60 110 150 409.4
150 60 106 134 343.0
151 60 109 129 353.2
152 60 109 138 374.0
153 30 150 167 275.8
154 60 105 128 328.0
155 60 111 151 368.5
156 60 97 131 270.4
157 60 100 120 270.4
158 60 114 150 382.8
159 30 80 120 240.9
160 30 85 120 250.4
161 45 90 130 260.4
162 45 95 130 270.0
163 45 100 140 280.9
164 60 105 140 290.8
165 60 110 145 300.4
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

Note: use to_string() to print the entire DataFrame.

If a DataFrame has more rows than the display limit (pd.options.display.max_rows), Pandas prints only
the first 5 rows and the last 5 rows:

df = pd.read_csv('data.csv')

print(df)

Duration Pulse Maxpulse Calories


0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
.. ... ... ... ...
164 60 105 140 290.8
165 60 110 145 300.4
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

[169 rows x 4 columns]

To find out what parameters a method takes and what data formats it supports, type the
method name followed by a question mark (?) in Jupyter or IPython:

import pandas as pd
pd.read_csv? # Run yourself

To find the type of the DataFrame object:

type(df)

pandas.core.frame.DataFrame

max_rows

The number of rows returned is defined in Pandas option settings.

You can check your system's maximum rows with the pd.options.display.max_rows statement.

# Check the number of maximum returned rows:


print(pd.options.display.max_rows)

60

# To display the entire DataFrame, raise the maximum number of rows to at least 169.
# The setting below is commented out, so the output that follows is still truncated:

# pd.options.display.max_rows = 9999

df = pd.read_csv('data.csv')

print(df)

Duration Pulse Maxpulse Calories


0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
.. ... ... ... ...
164 60 105 140 290.8
165 60 110 145 300.4
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

[169 rows x 4 columns]
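Instead of changing the option globally, the limit can be changed for a single block of code with pd.option_context. The sketch below uses a hypothetical 100-row DataFrame rather than data.csv, so that it runs without the file:

```python
import pandas as pd

# Hypothetical DataFrame with 100 rows and one column.
df = pd.DataFrame({"n": range(100)})

# With a small limit, the text form is truncated to the first and last rows.
with pd.option_context("display.max_rows", 10):
    short_text = repr(df)

# With the limit set to None, every row is included.
with pd.option_context("display.max_rows", None):
    full_text = repr(df)

print(len(short_text.splitlines()))
print(len(full_text.splitlines()))
```

Outside the with blocks, the previous value of display.max_rows is restored automatically.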

# Load a Python Dictionary into a DataFrame:


data = {
"Duration":{
"0":60,
"1":60,
"2":60,
"3":45,
"4":45,
"5":60
},
"Pulse":{
"0":110,
"1":117,
"2":103,
"3":109,
"4":117,
"5":102
},
"Maxpulse":{
"0":130,
"1":145,
"2":135,
"3":175,
"4":148,
"5":127
},
"Calories":{
"0":409,
"1":479,
"2":340,
"3":282,
"4":406,
"5":300
}
}

df = pd.DataFrame(data)

print(df)

Duration Pulse Maxpulse Calories


0 60 110 130 409
1 60 117 145 479
2 60 103 135 340
3 45 109 175 282
4 45 117 148 406
5 60 102 127 300

data = {
"Duration":[60,60,60,45,45,60],
"Pulse":[110,117,103,109,117,102],
"Maxpulse":[130,145,135,175,148,127],
"Calories":[409,479,340,282,406,300]}
# df = pd.DataFrame(data, index = [0,1,2,3,4,5])
df = pd.DataFrame(data)

print(df)

Duration Pulse Maxpulse Calories


0 60 110 130 409
1 60 117 145 479
2 60 103 135 340
3 45 109 175 282
4 45 117 148 406
5 60 102 127 300

Pandas - Analyzing DataFrames

Viewing the Data

One of the most commonly used methods for getting a quick overview of a DataFrame is the head()
method.

The head() method returns the headers and a specified number of rows, starting from the top.

# Get a quick overview by printing the first 10 rows of the DataFrame:


import pandas as pd

df = pd.read_csv('data.csv')

print(df.head(10))

Duration Pulse Maxpulse Calories


0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
5 60 102 127 300.5
6 60 110 136 374.0
7 45 104 134 253.3
8 30 109 133 195.1
9 60 98 124 269.0

# Print the first 5 rows of the DataFrame:


import pandas as pd

df = pd.read_csv('data.csv')

print(df.head())

Duration Pulse Maxpulse Calories


0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0

There is also a tail() method for viewing the last rows of the DataFrame.

The tail() method returns the headers and a specified number of rows, starting from the bottom.
# Print the last 5 rows of the DataFrame:
print(df.tail())

Duration Pulse Maxpulse Calories


164 60 105 140 290.8
165 60 110 145 300.4
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

print(df.tail(10))

Duration Pulse Maxpulse Calories


159 30 80 120 240.9
160 30 85 120 250.4
161 45 90 130 260.4
162 45 95 130 270.0
163 45 100 140 280.9
164 60 105 140 290.8
165 60 110 145 300.4
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

type(df)

pandas.core.frame.DataFrame

df.dtypes

Duration int64
Pulse int64
Maxpulse int64
Calories float64
dtype: object

df.shape

(169, 4)

df.columns

Index(['Duration', 'Pulse', 'Maxpulse', 'Calories'], dtype='object')

list(df.columns)

['Duration', 'Pulse', 'Maxpulse', 'Calories']

df.head(5)

Duration Pulse Maxpulse Calories


0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0

df.head(5).transpose()

0 1 2 3 4
Duration 60.0 60.0 60.0 45.0 45.0
Pulse 110.0 117.0 103.0 109.0 117.0
Maxpulse 130.0 145.0 135.0 175.0 148.0
Calories 409.1 479.0 340.0 282.4 406.0

Slicing

df

Duration Pulse Maxpulse Calories


0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
.. ... ... ... ...
164 60 105 140 290.8
165 60 110 145 300.4
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

[169 rows x 4 columns]

df[0:5]

Duration Pulse Maxpulse Calories


0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0

df["Calories"][0:5]

0 409.1
1 479.0
2 340.0
3 282.4
4 406.0
Name: Calories, dtype: float64

df[["Pulse","Calories"]][0:5]
Pulse Calories
0 110 409.1
1 117 479.0
2 103 340.0
3 109 282.4
4 117 406.0

df

Duration Pulse Maxpulse Calories


0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
.. ... ... ... ...
164 60 105 140 290.8
165 60 110 145 300.4
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

[169 rows x 4 columns]

df.iloc[1:5,1:3]

Pulse Maxpulse
1 117 145
2 103 135
3 109 175
4 117 148

df.iloc[:,1:5]

Pulse Maxpulse Calories


0 110 130 409.1
1 117 145 479.0
2 103 135 340.0
3 109 175 282.4
4 117 148 406.0
.. ... ... ...
164 105 140 290.8
165 110 145 300.4
166 115 145 310.2
167 120 150 320.4
168 125 150 330.4

[169 rows x 3 columns]

df.iloc[1:5]
Duration Pulse Maxpulse Calories
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
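
The slices above all use row positions. As a minimal sketch of how position-based slicing with iloc differs from label-based slicing with loc, the example below builds a small made-up DataFrame (not the workout file) whose index labels 10 to 14 are an assumption chosen so that labels and positions do not coincide:

```python
import pandas as pd

# Small made-up frame; labels 10..14 are chosen so labels and positions differ.
df = pd.DataFrame(
    {"Duration": [60, 60, 60, 45, 45], "Pulse": [110, 117, 103, 109, 117]},
    index=[10, 11, 12, 13, 14],
)

# iloc slices by position: positions 1 and 2, the end position is excluded.
by_position = df.iloc[1:3]

# loc slices by label: labels 11, 12 and 13, the end label is included.
by_label = df.loc[11:13]

print(by_position)
print(by_label)
```

With the default RangeIndex used in this chapter, labels and positions happen to be the same numbers, which is why the difference is easy to miss.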

Info About the Data

The DataFrame object has a method called info() that gives you more information about the
data set.

# Print information about the data:


print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Duration 169 non-null int64
1 Pulse 169 non-null int64
2 Maxpulse 169 non-null int64
3 Calories 164 non-null float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
None

Null Values

The info() method also tells us how many non-null values are present in each column. In our
data set, the "Calories" column has 164 non-null values out of 169.

This means that 5 rows have no value at all in the "Calories" column, for whatever
reason.
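
Instead of reading the counts off the info() output, the missing values can also be counted directly. The sketch below uses a small made-up DataFrame (an assumption, not the workout file) and the isnull() method together with sum() and any():

```python
import numpy as np
import pandas as pd

# Made-up data with two missing "Calories" values.
df = pd.DataFrame({
    "Duration": [60, 45, 60, 30],
    "Calories": [409.1, np.nan, 340.0, np.nan],
})

# isnull() gives True/False per cell; sum() counts the True values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Keep only the rows that have at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```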

Empty values, or Null values, can be bad when analyzing data, and you should consider
removing rows with empty values. This is a step towards what is called cleaning data, and you
will learn more about that in the next chapters.

Pandas - Cleaning Data

Data cleaning means fixing bad data in your data set.

Bad data could be:

Empty cells

Data in wrong format

Wrong data

Duplicates

Pandas - Cleaning Empty Cells

Empty cells can potentially give you a wrong result when you analyze data.

First of all, we need to check whether any column contains missing (NaN) values.

# Check for missing (NaN) values in the whole DataFrame and in each column.
import pandas as pd

df = pd.read_csv('data1.csv')

df.isnull().values.any()

df['Calories'].isnull().values.any()

True

df.isnull()

Duration Date Pulse Maxpulse Calories


0 False False False False False
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False False False
5 False False False False False
6 False False False False False
7 False False False False False
8 False False False False False
9 False False False False False
10 False False False False False
11 False False False False False
12 False False False False False
13 False False False False False
14 False False False False False
15 False False False False False
16 False False False False False
17 False False False False False
18 False False False False True
19 False False False False False
20 False False False False False
21 False False False False False
22 False True False False False
23 False False False False False
24 False False False False False
25 False False False False False
26 False False False False False
27 False False False False False
28 False False False False True
29 False False False False False
30 False False False False False
31 False False False False False

df['Maxpulse'].isnull().values.any()

False

df['Pulse'].isnull().values.any()

False

df['Date'].isnull().values.any()

True

Use the describe() method to get summary statistics for the DataFrame.

df.describe()

Duration Pulse Maxpulse Calories


count 32.000000 32.000000 32.000000 30.000000
mean 68.437500 103.500000 128.500000 304.680000
std 70.039591 7.832933 12.998759 66.003779
min 30.000000 90.000000 101.000000 195.100000
25% 60.000000 100.000000 120.000000 250.700000
50% 60.000000 102.500000 127.500000 291.200000
75% 60.000000 106.500000 132.250000 343.975000
max 450.000000 130.000000 175.000000 479.000000

Remove Rows

One way to deal with empty cells is to remove rows that contain empty cells.

This is usually OK, since data sets can be very big, and removing a few rows will not have a big
impact on the result.

# Return a new Data Frame with no empty cells:


import pandas as pd

df = pd.read_csv('data1.csv')

new_df = df.dropna()

print(new_df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

Note: By default, the dropna() method returns a new DataFrame, and will not change the
original.

print(new_df.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29 entries, 0 to 31
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Duration 29 non-null int64
1 Date 29 non-null object
2 Pulse 29 non-null int64
3 Maxpulse 29 non-null int64
4 Calories 29 non-null float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.4+ KB
None

print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Duration 32 non-null int64
1 Date 31 non-null object
2 Pulse 32 non-null int64
3 Maxpulse 32 non-null int64
4 Calories 30 non-null float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.4+ KB
None

If you want to change the original DataFrame, use the inplace = True argument:

# Remove all rows with NULL values:


import pandas as pd

df = pd.read_csv('data1.csv')

df.dropna(inplace = True)

print(df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

print(df.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29 entries, 0 to 31
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Duration 29 non-null int64
1 Date 29 non-null object
2 Pulse 29 non-null int64
3 Maxpulse 29 non-null int64
4 Calories 29 non-null float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.4+ KB
None

Note: Now, the dropna(inplace = True) will NOT return a new DataFrame, but it will remove all
rows containing NULL values from the original DataFrame.

Replace Empty Values

Another way of dealing with empty cells is to insert a new value instead.

This way you do not have to delete entire rows just because of some empty cells.

The fillna() method allows us to replace empty cells with a value:

# Replace NULL values with the number 130:


import pandas as pd

df = pd.read_csv('data1.csv')

df.fillna(130, inplace = True)

print(df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 130.0
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 130 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 130.0
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

Replace Only For Specified Columns

The example above replaces all empty cells in the whole DataFrame.

To replace empty values in only one column, select that column by name before calling fillna():

import pandas as pd

df = pd.read_csv('data1.csv')

print(df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

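# Replace NULL values in the "Calories" column with the number 130.
# Assigning the result back works in every pandas version; calling
# fillna(..., inplace = True) on a selected column is deprecated in recent versions.
df["Calories"] = df["Calories"].fillna(130)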
print(df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 130.0
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 130.0
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0
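
As an alternative sketch, fillna() also accepts a dictionary that maps column names to replacement values, so several columns can be handled in one call. The data below is made up for illustration; only the column names are taken from the example above:

```python
import numpy as np
import pandas as pd

# Made-up data: one missing date and one missing calorie value.
df = pd.DataFrame({
    "Date": ["2020/12/01", None, "2020/12/03"],
    "Calories": [409.1, np.nan, 340.0],
})

# Only the columns named in the dictionary are filled; "Date" is left alone.
filled = df.fillna({"Calories": 130})

print(filled)
```

Because no inplace argument is used, the original df keeps its empty cells and the filled copy is stored in a new variable.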

Replace Using Mean, Median, or Mode

A common way to replace empty cells is to use the mean, median or mode value of the
column.

Pandas provides the mean(), median() and mode() methods to calculate the respective values for a
specified column:

# Calculate the MEAN, and replace any empty values with it:
import pandas as pd

df = pd.read_csv('data1.csv')

x = df["Calories"].mean()

df["Calories"] = df["Calories"].fillna(x)


print(x)

304.68

print(df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.10
1 60 '2020/12/02' 117 145 479.00
2 60 '2020/12/03' 103 135 340.00
3 45 '2020/12/04' 109 175 282.40
4 45 '2020/12/05' 117 148 406.00
5 60 '2020/12/06' 102 127 300.00
6 60 '2020/12/07' 110 136 374.00
7 450 '2020/12/08' 104 134 253.30
8 30 '2020/12/09' 109 133 195.10
9 60 '2020/12/10' 98 124 269.00
10 60 '2020/12/11' 103 147 329.30
11 60 '2020/12/12' 100 120 250.70
12 60 '2020/12/12' 100 120 250.70
13 60 '2020/12/13' 106 128 345.30
14 60 '2020/12/14' 104 132 379.30
15 60 '2020/12/15' 98 123 275.00
16 60 '2020/12/16' 98 120 215.20
17 60 '2020/12/17' 100 120 300.00
18 45 '2020/12/18' 90 112 304.68
19 60 '2020/12/19' 103 123 323.00
20 45 '2020/12/20' 97 125 243.00
21 60 '2020/12/21' 108 131 364.20
22 45 NaN 100 119 282.00
23 60 '2020/12/23' 130 101 300.00
24 45 '2020/12/24' 105 132 246.00
25 60 '2020/12/25' 102 126 334.50
26 60 26-12-2020 100 120 250.00
27 60 '2020/12/27' 92 118 241.00
28 60 '2020/12/28' 103 132 304.68
29 60 '2020/12/29' 100 132 280.00
30 60 '2020/12/30' 102 129 380.30
31 60 '2020/12/31' 92 115 243.00

Mean = the average value (the sum of all values divided by number of values).

# Calculate the MEDIAN, and replace any empty values with it:
import pandas as pd

df = pd.read_csv('data1.csv')

x = df["Calories"].median()

df["Calories"] = df["Calories"].fillna(x)


print(x)

291.2

print(df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 291.2
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 291.2
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

Median = the value in the middle, after you have sorted all values ascending.

# Calculate the MODE, and replace any empty values with it:
import pandas as pd

df = pd.read_csv('data1.csv')

x = df["Calories"].mode()[0]

df["Calories"] = df["Calories"].fillna(x)


print(x)

300.0

print(df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 300.0
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 300.0
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

Mode = the value that appears most frequently.
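
To see how the three values can differ, here is a small sketch on made-up numbers (an assumption, not taken from the workout file). It also shows why the mode example uses [0]: mode() returns a Series, because several values can be tied for most frequent.

```python
import pandas as pd

# Made-up values with one large outlier.
s = pd.Series([100, 200, 200, 300, 1200])

mean_value = s.mean()       # sum 2000 divided by 5 values
median_value = s.median()   # middle value after sorting
mode_value = s.mode()[0]    # first of the most frequent values

print(mean_value, median_value, mode_value)
```

The outlier pulls the mean well above the median, which is one reason the median is often preferred when a column contains extreme values.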

Pandas - Cleaning Data of Wrong Format

Cells with data of wrong format can make it difficult, or even impossible, to analyze data.

To fix it, you have two options: remove the rows, or convert all cells in the columns into the same
format.

print(df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 300.0
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 300.0
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

Convert Into a Correct Format

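In our DataFrame, two cells in the 'Date' column have the wrong format: row 22 is empty, and
row 26 is written as 26-12-2020 instead of following the layout of the other rows. Every cell in
the 'Date' column should represent a date in the same format.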

Let's convert all cells in the 'Date' column into dates.

Pandas has a to_datetime() method for this:

# Convert to date:
import pandas as pd

df = pd.read_csv('data1.csv')

df['Date'] = pd.to_datetime(df['Date'])

print(df.to_string())

Duration Date Pulse Maxpulse Calories


0 60 2020-12-01 110 130 409.1
1 60 2020-12-02 117 145 479.0
2 60 2020-12-03 103 135 340.0
3 45 2020-12-04 109 175 282.4
4 45 2020-12-05 117 148 406.0
5 60 2020-12-06 102 127 300.0
6 60 2020-12-07 110 136 374.0
7 450 2020-12-08 104 134 253.3
8 30 2020-12-09 109 133 195.1
9 60 2020-12-10 98 124 269.0
10 60 2020-12-11 103 147 329.3
11 60 2020-12-12 100 120 250.7
12 60 2020-12-12 100 120 250.7
13 60 2020-12-13 106 128 345.3
14 60 2020-12-14 104 132 379.3
15 60 2020-12-15 98 123 275.0
16 60 2020-12-16 98 120 215.2
17 60 2020-12-17 100 120 300.0
18 45 2020-12-18 90 112 NaN
19 60 2020-12-19 103 123 323.0
20 45 2020-12-20 97 125 243.0
21 60 2020-12-21 108 131 364.2
22 45 NaT 100 119 282.0
23 60 2020-12-23 130 101 300.0
24 45 2020-12-24 105 132 246.0
25 60 2020-12-25 102 126 334.5
26 60 2020-12-26 100 120 250.0
27 60 2020-12-27 92 118 241.0
28 60 2020-12-28 103 132 NaN
29 60 2020-12-29 100 132 280.0
30 60 2020-12-30 102 129 380.3
31 60 2020-12-31 92 115 243.0

C:\Users\gargs\AppData\Local\Temp\ipykernel_6776\951940559.py:6:
UserWarning: Parsing '26-12-2020' in DD/MM/YYYY format. Provide format
or specify infer_datetime_format=True for consistent parsing.
df['Date'] = pd.to_datetime(df['Date'])

As you can see from the result, the date in row 26 was fixed, but the empty date in row 22 got a
NaT (Not a Time) value, in other words an empty value. One way to deal with empty values is
simply removing the entire row.
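
Depending on the pandas version, mixed date layouts in one column may produce a warning like the one above, or an error. A more explicit approach, sketched below under the assumption that only two layouts occur (YYYY/MM/DD and DD-MM-YYYY), is to parse each layout separately and combine the results. The sample values are made up and, unlike the CSV file, contain no quote characters:

```python
import pandas as pd

# Made-up sample: two different layouts and one empty value.
raw = pd.Series(["2020/12/25", "26-12-2020", None])

# errors="coerce" turns anything that does not match the format into NaT.
ymd = pd.to_datetime(raw, format="%Y/%m/%d", errors="coerce")
dmy = pd.to_datetime(raw, format="%d-%m-%Y", errors="coerce")

# Use the first parse where it worked, otherwise fall back to the second.
dates = ymd.fillna(dmy)

print(dates)
```

Values that match neither layout, including empty cells, stay NaT and can then be removed with dropna().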

Removing Rows

The conversion in the example above gave us a NaT value, which can be handled
as a NULL value, so we can remove the row by using the dropna() method.

# Remove rows with a NULL value in the "Date" column:


df.dropna(subset=['Date'], inplace = True)
df

Duration Date Pulse Maxpulse Calories


0 60 2020-12-01 110 130 409.1
1 60 2020-12-02 117 145 479.0
2 60 2020-12-03 103 135 340.0
3 45 2020-12-04 109 175 282.4
4 45 2020-12-05 117 148 406.0
5 60 2020-12-06 102 127 300.0
6 60 2020-12-07 110 136 374.0
7 450 2020-12-08 104 134 253.3
8 30 2020-12-09 109 133 195.1
9 60 2020-12-10 98 124 269.0
10 60 2020-12-11 103 147 329.3
11 60 2020-12-12 100 120 250.7
12 60 2020-12-12 100 120 250.7
13 60 2020-12-13 106 128 345.3
14 60 2020-12-14 104 132 379.3
15 60 2020-12-15 98 123 275.0
16 60 2020-12-16 98 120 215.2
17 60 2020-12-17 100 120 300.0
18 45 2020-12-18 90 112 NaN
19 60 2020-12-19 103 123 323.0
20 45 2020-12-20 97 125 243.0
21 60 2020-12-21 108 131 364.2
23 60 2020-12-23 130 101 300.0
24 45 2020-12-24 105 132 246.0
25 60 2020-12-25 102 126 334.5
26 60 2020-12-26 100 120 250.0
27 60 2020-12-27 92 118 241.0
28 60 2020-12-28 103 132 NaN
29 60 2020-12-29 100 132 280.0
30 60 2020-12-30 102 129 380.3
31 60 2020-12-31 92 115 243.0

Pandas - Fixing Wrong Data

"Wrong data" does not have to be "empty cells" or "wrong format", it can just be wrong, like if
someone registered "199" instead of "1.99".

Sometimes you can spot wrong data by looking at the data set, because you have an expectation
of what it should be.

If you take a look at our data set, you can see that in row 7, the duration is 450, but for all the
other rows the duration is between 30 and 60.

Replacing Values

One way to fix wrong values is to replace them with something else.

In our example, it is most likely a typo, and the value should be "45" instead of "450", and we
could just insert "45" in row 7:

# Set "Duration" = 45 in row 7:


df.loc[7, 'Duration'] = 45

For small data sets you might be able to replace the wrong data one by one, but not for big data
sets.

To replace wrong data for larger data sets you can create some rules, e.g. set some boundaries
for legal values, and replace any values that are outside of the boundaries.

# Loop through all values in the "Duration" column.

# If the value is higher than 120, set it to 120:

import pandas as pd

df = pd.read_csv('data1.csv')

for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120

df

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 120 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0
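
The same boundary rule can also be applied without an explicit loop. The sketch below is an alternative, not the method used above: it relies on the clip() method of a column to cap every value at an upper limit, shown on a small made-up DataFrame:

```python
import pandas as pd

# Made-up data with one value above the allowed maximum.
df = pd.DataFrame({"Duration": [60, 450, 45, 30]})

# clip(upper=120) replaces every value above 120 with 120 and
# leaves all other values unchanged.
df["Duration"] = df["Duration"].clip(upper=120)

print(df)
```

On large data sets this is usually much faster than looping over the index row by row.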

Removing Rows

Another way of handling wrong data is to remove the rows that contain wrong data.

This way you do not have to find out what to replace them with, and there is a good chance you
do not need them to do your analyses.

# Delete rows where "Duration" is higher than 120:


import pandas as pd

df = pd.read_csv('data1.csv')

for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)

df

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0
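
Rows can also be removed without a loop by keeping only the rows that pass a condition. This is a sketch of that alternative on a small made-up DataFrame; the 120 limit is the same boundary used above:

```python
import pandas as pd

# Made-up data with one row above the allowed maximum.
df = pd.DataFrame({
    "Duration": [60, 450, 45, 30],
    "Pulse": [110, 104, 109, 109],
})

# The comparison produces True/False per row; only the True rows are kept.
kept = df[df["Duration"] <= 120]

print(kept)
```

The original df is not changed here; the filtered rows are stored in a new variable and keep their original index labels.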

Pandas - Removing Duplicates

import pandas as pd

df = pd.read_csv('data1.csv')
df

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

Duplicate rows are rows that have been registered more than one time.

By taking a look at our test data set, we can assume that rows 11 and 12 are duplicates.

To discover duplicates, we can use the duplicated() method.

The duplicated() method returns a Boolean value for each row:

# Returns True for every row that is a duplicate, otherwise False:


print(df.duplicated())

0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 True
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
30 False
31 False
dtype: bool

Removing Duplicates

To remove duplicates, use the drop_duplicates() method.

# Remove all duplicates:


df.drop_duplicates(inplace = True)

df

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

Note: The (inplace = True) will make sure that the method does NOT return a new DataFrame,
but it will remove all duplicates from the original DataFrame.
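
By default, duplicated() marks only the second and later copies of a row, which is why row 11 stays and row 12 is removed. The sketch below, on a small made-up DataFrame, shows this default next to the keep=False option, which marks every copy:

```python
import pandas as pd

# Made-up data: rows 1 and 2 are identical.
df = pd.DataFrame({
    "Date": ["2020/12/11", "2020/12/12", "2020/12/12", "2020/12/13"],
    "Pulse": [103, 100, 100, 106],
})

# Default: the first copy counts as the original, later copies are marked.
first_kept = df.duplicated()

# keep=False: every row that has a copy elsewhere is marked.
all_marked = df.duplicated(keep=False)

# Drop the later copies and renumber the index from 0.
deduped = df.drop_duplicates().reset_index(drop=True)

print(first_kept.tolist())
print(all_marked.tolist())
print(deduped)
```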

Pandas - Data Correlations

The corr() method calculates the relationship between each column in your data set.

import pandas as pd

df = pd.read_csv('data1.csv')
df

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

df.corr()

Duration Pulse Maxpulse Calories


Duration 1.000000 0.004410 0.049959 -0.114169
Pulse 0.004410 1.000000 0.276583 0.513186
Maxpulse 0.049959 0.276583 1.000000 0.357460
Calories -0.114169 0.513186 0.357460 1.000000

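Note: In older pandas versions the corr() method ignores "not numeric" columns such as "Date". In pandas 2.0 and later, call df.corr(numeric_only=True) to get the same result.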

The number varies from -1 to 1.

1 means that there is a 1 to 1 relationship (a perfect correlation), and for this data set, each time a
value went up in the first column, the other one went up as well.

0.9 is also a good relationship, and if you increase one value, the other will probably increase as
well.

-0.9 would be just as good a relationship as 0.9, but if you increase one value, the other will
probably go down.

0.2 means NOT a good relationship: one value going up does not mean that the
other will.

What is a good correlation? It depends on the use, but I think it is safe to say you have to have at
least 0.6 (or -0.6) to call it a good correlation.
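
To make the two ends of the scale concrete, here is a small sketch with made-up columns (the names x, up and down are assumptions for illustration). One column rises in step with x and the other falls in step with it:

```python
import pandas as pd

# Made-up data: "up" rises with x, "down" falls as x rises.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "up": [2, 4, 6, 8, 10],
    "down": [10, 8, 6, 4, 2],
})

corr = df.corr()

# x and up are perfectly positively correlated (value close to 1),
# x and down are perfectly negatively correlated (value close to -1).
print(corr)
```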

Perfect Correlation:

We can see that "Duration" and "Duration" got the number 1.000000,
which makes sense: each column always has a perfect relationship with
itself.

Good Correlation:

"Duration" and "Calories" got a 0.922721 correlation, which is a very
good correlation, and we can predict that the longer you work out, the
more calories you burn, and the other way around: if you burned a lot
of calories, you probably had a long workout. (This figure does not
match the table above, computed from the 32-row data1.csv, where the
same pair shows -0.114169; it appears to come from the larger 169-row
data set.)

Bad Correlation:

"Duration" and "Maxpulse" got a 0.009403 correlation, which is a very
bad correlation, meaning that we cannot predict the max pulse by just
looking at the duration of the workout, and vice versa. (In the table
above the same pair shows 0.049959, which is also a very bad
correlation.)

Obtain the distinct values in a column with the unique() method, and count how many distinct
values there are with the nunique() method.

print(df['Duration'].unique())
print(df['Duration'].nunique())

[ 60 45 450 30]
4

print(df['Pulse'].unique())
print(df['Pulse'].nunique())

[110 117 103 109 102 104 98 100 106 90 97 108 130 105 92]
15
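
The difference between the two functions, and how to count the occurrences of each value, can be sketched as follows. The Series named durations is a short illustrative stand-in for the Duration column, not the real data.

```python
import pandas as pd

# Illustrative values only; a short stand-in for the Duration column.
durations = pd.Series([60, 60, 45, 450, 30, 60], name="Duration")

distinct_values = durations.unique()    # distinct values, in order of first appearance
distinct_count = durations.nunique()    # how many distinct values there are
occurrences = durations.value_counts()  # how often each distinct value occurs

print(distinct_values)
print(distinct_count)
print(occurrences)
```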

Taking a subset of df (only the rows where Duration is 60)

df1 = df[df['Duration'] == 60]
df1

Duration Date Pulse Maxpulse Calories
0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
19 60 '2020/12/19' 103 123 323.0
21 60 '2020/12/21' 108 131 364.2
23 60 '2020/12/23' 130 101 300.0
25 60 '2020/12/25' 102 126 334.5
26 60 26-12-2020 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0

To find the value of Calories when Pulse is maximum

df1[ df1['Pulse'] == df1['Pulse'].max()]['Calories']

23 300.0
Name: Calories, dtype: float64

To find the value of Calories when Pulse is minimum

df1[ df1['Pulse'] == df1['Pulse'].min()]['Calories']

27 241.0
31 243.0
Name: Calories, dtype: float64
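
The same lookups can also be sketched with idxmax() and idxmin(), which return the index label of the maximum or minimum value. The DataFrame named sample holds only a few illustrative rows taken from the table above.

```python
import pandas as pd

# Illustrative rows only (index labels and values taken from the table above).
sample = pd.DataFrame(
    {"Pulse": [110, 130, 92, 92], "Calories": [409.1, 300.0, 241.0, 243.0]},
    index=[0, 23, 27, 31],
)

# idxmax() gives the index label of the first row holding the maximum Pulse.
calories_at_max_pulse = sample.loc[sample["Pulse"].idxmax(), "Calories"]

# idxmin() returns only the FIRST minimum; the boolean mask keeps every tie.
first_min_only = sample.loc[sample["Pulse"].idxmin(), "Calories"]
all_min_rows = sample.loc[sample["Pulse"] == sample["Pulse"].min(), "Calories"]

print(calories_at_max_pulse)
print(first_min_only)
print(all_min_rows)
```

When several rows share the minimum, as rows 27 and 31 do here, the boolean mask is the safer choice because it keeps all of them.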

To find the mean value of Pulse when Duration is 60

# df1 already contains only the rows where Duration is 60,
# so no second filter is needed.
df1["Pulse"].mean()

103.375

Groupby

The pandas DataFrame.groupby() method splits the data into groups based on some criteria, so
that a function such as mean() or sum() can be applied to each group.

df.groupby('Pulse').describe()

Duration \
count mean std min 25% 50% 75% max
Pulse
90 1.0 45.0 NaN 45.0 45.00 45.0 45.00 45.0
92 2.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
97 1.0 45.0 NaN 45.0 45.00 45.0 45.00 45.0
98 3.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
100 6.0 57.5 6.123724 45.0 60.00 60.0 60.00 60.0
102 3.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
103 4.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
104 2.0 255.0 275.771645 60.0 157.50 255.0 352.50 450.0
105 1.0 45.0 NaN 45.0 45.00 45.0 45.00 45.0
106 1.0 60.0 NaN 60.0 60.00 60.0 60.00 60.0
108 1.0 60.0 NaN 60.0 60.00 60.0 60.00 60.0
109 2.0 37.5 10.606602 30.0 33.75 37.5 41.25 45.0
110 2.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
117 2.0 52.5 10.606602 45.0 48.75 52.5 56.25 60.0
130 1.0 60.0 NaN 60.0 60.00 60.0 60.00 60.0

Maxpulse ... Calories \
count mean ... 75% max count mean
Pulse ...
90 1.0 112.000000 ... 112.00 112.0 0.0 NaN
92 2.0 116.500000 ... 117.25 118.0 2.0 242.000000
97 1.0 125.000000 ... 125.00 125.0 1.0 243.000000
98 3.0 122.333333 ... 123.50 124.0 3.0 253.066667
100 6.0 121.833333 ... 120.00 132.0 6.0 268.900000
102 3.0 127.333333 ... 128.00 129.0 3.0 338.266667
103 4.0 134.250000 ... 138.00 147.0 3.0 330.766667
104 2.0 133.000000 ... 133.50 134.0 2.0 316.300000
105 1.0 132.000000 ... 132.00 132.0 1.0 246.000000
106 1.0 128.000000 ... 128.00 128.0 1.0 345.300000
108 1.0 131.000000 ... 131.00 131.0 1.0 364.200000
109 2.0 154.000000 ... 164.50 175.0 2.0 238.750000
110 2.0 133.000000 ... 134.50 136.0 2.0 391.550000
117 2.0 146.500000 ... 147.25 148.0 2.0 442.500000
130 1.0 101.000000 ... 101.00 101.0 1.0 300.000000

std min 25% 50% 75% max
Pulse
90 NaN NaN NaN NaN NaN NaN
92 1.414214 241.0 241.500 242.00 242.500 243.0
97 NaN 243.0 243.000 243.00 243.000 243.0
98 32.930432 215.2 242.100 269.00 272.000 275.0
100 21.362210 250.0 250.700 265.35 281.500 300.0
102 40.282296 300.0 317.250 334.50 357.400 380.3
103 8.594378 323.0 326.150 329.30 334.650 340.0
104 89.095454 253.3 284.800 316.30 347.800 379.3
105 NaN 246.0 246.000 246.00 246.000 246.0
106 NaN 345.3 345.300 345.30 345.300 345.3
108 NaN 364.2 364.200 364.20 364.200 364.2
109 61.730422 195.1 216.925 238.75 260.575 282.4
110 24.819448 374.0 382.775 391.55 400.325 409.1
117 51.618795 406.0 424.250 442.50 460.750 479.0
130 NaN 300.0 300.000 300.00 300.000 300.0

[15 rows x 24 columns]

df.groupby('Pulse').describe().reset_index()

Pulse Duration \
count mean std min 25% 50% 75% max
0 90 1.0 45.0 NaN 45.0 45.00 45.0 45.00 45.0
1 92 2.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
2 97 1.0 45.0 NaN 45.0 45.00 45.0 45.00 45.0
3 98 3.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
4 100 6.0 57.5 6.123724 45.0 60.00 60.0 60.00 60.0
5 102 3.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
6 103 4.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
7 104 2.0 255.0 275.771645 60.0 157.50 255.0 352.50 450.0
8 105 1.0 45.0 NaN 45.0 45.00 45.0 45.00 45.0
9 106 1.0 60.0 NaN 60.0 60.00 60.0 60.00 60.0
10 108 1.0 60.0 NaN 60.0 60.00 60.0 60.00 60.0
11 109 2.0 37.5 10.606602 30.0 33.75 37.5 41.25 45.0
12 110 2.0 60.0 0.000000 60.0 60.00 60.0 60.00 60.0
13 117 2.0 52.5 10.606602 45.0 48.75 52.5 56.25 60.0
14 130 1.0 60.0 NaN 60.0 60.00 60.0 60.00 60.0

Maxpulse ... Calories \
count ... 75% max count mean std min
0 1.0 ... 112.00 112.0 0.0 NaN NaN NaN
1 2.0 ... 117.25 118.0 2.0 242.000000 1.414214 241.0
2 1.0 ... 125.00 125.0 1.0 243.000000 NaN 243.0
3 3.0 ... 123.50 124.0 3.0 253.066667 32.930432 215.2
4 6.0 ... 120.00 132.0 6.0 268.900000 21.362210 250.0
5 3.0 ... 128.00 129.0 3.0 338.266667 40.282296 300.0
6 4.0 ... 138.00 147.0 3.0 330.766667 8.594378 323.0
7 2.0 ... 133.50 134.0 2.0 316.300000 89.095454 253.3
8 1.0 ... 132.00 132.0 1.0 246.000000 NaN 246.0
9 1.0 ... 128.00 128.0 1.0 345.300000 NaN 345.3
10 1.0 ... 131.00 131.0 1.0 364.200000 NaN 364.2
11 2.0 ... 164.50 175.0 2.0 238.750000 61.730422 195.1
12 2.0 ... 134.50 136.0 2.0 391.550000 24.819448 374.0
13 2.0 ... 147.25 148.0 2.0 442.500000 51.618795 406.0
14 1.0 ... 101.00 101.0 1.0 300.000000 NaN 300.0

25% 50% 75% max
0 NaN NaN NaN NaN
1 241.500 242.00 242.500 243.0
2 243.000 243.00 243.000 243.0
3 242.100 269.00 272.000 275.0
4 250.700 265.35 281.500 300.0
5 317.250 334.50 357.400 380.3
6 326.150 329.30 334.650 340.0
7 284.800 316.30 347.800 379.3
8 246.000 246.00 246.000 246.0
9 345.300 345.30 345.300 345.3
10 364.200 364.20 364.200 364.2
11 216.925 238.75 260.575 282.4
12 382.775 391.55 400.325 409.1
13 424.250 442.50 460.750 479.0
14 300.000 300.00 300.000 300.0

[15 rows x 25 columns]

df.describe()

Duration Pulse Maxpulse Calories
count 32.000000 32.000000 32.000000 30.000000
mean 68.437500 103.500000 128.500000 304.680000
std 70.039591 7.832933 12.998759 66.003779
min 30.000000 90.000000 101.000000 195.100000
25% 60.000000 100.000000 120.000000 250.700000
50% 60.000000 102.500000 127.500000 291.200000
75% 60.000000 106.500000 132.250000 343.975000
max 450.000000 130.000000 175.000000 479.000000

df.groupby('Pulse')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000220A4F94BB0>

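df = pd.read_csv('data1.csv')
# numeric_only=True skips the text column "Date"; pandas 2.0 and later
# raise a TypeError without it.
df.groupby('Pulse').mean(numeric_only=True)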

Duration Maxpulse Calories
Pulse
90 45.0 112.000000 NaN
92 60.0 116.500000 242.000000
97 45.0 125.000000 243.000000
98 60.0 122.333333 253.066667
100 57.5 121.833333 268.900000
102 60.0 127.333333 338.266667
103 60.0 134.250000 330.766667
104 255.0 133.000000 316.300000
105 45.0 132.000000 246.000000
106 60.0 128.000000 345.300000
108 60.0 131.000000 364.200000
109 37.5 154.000000 238.750000
110 60.0 133.000000 391.550000
117 52.5 146.500000 442.500000
130 60.0 101.000000 300.000000

# numeric_only=True skips the text column "Date" (needed in pandas 2.0 and later)
df.groupby(['Pulse','Duration']).mean(numeric_only=True).reset_index()

Pulse Duration Maxpulse Calories
0 90 45 112.000000 NaN
1 92 60 116.500000 242.000000
2 97 45 125.000000 243.000000
3 98 60 122.333333 253.066667
4 100 45 119.000000 282.000000
5 100 60 122.400000 266.280000
6 102 60 127.333333 338.266667
7 103 60 134.250000 330.766667
8 104 60 132.000000 379.300000
9 104 450 134.000000 253.300000
10 105 45 132.000000 246.000000
11 106 60 128.000000 345.300000
12 108 60 131.000000 364.200000
13 109 30 133.000000 195.100000
14 109 45 175.000000 282.400000
15 110 60 133.000000 391.550000
16 117 45 148.000000 406.000000
17 117 60 145.000000 479.000000
18 130 60 101.000000 300.000000

data = {'Category': ['A', 'B', 'A', 'B', 'A'],
        'Value': [10, 20, 15, 25, 30],
        'Price': [100, 200, 150, 250, 300]
}
df = pd.DataFrame(data)
# Aggregating data using groupby
groupby_category = df.groupby('Category').sum()
groupby_category1 = df.groupby('Category').sum().reset_index()
print(groupby_category)
print(groupby_category1)

Value Price
Category
A 55 550
B 45 450
Category Value Price
0 A 55 550
1 B 45 450
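
A group can also be summarised with several functions at once. The sketch below uses the agg() method with named aggregation; the DataFrame name df_demo and the output column names total_value, mean_price and rows are chosen here for illustration only.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 20, 15, 25, 30],
    "Price": [100, 200, 150, 250, 300],
})

# Named aggregation: output_column=(input_column, function)
summary = df_demo.groupby("Category").agg(
    total_value=("Value", "sum"),
    mean_price=("Price", "mean"),
    rows=("Value", "count"),
).reset_index()

print(summary)
```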

reset_index

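reset_index() is a pandas method that resets the index of a DataFrame. It replaces the current
index with the default integer index, running from 0 to the number of rows minus 1, and keeps the
old index as a new column (named index in the example below).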

df

Category Value Price
0 A 10 100
1 B 20 200
2 A 15 150
3 B 25 250
4 A 30 300

df.reset_index()

index Category Value Price
0 0 A 10 100
1 1 B 20 200
2 2 A 15 150
3 3 B 25 250
4 4 A 30 300
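
reset_index() is most useful after filtering, when the remaining index labels have gaps. The sketch below also shows the drop=True option, which discards the old index instead of keeping it as a column. The names df_demo, only_a, kept and dropped are illustrative.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 20, 15, 25, 30],
})

# Filtering keeps the original labels, so the index now has gaps: 0, 2, 4.
only_a = df_demo[df_demo["Category"] == "A"]

# Default: the old index is kept as a new column named "index".
kept = only_a.reset_index()

# drop=True discards the old index instead of adding it as a column.
dropped = only_a.reset_index(drop=True)

print(kept)
print(dropped)
```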

Value Counts

value_counts() returns the number of occurrences of each unique value in a column, sorted from
the most frequent value to the least frequent.

df = pd.read_csv('data1.csv')
df['Duration'].value_counts()

60 24
45 6
450 1
30 1
Name: Duration, dtype: int64

df.Duration.value_counts()

60 24
45 6
450 1
30 1
Name: Duration, dtype: int64

df.Calories.value_counts()

300.0 3
243.0 2
250.7 2
409.1 1
275.0 1
280.0 1
241.0 1
250.0 1
334.5 1
246.0 1
282.0 1
364.2 1
323.0 1
215.2 1
379.3 1
479.0 1
345.3 1
329.3 1
269.0 1
195.1 1
253.3 1
374.0 1
406.0 1
282.4 1
340.0 1
380.3 1
Name: Calories, dtype: int64
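
To get the share of each value instead of the raw count, value_counts() accepts normalize=True. A minimal sketch, using a Series named durations that is rebuilt here from the Duration counts shown above (24, 6, 1 and 1):

```python
import pandas as pd

# Rebuild a Duration-like column from the counts shown above.
durations = pd.Series([60] * 24 + [45] * 6 + [450, 30], name="Duration")

counts = durations.value_counts()                # absolute counts
shares = durations.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```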

Cross Tabulations

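Cross-tabulation counts how often each combination of values from two columns occurs.
pd.crosstab() returns these counts as a table, with the values of the first column as rows and
the values of the second column as columns.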

pd.crosstab(df['Duration'],df['Pulse'])

Pulse     90  92  97  98  100  102  103  104  105  106  108  109  110  117  130
Duration
30         0   0   0   0    0    0    0    0    0    0    0    1    0    0    0
45         1   0   1   0    1    0    0    0    1    0    0    1    0    1    0
60         0   2   0   3    5    3    4    1    0    1    1    0    2    1    1
450        0   0   0   0    0    0    0    1    0    0    0    0    0    0    0
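
Row and column totals can be added with the margins=True option of pd.crosstab(). The sketch below uses a small illustrative DataFrame named sample rather than the full data set.

```python
import pandas as pd

# Illustrative values only.
sample = pd.DataFrame({
    "Duration": [60, 60, 45, 45, 60, 30],
    "Pulse": [110, 117, 109, 117, 110, 109],
})

# margins=True adds an "All" row and an "All" column holding the totals.
table = pd.crosstab(sample["Duration"], sample["Pulse"], margins=True)

print(table)
```

The cell in the "All" row and "All" column equals the total number of rows in the data.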

Sorting Dataframe

df[['Duration','Pulse']]

Duration Pulse
0 60 110
1 60 117
2 60 103
3 45 109
4 45 117
5 60 102
6 60 110
7 450 104
8 30 109
9 60 98
10 60 103
11 60 100
12 60 100
13 60 106
14 60 104
15 60 98
16 60 98
17 60 100
18 45 90
19 60 103
20 45 97
21 60 108
22 45 100
23 60 130
24 45 105
25 60 102
26 60 100
27 60 92
28 60 103
29 60 100
30 60 102
31 60 92

# Sort the rows by Pulse (ascending) and show the first 10 rows

df[['Duration','Pulse']].sort_values('Pulse')[0:10]

Duration Pulse
18 45 90
31 60 92
27 60 92
20 45 97
16 60 98
9 60 98
15 60 98
29 60 100
11 60 100
12 60 100
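
sort_values() can also sort in descending order and by more than one column. A minimal sketch, using an illustrative DataFrame named sample:

```python
import pandas as pd

# Illustrative values only.
sample = pd.DataFrame({
    "Duration": [60, 45, 60, 45, 30],
    "Pulse": [110, 109, 98, 117, 109],
})

# Sort by Duration (ascending), then by Pulse (descending) within each Duration.
ordered = sample.sort_values(["Duration", "Pulse"], ascending=[True, False])

# The two rows with the highest Pulse.
top_two = sample.sort_values("Pulse", ascending=False).head(2)

print(ordered)
print(top_two)
```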

# Calculate the total quantity of each product.

import pandas as pd

data = {'OrderID': [1, 2, 3, 4, 5],
        'Product': ['Widget', 'Gadget', 'Widget', 'Gadget', 'Widget'],
        'Quantity': [3, 2, 5, 4, 6]}
orders = pd.DataFrame(data)
# Group by product, then sum the quantities in each group
product_quantity = orders.groupby('Product')['Quantity'].sum().reset_index()
print(orders)
print(product_quantity)

OrderID Product Quantity
0 1 Widget 3
1 2 Gadget 2
2 3 Widget 5
3 4 Gadget 4
4 5 Widget 6
Product Quantity
0 Gadget 6
1 Widget 14

Matplotlib

Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
Matplotlib was created by John D. Hunter.

Matplotlib is open source and we can use it freely.

Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and
JavaScript for platform compatibility.

Installation of Matplotlib

pip install matplotlib

Requirement already satisfied: matplotlib in c:\users\gargs\anaconda3\lib\site-packages (3.5.1)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\gargs\
anaconda3\lib\site-packages (from matplotlib) (1.3.2)
Requirement already satisfied: pyparsing>=2.2.1 in c:\users\gargs\
anaconda3\lib\site-packages (from matplotlib) (3.0.4)
Requirement already satisfied: pillow>=6.2.0 in c:\users\gargs\
anaconda3\lib\site-packages (from matplotlib) (9.0.1)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\gargs\
anaconda3\lib\site-packages (from matplotlib) (4.25.0)
Requirement already satisfied: cycler>=0.10 in c:\users\gargs\
anaconda3\lib\site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\gargs\
anaconda3\lib\site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: packaging>=20.0 in c:\users\gargs\
anaconda3\lib\site-packages (from matplotlib) (21.3)
Requirement already satisfied: numpy>=1.17 in c:\users\gargs\
anaconda3\lib\site-packages (from matplotlib) (1.22.4)
Requirement already satisfied: six>=1.5 in c:\users\gargs\anaconda3\
lib\site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
Note: you may need to restart the kernel to use updated packages.

Import Matplotlib

import matplotlib

Checking Matplotlib Version

print(matplotlib.__version__)

3.5.1

import matplotlib.pyplot as plt

Plotting x and y points

The plot() function is used to draw points (markers) in a diagram.

By default, the plot() function draws a line from point to point.


The function takes parameters for specifying points in the diagram.

Parameter 1 is an array containing the points on the x-axis.

Parameter 2 is an array containing the points on the y-axis.

If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and [3, 10] to the
plot function.

# Draw a line in a diagram from position (1, 3) to position (8, 10):

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints)
plt.show()

Plotting Without Line

To plot only the markers, you can use shortcut string notation parameter 'o', which means
'rings'.

# Draw two points in the diagram, one at position (1, 3) and one at
# position (8, 10):
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints, 'o')
plt.show()

Multiple Points

You can plot as many points as you like, just make sure you have the same number of points on
both axes.

Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to position (8, 10):

import matplotlib.pyplot as plt


import numpy as np

xpoints = np.array([1, 2, 6, 8])


ypoints = np.array([3, 8, 1, 10])

plt.plot(xpoints, ypoints)
plt.show()

Default X-Points

If we do not specify the points on the x-axis, they will get the default values 0, 1, 2, 3 etc.,
depending on the length of the y-points.

So, if we pass only the y-points, as in the example below, the diagram will look like
this:

# Plotting without x-points:


import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10, 5, 7])

plt.plot(ypoints)
plt.show()

Markers

You can use the keyword argument marker to emphasize each point with a specified marker:

# Draw a line from position (1, 3) to position (8, 10) and mark each
# point with a circle:
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])


ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints, marker='o')


plt.show()

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o')


plt.show()

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = '*') # other markers, such as '+', work the same way


plt.show()

Marker Description
'o' Circle
'*' Star
'.' Point
',' Pixel
'x' X
'X' X (filled)
'+' Plus
'P' Plus (filled)
's' Square
'D' Diamond
'd' Diamond (thin)
'p' Pentagon
'H' Hexagon
'h' Hexagon
'v' Triangle Down
'^' Triangle Up
'<' Triangle Left
'>' Triangle Right
'1' Tri Down
'2' Tri Up
'3' Tri Left
'4' Tri Right
'|' Vline
'_' Hline

Format Strings fmt

You can also use the shortcut string notation parameter to specify the marker.

This parameter is also called fmt, and is written with this syntax:

marker|line|color

import matplotlib.pyplot as plt


import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, 'o--r')
plt.show()

Line Reference

Line Syntax Description
'-' Solid line
':' Dotted line
'--' Dashed line
'-.' Dashed/dotted line

Color Reference

Color Syntax Description

'r' Red
'g' Green
'b' Blue
'c' Cyan
'm' Magenta
'y' Yellow
'k' Black
'w' White
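
A format string is only a shortcut for the corresponding keyword arguments. The sketch below draws the same style twice, once with the fmt string 'o--r' and once with marker, linestyle and color spelled out, and then reads the properties back from the returned line objects. The variable names short_line and long_line are illustrative.

```python
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.figure()

# 'o--r' = circle markers, dashed line, red
(short_line,) = plt.plot(ypoints, "o--r")

# The same styling written with keyword arguments (shifted up by 1 so both lines are visible)
(long_line,) = plt.plot(ypoints + 1, marker="o", linestyle="--", color="r")

print(short_line.get_marker(), short_line.get_linestyle())
print(long_line.get_marker(), long_line.get_linestyle())

# Call plt.show() to display the figure in an interactive session.
```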

Marker Size

You can use the keyword argument markersize, or the shorter version ms, to set the size of the
markers:

# Set the size of the markers to 20:
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o', ms = 20)


plt.show()

Marker Color

You can use the keyword argument markeredgecolor or the shorter mec to set the color of the
edge of the markers:

# Set the EDGE color to red:


import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o', ms = 20, mec = 'r')


plt.show()

You can use the keyword argument markerfacecolor or the shorter mfc to set the color inside
the edge of the markers:

# Set the FACE color to red:


import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o', ms = 20, mfc = 'r')


plt.show()

Use both the mec and mfc arguments to color the entire marker:

# Set the color of both the edge and the face to red:
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o', ms = 20, mec = 'r', mfc = 'r')


plt.show()

You can also use hexadecimal color values:

https://www.w3schools.com/colors/colors_hexadecimal.asp

Or any of the 140 supported color names.

https://www.w3schools.com/colors/colors_names.asp
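
A minimal sketch that combines both options: a hexadecimal value for the marker edge and a color name for the marker face. The specific colors '#4CAF50' and 'hotpink' are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.figure()

# Hexadecimal value for the marker edge, color name for the marker face
(line,) = plt.plot(ypoints, marker="o", ms=20, mec="#4CAF50", mfc="hotpink")

# to_hex() converts any color Matplotlib understands to a '#rrggbb' string
print(mcolors.to_hex(line.get_markeredgecolor()))
print(mcolors.to_hex(line.get_markerfacecolor()))

# Call plt.show() to display the figure in an interactive session.
```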

Linestyle

You can use the keyword argument linestyle, or shorter ls, to change the style of the plotted line:

import matplotlib.pyplot as plt


import numpy as np

ypoints = np.array([3, 8, 1, 10])

# plt.plot(ypoints, linestyle = 'dotted')


plt.plot(ypoints, ls = ':')
plt.show()

Line Styles

You can choose any of these styles:

Style Or
'solid' (default) '-'
'dotted' ':'
'dashed' '--'
'dashdot' '-.'
'None' '' or ' '

Line Color

You can use the keyword argument color or the shorter c to set the color of the line:

# Set the line color to red:

import matplotlib.pyplot as plt


import numpy as np

ypoints = np.array([3, 8, 1, 10])

# plt.plot(ypoints, ls = '--',color = 'r')


plt.plot(ypoints, c = 'r')
plt.show()

Line Width

You can use the keyword argument linewidth or the shorter lw to change the width of the line.

The value is a floating-point number, in points:

# Plot with a 20.5pt wide line:

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# plt.plot(ypoints, linewidth = 20.5)
plt.plot(ypoints, lw = 20.5)
plt.show()

Multiple Lines

You can plot as many lines as you like by simply adding more plt.plot() calls:

# Draw two lines by calling plt.plot() once for each line:


import matplotlib.pyplot as plt
import numpy as np

y1 = np.array([3, 8, 1, 10])
y2 = np.array([6, 2, 7, 11])

plt.plot(y1)
plt.plot(y2)

plt.show()

You can also plot many lines by adding the points for the x- and y-axis for each line in the same
plt.plot() function.

(In the examples above we only specified the points on the y-axis, meaning that the points on
the x-axis got the default values (0, 1, 2, 3).)

The x- and y-values come in pairs:

# Draw two lines by specifying the x- and y-point values for both
# lines:
import matplotlib.pyplot as plt
import numpy as np

x1 = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])
x2 = np.array([0, 1, 2, 3])
y2 = np.array([6, 2, 7, 11])

# plt.plot(x1, y1, y2)


plt.plot(x1, y1, x2, y2)
plt.show()

Create Labels for a Plot

With Pyplot, you can use the xlabel() and ylabel() functions to set a label for the x- and y-axis.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)

plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()

Create a Title for a Plot

With Pyplot, you can use the title() function to set a title for the plot.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()

Set Font Properties for Title and Labels

You can use the fontdict parameter in xlabel(), ylabel(), and title() to set font properties for the
title and labels.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

font1 = {'family':'serif','color':'blue','size':20}
font2 = {'family':'serif','color':'darkred','size':15}

plt.title("Sports Watch Data", fontdict = font1)


plt.xlabel("Average Pulse", fontdict = font2)
plt.ylabel("Calorie Burnage", fontdict = font2)

plt.plot(x, y)
plt.show()

Position the Title

You can use the loc parameter in title() to position the title.

Legal values are: 'left', 'right', and 'center'. Default value is 'center'.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data", loc = 'left')


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)
plt.show()

Add Grid Lines to a Plot

With Pyplot, you can use the grid() function to add grid lines to the plot.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid()

plt.show()

Specify Which Grid Lines to Display

You can use the axis parameter in the grid() function to specify which grid lines to display.

Legal values are: 'x', 'y', and 'both'. Default value is 'both'.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid(axis = 'x')

plt.show()

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid(axis = 'y')

plt.show()

Set Line Properties for the Grid

You can also set the line properties of the grid, like this: grid(color = 'color', linestyle = 'linestyle',
linewidth = number).

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid(color = 'green', linestyle = '--', linewidth = 0.5)

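plt.show()

Display Multiple Plots

With the subplot() function you can draw multiple plots in one figure. It takes three arguments
that describe the layout: the number of rows, the number of columns, and the index of the
current plot (counted from 1).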

# Draw 2 plots:

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)

plt.show()

# Draw 2 plots on top of each other (2 rows, 1 column):

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(2, 1, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(2, 1, 2)
plt.plot(x,y)

plt.show()

# Draw 6 plots (2 rows, 3 columns):

import matplotlib.pyplot as plt


import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(2, 3, 1)
plt.plot(x,y)

x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(2, 3, 2)
plt.plot(x,y)

x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(2, 3, 3)
plt.plot(x,y)

x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(2, 3, 4)
plt.plot(x,y)

x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(2, 3, 5)
plt.plot(x,y)

x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(2, 3, 6)
plt.plot(x,y)

plt.show()
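
The six-plot example above repeats the same three lines for every plot. A possible shorter version uses a loop; the names series, n_rows, n_cols and position are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])
# The two data sets that alternate in the example above
series = [np.array([3, 8, 1, 10]), np.array([10, 20, 30, 40])]

n_rows, n_cols = 2, 3
fig = plt.figure()

for position in range(1, n_rows * n_cols + 1):
    # subplot(rows, columns, index): the index counts from 1,
    # left to right and then top to bottom
    plt.subplot(n_rows, n_cols, position)
    plt.plot(x, series[(position - 1) % 2])

# Call plt.show() to display the figure in an interactive session.
```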

Title

You can add a title to each plot with the title() function:

# 2 plots, with titles:

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
plt.xlabel('x1')
plt.ylabel('y')

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.xlabel('x2')

plt.show()

Super Title

You can add a title to the entire figure with the suptitle() function:

# Add a title for the entire figure:

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")

plt.suptitle("MY SHOP")
plt.show()

Creating Scatter Plots

With Pyplot, you can use the scatter() function to draw a scatter plot.

The scatter() function plots one dot for each observation. It needs two arrays of the same length,
one for the values of the x-axis, and one for values on the y-axis:

# A simple scatter plot:

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.xlabel("Car Age")
plt.ylabel("Speed of Car")
plt.show()

The observations in the example above are the result of 13 cars passing by.

The X-axis shows how old the car is.

The Y-axis shows the speed of the car when it passes.

Are there any relationships between the observations?

It seems that the newer the car, the faster it drives, but that could be a coincidence, after all we
only registered 13 cars.

Compare Plots

In the example above, there seems to be a relationship between speed and age, but what if we
plot the observations from another day as well? Will the scatter plot tell us something else?

# Draw two plots on the same figure:

import matplotlib.pyplot as plt


import numpy as np

#day one, the age and speed of 13 cars:


x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)

#day two, the age and speed of 15 cars:


x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y)
plt.xlabel("Car Age")
plt.ylabel("Speed of Car")
plt.show()
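
By default Matplotlib gives the two sets of dots different colors. To make clear which dots belong to which day, each scatter() call can be given its own color and a label, and plt.legend() then shows the labels. This is a sketch only; the colors and label texts are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Day one: age and speed of 13 cars (values from the example above)
age_day1 = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
speed_day1 = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])

# Day two: age and speed of 15 cars
age_day2 = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
speed_day2 = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])

plt.figure()
plt.scatter(age_day1, speed_day1, color="hotpink", label="Day one")
plt.scatter(age_day2, speed_day2, color="#88c999", label="Day two")
plt.xlabel("Car Age")
plt.ylabel("Speed of Car")
legend = plt.legend()

# Call plt.show() to display the figure in an interactive session.
```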

Color Each Dot

You can even set a specific color for each dot by using an array of colors as value for the c
argument:

Note: Use the c argument for this. Unlike the color argument, c also accepts an array of numbers
that are mapped to colors through a colormap.

# Set your own color of the markers:

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple",
                   "beige","brown","gray","cyan","magenta"])

plt.scatter(x, y, c=colors)
plt.xlabel("Car Age")
plt.ylabel("Speed of Car")
plt.show()

ColorMap

The Matplotlib module has a number of available colormaps.

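A colormap is like a list of colors, where each color corresponds to a numeric value. In the
examples below the values range from 0 to 100; by default scatter() scales whatever range of
values you pass in onto the full colormap.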

Here is an example of a colormap:

https://www.w3schools.com/python/matplotlib_scatter.asp

This colormap is called 'viridis' and as you can see it ranges from 0, which is a purple color, up to
100, which is a yellow color.
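
To see how a value is turned into a color, you can query the colormap directly. This is a sketch under the assumption that values run from 0 to 100, as in this tutorial; the names cmap, norm, low, mid and high are chosen here for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap("viridis")

# Normalize rescales the 0..100 range used in this tutorial to 0..1,
# which is the range a colormap expects.
norm = Normalize(vmin=0, vmax=100)

low = cmap(norm(0))     # RGBA tuple at the purple end
mid = cmap(norm(50))    # RGBA tuple in the middle of the map
high = cmap(norm(100))  # RGBA tuple at the yellow end
print(low, mid, high)
```

Note that scatter() rescales the values in c to their own minimum and maximum by default, so the 0 to 100 range is a convention of these examples rather than a requirement.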

How to Use the ColorMap

You can specify the colormap with the keyword argument cmap with the value of the colormap,
in this case 'viridis' which is one of the built-in colormaps available in Matplotlib.

In addition you have to create an array with values (from 0 to 100), one value for each point in
the scatter plot:

# Create a color array, and specify a colormap in the scatter plot:

import matplotlib.pyplot as plt


import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90,
100])

plt.scatter(x, y, c=colors, cmap='viridis')


plt.xlabel("Car Age")
plt.ylabel("Speed of Car")
plt.show()

You can include the colormap in the drawing by including the plt.colorbar() statement:

# Include the actual colormap:

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90,
100])

# plt.scatter(x, y, c=colors, cmap='Purples_r')


plt.scatter(x, y, c=colors, cmap='viridis')

plt.colorbar()

plt.show()

Available ColorMaps

You can choose any of the built-in colormaps:

Name Reverse
Accent Accent_r
Blues Blues_r
BrBG BrBG_r
BuGn BuGn_r
BuPu BuPu_r
CMRmap CMRmap_r
Dark2 Dark2_r
GnBu GnBu_r
Greens Greens_r
Greys Greys_r
OrRd OrRd_r
Oranges Oranges_r
PRGn PRGn_r
Paired Paired_r
Pastel1 Pastel1_r
Pastel2 Pastel2_r
PiYG PiYG_r
PuBu PuBu_r
PuBuGn PuBuGn_r
PuOr PuOr_r
PuRd PuRd_r
Purples Purples_r
RdBu RdBu_r
RdGy RdGy_r
RdPu RdPu_r
RdYlBu RdYlBu_r
RdYlGn RdYlGn_r
Reds Reds_r
Set1 Set1_r
Set2 Set2_r
Set3 Set3_r
Spectral Spectral_r
Wistia Wistia_r
YlGn YlGn_r
YlGnBu YlGnBu_r
YlOrBr YlOrBr_r
YlOrRd YlOrRd_r
afmhot afmhot_r
autumn autumn_r
binary binary_r
bone bone_r
brg brg_r
bwr bwr_r
cividis cividis_r
cool cool_r
coolwarm coolwarm_r
copper copper_r
cubehelix cubehelix_r
flag flag_r
gist_earth gist_earth_r
gist_gray gist_gray_r
gist_heat gist_heat_r
gist_ncar gist_ncar_r
gist_rainbow gist_rainbow_r
gist_stern gist_stern_r
gist_yarg gist_yarg_r
gnuplot gnuplot_r
gnuplot2 gnuplot2_r
gray gray_r
hot hot_r
hsv hsv_r
inferno inferno_r
jet jet_r
magma magma_r
nipy_spectral nipy_spectral_r
ocean ocean_r
pink pink_r
plasma plasma_r
prism prism_r
rainbow rainbow_r
seismic seismic_r
spring spring_r
summer summer_r
tab10 tab10_r
tab20 tab20_r
tab20b tab20b_r
tab20c tab20c_r
terrain terrain_r
twilight twilight_r
twilight_shifted twilight_shifted_r
viridis viridis_r
winter winter_r
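
The table above can also be produced from code. The sketch below assumes a reasonably recent Matplotlib; the exact list depends on the installed version, and the names names and reversed_names are chosen here for illustration.

```python
import matplotlib.pyplot as plt

# plt.colormaps() returns the names of all registered colormaps.
names = plt.colormaps()
print(len(names), "colormaps registered")

# Reversed variants carry the "_r" suffix shown in the Reverse column.
reversed_names = [n for n in names if n.endswith("_r")]
print("viridis" in names, "viridis_r" in reversed_names)
```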

Size

You can change the size of the dots with the s argument.

Just like colors, make sure the array for sizes has the same length as the arrays for the x- and y-
axis:

# Set your own size for the markers:

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])

plt.scatter(x, y, s=sizes)

plt.show()

Alpha

You can adjust the transparency of the dots with the alpha argument.

The alpha value ranges from 0 (fully transparent) to 1 (fully opaque). The example below reuses the
sizes array from the Size example:

# Set the transparency of the markers:

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])

plt.scatter(x, y, s=sizes, alpha=0.5)

plt.show()

Combine Color Size and Alpha

You can combine a colormap with different sizes of the dots. This is best visualized if the dots
are transparent:

# Create random arrays with 100 values for x-points, y-points, colors
# and sizes:

import matplotlib.pyplot as plt


import numpy as np
x = np.random.randint(100, size=(100))
y = np.random.randint(100, size=(100))
colors = np.random.randint(100, size=(100))
sizes = 10 * np.random.randint(100, size=(100))

plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='nipy_spectral')

plt.colorbar()

plt.show()

Creating Bars

With Pyplot, you can use the bar() function to draw bar graphs:

# Draw 4 bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x,y)
plt.show()

The bar() function takes arguments that describe the layout of the bars.

The categories and their values are given by the first and second arguments, as arrays (or lists):

x = ["APPLES", "BANANAS"]
y = [400, 350]
plt.bar(x, y)

<BarContainer object of 2 artists>


Horizontal Bars

If you want the bars to be displayed horizontally instead of vertically, use the barh() function:

# Draw 4 horizontal bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.barh(x, y)
plt.show()

Bar Color

The bar() and barh() functions take the keyword argument color to set the color of the bars:

# Draw 4 red bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x, y, color = "red")


plt.show()
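
The color argument also accepts one color per bar. This is a small sketch based on the same data; the list name bar_colors and the particular colors are chosen here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

# One color name per bar, in the same order as the categories.
bar_colors = ["red", "green", "blue", "orange"]

bars = plt.bar(x, y, color=bar_colors)
plt.show()
```

The list should contain one entry for each bar so that every category gets the color intended for it.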

Bar Width

The bar() function takes the keyword argument width to set the width of the bars:

# Draw 4 very thin bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x, y, width = 0.1)


plt.show()

The default width value is 0.8.

Note: For horizontal bars, use height instead of width.

Bar Height

The barh() function takes the keyword argument height to set the height of the bars:

# Draw 4 very thin bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.barh(x, y, height = 0.1)


plt.show()

The default height value is 0.8.

Histogram

A histogram is a graph showing frequency distributions.

It is a graph showing the number of observations within each given interval.

In Matplotlib, we use the hist() function to create histograms.

The hist() function takes an array of numbers as an argument and builds the histogram from it.

For simplicity we use NumPy to randomly generate an array with 250 values drawn from a normal
distribution with a mean of 170 and a standard deviation of 10.

This means the values are concentrated around 170: roughly two thirds of them fall within 10 of the
mean, and values more than 20 away from the mean are rare.

In the histogram plotted further below, most values therefore lie between 160 and 180, with a peak
at approximately 170.

# A Normal Data Distribution by NumPy:

import numpy as np

x = np.random.normal(170, 10, 250) # (Mean, SD, Number of values)

print(x)
[162.82473535 161.24148305 152.81908739 156.42140434 177.66922688
169.9879616 160.756545 177.85540601 159.98379758 177.43989869
185.99099115 170.98105068 185.32823796 166.58397616 163.03517336
180.30492701 162.04745089 177.93312155 157.30676816 171.73554506
172.670417 154.58684949 174.39945968 163.78579293 160.67909659
159.70805791 166.31807211 181.44829153 176.55106405 162.08056534
167.60361635 180.95715229 163.27958009 152.4366006 160.0574906
144.70618854 176.45877577 166.05807122 176.75142851 162.0708222
161.49867634 163.10645812 181.54541804 168.01228348 158.68856741
148.24083068 164.00330401 160.91314714 164.78247856 182.3743663
169.23842511 165.99017778 157.31897213 169.69702019 163.36983982
165.24484739 162.13890473 171.08192126 157.3047641 184.74539136
168.5122959 169.63305775 175.31982763 159.426164 174.6899734
167.60808368 160.30384157 179.67467148 175.60982726 178.19325503
187.84335758 173.87842969 167.87475811 171.00975942 189.69271088
167.6849284 166.5792361 166.95197375 172.14047764 166.31610278
174.56871043 180.25427937 143.08645722 174.23721057 179.62120818
165.24435097 194.18063389 179.34468962 176.42744483 157.94729365
168.85587512 166.75532194 170.49840416 187.0723715 163.5673682
179.08397599 167.77843564 170.5401053 173.63905859 157.96012641
183.98360431 158.77678822 159.31159852 159.49375082 160.28992186
186.0497138 178.99975871 158.87180661 162.85046005 175.98346169
165.8030326 158.30532436 166.60143589 154.41193594 177.26717414
162.13727597 171.5995603 167.33684079 174.9568394 158.47527501
167.00271717 176.08264305 184.76047612 156.57142819 182.29236386
170.82633354 163.28594098 154.96856184 178.2858139 157.80304212
172.0308982 163.51841416 158.06308417 171.39681444 154.42965029
174.8478415 192.20931673 176.10036866 156.38985278 168.14837529
163.14524581 167.13927347 144.46522745 166.43504583 155.81423396
166.68351177 189.00441432 168.81679544 168.37687773 165.26859917
169.61163803 176.28064952 169.91441305 152.31252707 177.65393137
155.20503724 156.00253148 171.59037863 163.18285108 171.53752882
176.77168094 162.52566536 173.65096248 169.75831402 171.13552086
160.96485053 170.92576163 197.13488939 178.84028126 175.25256691
174.30559673 174.55793648 171.28917593 182.75983705 163.48474339
171.79345064 181.33163353 181.75690243 170.56288915 175.54864891
171.39277086 157.10507882 162.93494288 186.22108735 180.83615147
177.45119341 176.74982051 164.74782882 164.1272427 176.95981347
180.71192368 163.51578627 161.40654864 183.31734345 165.91617314
170.21755489 188.20213837 185.16707843 184.39225039 169.9894885
161.3955984 164.32427469 166.21098961 171.65617603 163.59358437
179.0911516 180.97185572 172.98504063 176.91140312 162.73396879
157.82345464 178.57631953 172.72606044 150.15585485 178.22769764
161.50606571 169.11262122 159.70493474 183.88260754 152.4162303
177.37693533 161.91342926 176.64997265 162.56062519 159.72921782
158.53459525 173.18737201 167.7933642 173.27039627 167.68193019
157.80211657 168.66185172 175.06270681 156.67580503 175.5160176
175.14353986 166.4217466 157.7113916 184.19017845 147.43578244
175.90149325 163.53347714 153.64890281 176.42339122 171.01671226
192.43879604 176.17190372 161.49288883 158.41295009 176.66026256]

The hist() function will read the array and produce a histogram:

# A simple histogram:

import matplotlib.pyplot as plt


import numpy as np

x = np.random.normal(170, 10, 250)

plt.hist(x)
plt.show()
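
The hist() function also returns the numbers behind the picture: the count in each interval and the interval edges. The sketch below assumes a fixed random seed (an arbitrary choice made here so that repeated runs give the same data) and 10 bins; the names rng, counts, edges, patches and share are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.normal(170, 10, 250)    # (mean, SD, number of values)

# hist() returns the counts, the bin edges and the drawn bars.
counts, edges, patches = plt.hist(x, bins=10)

# Print the number of observations within each interval.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {int(n)} values")

# Share of values within one standard deviation of the mean.
share = np.mean((x >= 160) & (x <= 180))
print(f"Share between 160 and 180: {share:.0%}")

plt.show()
```

The counts always add up to the number of values passed in, here 250, and there is one more edge than there are bins.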

# A histogram with 100 bins:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(170, 10, 250)

plt.hist(x, 100)  # the second argument sets the number of bins

plt.show()

Write a Python function that plots a histogram using Matplotlib to visualize a data distribution,
where data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4].

import matplotlib.pyplot as plt

def plot_histogram(data, title, x_label, y_label, bins):
    # Draw a histogram of data with the given title, axis labels and bins.
    plt.figure(figsize=(8, 4))
    plt.hist(data, bins=bins, color='b', edgecolor='black', alpha=0.7)
    plt.title(title)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.grid(True)
    plt.show()

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

plot_histogram(data, "Histogram Example", "X-axis", "Frequency", bins=4)
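
For whole-number data such as this, it can help to choose the bin edges yourself so that each integer sits in the middle of its own bin. The sketch below uses np.histogram to show the resulting counts without drawing anything; the half-integer edges are an assumption made here for illustration.

```python
import numpy as np

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# Edges at 0.5, 1.5, 2.5, 3.5, 4.5 put each integer in its own bin.
edges = np.arange(0.5, 5.0, 1.0)

counts, _ = np.histogram(data, bins=edges)
print(counts)
```

The value 1 occurs once, 2 twice, 3 three times and 4 four times, which is what the counts show. The same edges array can be passed as the bins argument of plot_histogram, because plt.hist accepts either a number of bins or a sequence of edges.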

Creating Pie Charts

With Pyplot, you can use the pie() function to draw pie charts:

# A simple pie chart:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])

plt.pie(y)
plt.show()

As you can see, the pie chart draws one piece (called a wedge) for each value in the array (in this
case [35, 25, 25, 15]).

By default, the first wedge starts at the x-axis and the wedges are drawn counterclockwise.

The size of each wedge is its value divided by the sum of all values: x/sum(x)
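
The formula x/sum(x) can be checked by hand with NumPy. This is a small sketch using the same array; the names fractions and angles are chosen here for illustration.

```python
import numpy as np

y = np.array([35, 25, 25, 15])

# Each wedge's share of the whole pie: x / sum(x).
fractions = y / y.sum()

# The same shares expressed as angles of a full 360-degree circle.
angles = 360 * fractions

print(fractions)
print(angles)
```

Because these values happen to add up to 100, the first wedge takes 35/100 of the pie, which corresponds to 126 of the 360 degrees.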

Labels

Add labels to the pie chart with the labels parameter.

The labels parameter must be an array with one label for each wedge:

# A pie chart with labels:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels)


plt.show()

Start Angle

As mentioned, the default start angle is at the x-axis, but you can change the start angle by
specifying a startangle parameter.

The startangle parameter is an angle in degrees; the default is 0:

# Start the first wedge at 90 degrees:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels, startangle = 90)


plt.show()

Explode

Maybe you want one of the wedges to stand out? The explode parameter allows you to do that.

The explode parameter, if specified, and not None, must be an array with one value for each
wedge.

Each value represents how far from the center each wedge is displayed:

# Pull the "Apples" wedge 0.2 from the center of the pie:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]

plt.pie(y, labels = mylabels, explode = myexplode)


plt.show()

Shadow

Add a shadow to the pie chart by setting the shadow parameter to True:

# Add a shadow:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]

plt.pie(y, labels = mylabels, explode = myexplode, shadow = True)


plt.show()

Legend

To add a list of explanations for each wedge, use the legend() function:

# Add a legend:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels)


plt.legend()
plt.show()

Legend With Header

To add a header to the legend, add the title parameter to the legend() function:

# Add a legend with a header:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels)


plt.legend(title = "Four Fruits:")
plt.show()
