Python w3school
Python Introduction
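What is Python?
Python is a popular programming language. It was created by Guido van Rossum and first
released in 1991.
It is used for:
• web development (server-side),
• software development,
• mathematics,
• system scripting.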
Why Python?
• Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
• Python has a simple syntax similar to the English language.
• Python has syntax that allows developers to write programs with fewer lines than
some other programming languages.
• Python runs on an interpreter system, meaning that code can be executed as soon
as it is written. This means that prototyping can be very quick.
• Python can be treated in a procedural way, an object-oriented way or a functional
way.
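The three styles mentioned in the last point can be sketched on one small task, summing a list. This is only an illustration; the names numbers and Accumulator are made up for the example.

```python
from functools import reduce

numbers = [1, 2, 3, 4]  # illustrative data

# Procedural style: an explicit loop that updates a running total.
total_procedural = 0
for n in numbers:
    total_procedural += n

# Object-oriented style: state and behaviour bundled in a class.
class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value

acc = Accumulator()
for n in numbers:
    acc.add(n)

# Functional style: combine the values with a function, without mutation.
total_functional = reduce(lambda a, b: a + b, numbers, 0)

print(total_procedural, acc.total, total_functional)  # 10 10 10
```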
Good to know
• The most recent major version of Python is Python 3, which we shall be using in this
tutorial. Python 2 reached its end of life on January 1, 2020 and no longer receives
updates, including security updates, although legacy code written in it is still in use.
• In this tutorial Python will be written in a text editor. It is possible to write Python in
an Integrated Development Environment, such as Thonny, PyCharm, NetBeans or Eclipse,
which are particularly useful when managing larger collections of Python files.
Python Syntax compared to other programming languages
• Python was designed for readability, and has some similarities to the English
language with influence from mathematics.
• Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
• Python relies on indentation, using whitespace, to define scope, such as the scope
of loops, functions and classes. Other programming languages often use curly
brackets for this purpose.
Example
print("Hello, World!")
Python Getting Started
Python Install
To check if you have Python installed on a Windows PC, search in the start bar for Python
or run the following on the Command Line (cmd.exe):
C:\Users\Your Name>python --version
To check if you have Python installed on Linux or Mac, open the command line (Linux)
or the Terminal (Mac) and type:
python --version
If you find that you do not have Python installed on your computer, then you can download
it for free from the following website: https://www.python.org/
Python Quickstart
Python is an interpreted programming language. This means that as a developer you write
Python (.py) files in a text editor and then pass those files to the Python interpreter to be
executed.
The way to run a Python file is like this on the command line:
C:\Users\Your Name>python helloworld.py
Here "helloworld.py" is the name of your Python file.
Let's write our first Python file, called helloworld.py, which can be done in any text editor.
helloworld.py
print("Hello, World!")
Simple as that. Save your file. Open your command line, navigate to the directory where
you saved your file, and run:
C:\Users\Your Name>python helloworld.py
The output should read:
Hello, World!
Congratulations, you have written and executed your first Python program.
The Python Command Line
To test a short amount of code in Python, it is sometimes quickest and easiest not to write
the code in a file. This is possible because Python can be run as a command line
itself.
Type the following on the Windows, Mac or Linux command line:
C:\Users\Your Name>python
Or, if the "python" command did not work, you can try "py":
C:\Users\Your Name>py
From there you can write any Python, including our hello world example from earlier in the
tutorial:
C:\Users\Your Name>python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Hello, World!")
Hello, World!
Whenever you are done in the python command line, you can simply type the following to
quit the python command line interface:
exit()
Python Syntax
As we learned in the previous page, Python syntax can be executed by writing directly in
the Command Line:
>>> print("Hello, World!")
Hello, World!
Or by creating a Python file on the server, using the .py file extension, and running it in
the Command Line:
C:\Users\Your Name>python myfile.py
Python Indentation
Indentation refers to the spaces at the beginning of a code line.
Where in other programming languages the indentation in code is for readability only, the
indentation in Python is very important: Python uses indentation to indicate a block of code.
Example
if 5 > 2:
    print("Five is greater than two!")
Python will give you an error if you skip the indentation:
Example
Syntax Error:
if 5 > 2:
print("Five is greater than two!")
The number of spaces is up to you as a programmer. The most common choice is four, but it
has to be at least one.
Example
if 5 > 2:
 print("Five is greater than two!")
if 5 > 2:
        print("Five is greater than two!")
You have to use the same number of spaces in the same block of code,
otherwise Python will give you an error:
Example
Syntax Error:
if 5 > 2:
 print("Five is greater than two!")
        print("Five is greater than two!")
Python Variables
In Python, variables are created when you assign a value to them:
Example
Variables in Python:
x = 5
y = "Hello, World!"
You will learn more about variables in the Python Variables chapter.
Comments
Python has commenting capability for the purpose of in-code documentation.
Comments start with a #, and Python will treat the rest of the line as a comment:
Example
Comments in Python:
#This is a comment.
print("Hello, World!")
Creating a Comment
Comments start with a #, and Python will ignore them:
Example
#This is a comment
print("Hello, World!")
Comments can be placed at the end of a line, and Python will ignore the rest
of the line:
Example
print("Hello, World!") #This is a comment
A comment does not have to be text that explains the code; it can also be used to
prevent Python from executing code:
Example
#print("Hello, World!")
print("Cheers, Mate!")
Python does not have a dedicated syntax for multiline comments. To add a multiline
comment you can insert a # for each line:
Example
#This is a comment
#written in
#more than just one line
print("Hello, World!")
Since Python will ignore string literals that are not assigned to a variable, you can also add a
multiline string (triple quotes) in your code and place your comment inside it:
Example
"""
This is a comment
written in
more than just one line
"""
print("Hello, World!")
As long as the string is not assigned to a variable, Python will read the code, but then
ignore it, and you have made a multiline comment.
Python Variables
Variables
Variables are containers for storing data values.
Creating Variables
Python has no command for declaring a variable.
A variable is created the moment you first assign a value to it.
Example
x = 5
y = "John"
print(x)
print(y)
Variables do not need to be declared with any particular type, and can even change type
after they have been set.
Example
x = 4 # x is of type int
x = "Sally" # x is now of type str
print(x)
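Casting
If you want to specify the data type of a variable, this can be done with casting.
Example
x = str(3) # x will be '3'
y = int(3) # y will be 3
z = float(3) # z will be 3.0
Get the Type
You can get the data type of a variable with the type() function.
Example
x = 5
y = "John"
print(type(x))
print(type(y))
You will learn more about data types and casting later in this tutorial.
Single or Double Quotes?
String variables can be declared using either single or double quotes:
Example
x = "John"
# is the same as
x = 'John'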
Case-Sensitive
Variable names are case-sensitive.
Example
This will create two variables:
a = 4
A = "Sally"
#A will not overwrite a
Variable Names
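A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume). Rules for Python variables:
• A variable name must start with a letter or the underscore character.
• A variable name cannot start with a number.
• A variable name can only contain letters, digits and underscores.
• Variable names are case-sensitive (age, Age and AGE are three different variables).
• A variable name cannot be one of the Python keywords.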
Example
Legal variable names:
myvar = "John"
my_var = "John"
_my_var = "John"
myVar = "John"
MYVAR = "John"
myvar2 = "John"
Example
Illegal variable names:
2myvar = "John"
my-var = "John"
my var = "John"
Remember that variable names are case-sensitive.
Multi Words Variable Names
Variable names with more than one word can be difficult to read.
There are several techniques you can use to make them more readable:
Camel Case
Each word, except the first, starts with a capital letter:
myVariableName = "John"
Pascal Case
Each word starts with a capital letter:
MyVariableName = "John"
Snake Case
Each word is separated by an underscore character:
my_variable_name = "John"
Python Variables - Assign Multiple Values
Many Values to Multiple Variables
Python allows you to assign values to multiple variables in one line:
Example
x, y, z = "Orange", "Banana", "Cherry"
print(x)
print(y)
print(z)
Note: Make sure the number of variables matches the number of values, or else you will get an error.
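A minimal sketch of the error raised when the counts differ. The values are illustrative, and the exception is caught only so that the snippet runs to completion.

```python
try:
    x, y = "Orange", "Banana", "Cherry"  # three values, only two variables
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```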
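One Value to Multiple Variables
You can also assign the same value to multiple variables in one line:
Example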
x = y = z = "Orange"
print(x)
print(y)
print(z)
Unpack a Collection
If you have a collection of values in a list, tuple etc., Python allows you to extract the
values into variables. This is called unpacking.
Example
Unpack a list:
fruits = ["apple", "banana", "cherry"]
x, y, z = fruits
print(x)
print(y)
print(z)
Output Variables
The Python print() function is often used to output variables.
Example
x = "Python is awesome"
print(x)
In the print() function, you can output multiple variables, separated by a comma:
Example
x = "Python"
y = "is"
z = "awesome"
print(x, y, z)
You can also use the + operator to output multiple variables:
Example
x = "Python "
y = "is "
z = "awesome"
print(x + y + z)
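Notice the space character after "Python " and "is "; without them the result would be
"Pythonisawesome".
For numbers, the + character works as a mathematical operator:
Example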
x = 5
y = 10
print(x + y)
If you try to combine a string and a number with the + operator, Python will give you an error:
Example
x = 5
y = "John"
print(x + y)
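The best way to output multiple variables in the print() function is to separate them with
commas, which works even when the variables have different data types:
Example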
x = 5
y = "John"
print(x, y)
Global Variables
Variables that are created outside of a function (as in all of the examples above) are known
as global variables.
Global variables can be used by everyone, both inside of functions and outside.
Example
Create a variable outside of a function, and use it inside the function:
x = "awesome"
def myfunc():
    print("Python is " + x)
myfunc()
If you create a variable with the same name inside a function, this variable will be local,
and can only be used inside the function. The global variable with the same name will
remain as it was, global and with the original value.
Example
Create a variable inside a function, with the same name as the global variable:
x = "awesome"
def myfunc():
    x = "fantastic"
    print("Python is " + x)
myfunc()
print("Python is " + x)
The global Keyword
Normally, when you create a variable inside a function, that variable is
local, and can only be used inside that function.
To create a global variable inside a function, you can use the global keyword.
Example
If you use the global keyword, the variable belongs to the global scope:
def myfunc():
    global x
    x = "fantastic"
myfunc()
print("Python is " + x)
Also, use the global keyword if you want to change a global variable inside a function.
Example
To change the value of a global variable inside a function, refer to the
variable by using the global keyword:
x = "awesome"
def myfunc():
    global x
    x = "fantastic"
myfunc()
print("Python is " + x)
Python Data Types
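Variables can store data of different types, and different types can do different things.
Python has the following data types built-in by default, in these categories:
Text Type: str
Numeric Types: int, float, complex
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
None Type: NoneType
Getting the Data Type
You can get the data type of any object by using the type() function:
Example
Print the data type of the variable x:
x = 5
print(type(x))
Setting the Data Type
In Python, the data type is set when you assign a value to a variable:
Example     Data Type
x = 20      int
x = 1j      complex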
Python Numbers
There are three numeric types in Python:
• int
• float
• complex
Variables of numeric types are created when you assign a value to them:
Example
x = 1 # int
y = 2.8 # float
z = 1j # complex
To verify the type of any object in Python, use the type() function:
Example
print(type(x))
print(type(y))
print(type(z))
Int
Int, or integer, is a whole number, positive or negative, without decimals, of unlimited
length.
Example
Integers:
x = 1
y = 35656222554887711
z = -3255522
print(type(x))
print(type(y))
print(type(z))
Float
Float, or "floating point number" is a number, positive or negative, containing one or
more decimals.
Example
Floats:
x = 1.10
y = 1.0
z = -35.59
print(type(x))
print(type(y))
print(type(z))
Float can also be scientific numbers with an "e" to indicate the power of 10.
Example
Floats:
x = 35e3
y = 12E4
z = -87.7e100
print(type(x))
print(type(y))
print(type(z))
Complex
Complex numbers are written with a "j" as the imaginary part:
Example
Complex:
x = 3+5j
y = 5j
z = -5j
print(type(x))
print(type(y))
print(type(z))
Type Conversion
You can convert from one type to another with the int(), float(), and complex() functions:
Example
Convert from one type to another:
x = 1 # int
y = 2.8 # float
z = 1j # complex
#convert from int to float:
a = float(x)
#convert from float to int:
b = int(y)
#convert from int to complex:
c = complex(x)
print(a)
print(b)
print(c)
print(type(a))
print(type(b))
print(type(c))
Note: You cannot convert complex numbers into another number type.
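A short sketch of what the note means in practice, assuming an illustrative value z. Calling int() on a complex number raises a TypeError, but the real part, imaginary part, and magnitude can still be read as floats.

```python
z = 3 + 4j  # illustrative complex number

try:
    int(z)
except TypeError as err:
    print("TypeError:", err)

real_part = z.real   # 3.0
imag_part = z.imag   # 4.0
magnitude = abs(z)   # 5.0
print(real_part, imag_part, magnitude)
```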
Random Number
Python has no built-in random() function, but its standard library includes a
module called random that can be used to make random numbers:
Example
Import the random module, and display a random number between 1 and 9:
import random
print(random.randrange(1, 10))
In our Random Module Reference you will learn more about the Random module.
Python Casting
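Specify a Variable Type
There may be times when you want to specify a type for a variable. This can be done with
casting. Casting in Python is done using constructor functions: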
• int() - constructs an integer number from an integer literal, a float literal (by
removing all decimals), or a string literal (providing the string represents a whole
number)
• float() - constructs a float number from an integer literal, a float literal or a string
literal (providing the string represents a float or an integer)
• str() - constructs a string from a wide variety of data types, including strings,
integer literals and float literals
Example
Integers:
x = int(1) # x will be 1
y = int(2.8) # y will be 2
z = int("3") # z will be 3
Example
Floats:
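A minimal sketch of float() casting, using illustrative input values:

```python
x = float(1)      # from an integer
y = float(2.8)    # from a float
z = float("3")    # from a string holding a whole number
w = float("4.2")  # from a string holding a decimal number
print(x, y, z, w)  # 1.0 2.8 3.0 4.2
```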
Example
Strings:
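A minimal sketch of str() casting, using illustrative input values:

```python
x = str("s1")  # from a string
y = str(2)     # from an integer
z = str(3.0)   # from a float
print(x, y, z)  # s1 2 3.0
```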
Strings
Strings in python are surrounded by either single quotation marks, or double quotation
marks.
Example
print("Hello")
print('Hello')
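Assign String to a Variable
Assigning a string to a variable is done with the variable name followed by an equal sign
and the string:
Example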
a = "Hello"
print(a)
Multiline Strings
You can assign a multiline string to a variable by using three quotes:
Example
You can use three double quotes:
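A sketch using three double quotes, with placeholder text chosen for illustration:

```python
a = """Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua."""
print(a)
```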
Or three single quotes:
Example
a = '''Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua.'''
print(a)
Note: in the result, the line breaks are inserted at the same position as in the code.
Strings are Arrays
Strings in Python are sequences of Unicode characters, and they can be indexed much like
arrays in many other programming languages.
However, Python does not have a character data type; a single character is simply a
string with a length of 1.
Square brackets can be used to access elements of the string.
Example
Get the character at position 1 (remember that the first character has the position 0):
a = "Hello, World!"
print(a[1])
Looping Through a String
Since strings are sequences, we can loop through their characters with a for loop.
Example
Loop through the letters in the word "banana":
for x in "banana":
    print(x)
Learn more about For Loops in our Python For Loops chapter.
String Length
To get the length of a string, use the len() function.
Example
The len() function returns the length of a string:
a = "Hello, World!"
print(len(a))
Check String
To check if a certain phrase or character is present in a string, we can use the
keyword in.
Example
Check if "free" is present in the following text:
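A minimal sketch, assuming an illustrative sentence stored in txt:

```python
txt = "The best things in life are free!"
found = "free" in txt
print(found)  # True
```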
Use it in an if statement:
Example
Print only if "free" is present:
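A minimal sketch of the same check inside an if statement; the sentence and the printed message are illustrative:

```python
txt = "The best things in life are free!"
if "free" in txt:
    print("Yes, 'free' is present.")
```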
Check if NOT
To check if a certain phrase or character is NOT present in a string, we can use the
keyword not in.
Example
Check if "expensive" is NOT present in the following text:
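A minimal sketch with not in, assuming an illustrative sentence stored in txt:

```python
txt = "The best things in life are free!"
absent = "expensive" not in txt
print(absent)  # True
```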
Use it in an if statement:
Example
print only if "expensive" is NOT present:
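A minimal sketch of not in inside an if statement; the sentence and the printed message are illustrative:

```python
txt = "The best things in life are free!"
if "expensive" not in txt:
    print("No, 'expensive' is NOT present.")
```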
Slicing
You can return a range of characters by using the slice syntax.
Specify the start index and the end index, separated by a colon, to return a part of the
string.
Example
Get the characters from position 2 to position 5 (not included):
b = "Hello, World!"
print(b[2:5])
Example
Get the characters from the start to position 5 (not included):
b = "Hello, World!"
print(b[:5])
Example
Get the characters from position 2, and all the way to the end:
b = "Hello, World!"
print(b[2:])
Negative Indexing
Use negative indexes to start the slice from the end of the string:
Example
Get the characters from position -5 ("o" in "World!") up to, but not including, position -2 ("d" in "World!"):
b = "Hello, World!"
print(b[-5:-2])
Python has a set of built-in methods that you can use on strings.
Upper Case
Example
The upper() method returns the string in upper case:
a = "Hello, World!"
print(a.upper())
Lower Case
Example
The lower() method returns the string in lower case:
a = "Hello, World!"
print(a.lower())
Remove Whitespace
Whitespace is the space before and/or after the actual text, and very often you want to
remove this space.
Example
The strip() method removes any whitespace from the beginning or the end:
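A minimal sketch of strip(), assuming an illustrative string with one leading and one trailing space:

```python
a = " Hello, World! "
stripped = a.strip()
print(stripped)  # Hello, World!
```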
Replace String
Example
The replace() method replaces a string with another string:
a = "Hello, World!"
print(a.replace("H", "J"))
Split String
The split() method returns a list where the text between the specified separator becomes
the list items.
Example
The split() method splits the string into substrings if it finds instances of the separator:
a = "Hello, World!"
print(a.split(",")) # returns ['Hello', ' World!']
String Methods
Learn more about String Methods with our String Methods Reference
String Concatenation
To concatenate, or combine, two strings you can use the + operator.
Example
Merge variable a with variable b into variable c:
a = "Hello"
b = "World"
c = a + b
print(c)
Example
To add a space between them, add a " ":
a = "Hello"
b = "World"
c = a + " " + b
print(c)
String Format
As we learned in the Python Variables chapter, we cannot combine strings and numbers
like this:
Example
age = 36
txt = "My name is John, I am " + age
print(txt)
But we can combine strings and numbers by using the format() method!
The format() method takes the passed arguments, formats them, and places them in the
string where the placeholders {} are:
Example
Use the format() method to insert numbers into strings:
age = 36
txt = "My name is John, and I am {}"
print(txt.format(age))
The format() method takes an unlimited number of arguments, which are placed into the
respective placeholders:
Example
quantity = 3
itemno = 567
price = 49.95
myorder = "I want {} pieces of item {} for {} dollars."
print(myorder.format(quantity, itemno, price))
You can use index numbers {0} to be sure the arguments are placed in the correct
placeholders:
Example
quantity = 3
itemno = 567
price = 49.95
myorder = "I want to pay {2} dollars for {0} pieces of item {1}."
print(myorder.format(quantity, itemno, price))
Escape Character
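To insert characters that are illegal in a string, use an escape character.
An escape character is a backslash \ followed by the character you want to insert.
An example of an illegal character is a double quote inside a string that is surrounded by
double quotes: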
Example
You will get an error if you use double quotes inside a string that is surrounded by double
quotes:
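Because a syntax error would stop a script before it runs, this sketch passes the faulty line to the built-in compile() function as text and catches the error. The sentence is the same one used in the escaped example that follows.

```python
# The faulty line is kept as text so that this snippet itself stays valid.
source = 'txt = "We are the so-called "Vikings" from the north."'

try:
    compile(source, "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True

print(raised)  # True
```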
Example
The escape character allows you to use double quotes when you normally would not be
allowed:
txt = "We are the so-called \"Vikings\" from the north."
Escape Characters
Other escape characters used in Python:
Code    Result
\\      Backslash
\t      Tab
\b      Backspace
\f      Form Feed
String Methods
Python has a set of built-in methods that you can use on strings.
Note: All string methods return new values. They do not change the original string.
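Python Booleans
Booleans represent one of two values: True or False.
Boolean Values
In programming you often need to know if an expression is True or False.
You can evaluate any expression in Python, and get one of two answers, True or False.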
When you compare two values, the expression is evaluated and Python returns the
Boolean answer:
Example
print(10 > 9)
print(10 == 9)
print(10 < 9)
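When you run a condition in an if statement, Python returns True or False:
Example
Print a message based on whether the condition is True or False:
a = 200
b = 33
if b > a:
    print("b is greater than a")
else:
    print("b is not greater than a")
Evaluate Values and Variables
The bool() function allows you to evaluate any value, and gives you True or False in return.
Example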
Evaluate a string and a number:
print(bool("Hello"))
print(bool(15))
Example
Evaluate two variables:
x = "Hello"
y = 15
print(bool(x))
print(bool(y))
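Most Values are True
Almost any value is evaluated to True if it has some sort of content.
Any string is True, except empty strings.
Any number is True, except 0.
Any list, tuple, set, and dictionary are True, except empty ones.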
Example
The following will return True:
bool("abc")
bool(123)
bool(["apple", "cherry", "banana"])
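Some Values are False
In fact, there are not many values that evaluate to False, except empty values, such as (),
[], {}, "", the number 0, and the value None. And of course the value False evaluates to
False.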
Example
The following will return False:
bool(False)
bool(None)
bool(0)
bool("")
bool(())
bool([])
bool({})
One more value, or object in this case, evaluates to False, and that is if you have an
object that is made from a class with a __len__ function that returns 0 or False:
Example
class myclass():
    def __len__(self):
        return 0
myobj = myclass()
print(bool(myobj))
Functions can Return a Boolean
You can create functions that return a Boolean value:
Example
Print the answer of a function:
def myFunction():
    return True
print(myFunction())
You can execute code based on the Boolean answer of a function:
Example
Print "YES!" if the function returns True, otherwise print "NO!":
def myFunction():
    return True
if myFunction():
    print("YES!")
else:
    print("NO!")
Python also has many built-in functions that return a boolean value, like
the isinstance() function, which can be used to determine if an object is of a certain data
type:
Example
Check if an object is an integer or not:
x = 200
print(isinstance(x, int))
Python Operators
Operators are used to perform operations on variables and values.
In the example below, we use the + operator to add together two values:
Example
print(10 + 5)
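Python divides the operators into the following groups:
• Arithmetic operators
• Assignment operators
• Comparison operators
• Logical operators
• Identity operators
• Membership operators
• Bitwise operators
Python Arithmetic Operators
Arithmetic operators are used with numeric values to perform common mathematical
operations:
Operator   Name             Example
**         Exponentiation   x ** y
Python Assignment Operators
Assignment operators are used to assign values to variables:
Operator   Example   Same As
+=         x += 3    x = x + 3
-=         x -= 3    x = x - 3
*=         x *= 3    x = x * 3
/=         x /= 3    x = x / 3
%=         x %= 3    x = x % 3
|=         x |= 3    x = x | 3
^=         x ^= 3    x = x ^ 3
Python Comparison Operators
Comparison operators are used to compare two values:
Operator   Name    Example
==         Equal   x == y
Python Logical Operators
Logical operators are used to combine conditional statements:
Operator   Description                                               Example
not        Reverse the result, returns False if the result is true   not(x < 5 and x < 10)
Python Bitwise Operators
Bitwise operators are used to compare (binary) numbers:
Operator   Name                   Description
<<         Zero fill left shift   Shift left by pushing zeros in from the right and let the leftmost bits fall off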
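A short sketch that exercises one operator from each of the groups shown above. The starting values of x and y are illustrative.

```python
x = 7
y = 2

power = x ** y                     # arithmetic: exponentiation, 7 * 7
x += 3                             # assignment: same as x = x + 3
is_equal = (x == y)                # comparison: are the values equal?
negated = not (x < 5 and x < 10)   # logical: reverses the result
shifted = y << 2                   # bitwise: binary 10 becomes 1000

print(power, x, is_equal, negated, shifted)  # 49 10 False True 8
```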
Python Lists
mylist = ["apple", "banana", "cherry"]
List
Lists are used to store multiple items in a single variable.
Lists are one of 4 built-in data types in Python used to store collections of data, the other
3 are Tuple, Set, and Dictionary, all with different qualities and usage.
Lists are created using square brackets:
Example
Create a List:
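A minimal sketch, assuming an illustrative list of fruit names:

```python
thislist = ["apple", "banana", "cherry"]
print(thislist)  # ['apple', 'banana', 'cherry']
```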
List Items
List items are ordered, changeable, and allow duplicate values.
List items are indexed, the first item has index [0], the second item has index [1] etc.
Ordered
When we say that lists are ordered, it means that the items have a defined order, and
that order will not change.
If you add new items to a list, the new items will be placed at the end of the list.
Note: There are some list methods that will change the order, but in general: the order
of the items will not change.
Changeable
The list is changeable, meaning that we can change, add, and remove items in a list after
it has been created.
Allow Duplicates
Since lists are indexed, lists can have items with the same value:
Example
Lists allow duplicate values:
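A minimal sketch in which "apple" and "cherry" each appear twice (illustrative values):

```python
thislist = ["apple", "banana", "cherry", "apple", "cherry"]
print(thislist)
```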
List Length
To determine how many items a list has, use the len() function:
Example
Print the number of items in the list:
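A minimal sketch with an illustrative three-item list:

```python
thislist = ["apple", "banana", "cherry"]
count = len(thislist)
print(count)  # 3
```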
List Items - Data Types
List items can be of any data type:
Example
String, int and boolean data types:
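A sketch with three illustrative lists, each holding a single data type:

```python
list1 = ["apple", "banana", "cherry"]  # strings
list2 = [1, 5, 7, 9, 3]                # integers
list3 = [True, False, False]           # booleans
print(list1, list2, list3)
```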
A list can contain different data types:
Example
A list with strings, integers and boolean values:
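A sketch of a mixed list; the values are illustrative:

```python
list1 = ["abc", 34, True, 40, "male"]
print(list1)
```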
type()
From Python's perspective, lists are defined as objects with the data type 'list':
<class 'list'>
Example
What is the data type of a list?
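A minimal sketch, reusing the illustrative mylist from the top of this page:

```python
mylist = ["apple", "banana", "cherry"]
print(type(mylist))  # <class 'list'>
```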
The list() Constructor
It is also possible to use the list() constructor when creating a new list.
Example
Using the list() constructor to make a List:
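A minimal sketch: list() accepts any iterable, here an illustrative tuple, which explains the double round brackets.

```python
thislist = list(("apple", "banana", "cherry"))  # the inner brackets form a tuple
print(thislist)  # ['apple', 'banana', 'cherry']
```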
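Python Collections (Arrays)
There are four collection data types in the Python programming language:
• List is a collection which is ordered and changeable. Allows duplicate members.
• Tuple is a collection which is ordered and unchangeable. Allows duplicate members.
• Set is a collection which is unordered, unchangeable*, and unindexed. No duplicate members.
• Dictionary is a collection which is ordered** and changeable. No duplicate keys.
*Set items are unchangeable, but you can remove and/or add items whenever you like.
**As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier,
dictionaries are unordered.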
When choosing a collection type, it is useful to understand the properties of that type.
Choosing the right type for a particular data set could mean retention of meaning, and, it
could mean an increase in efficiency or security.
Access Items
List items are indexed and you can access them by referring to the index number:
Example
Print the second item of the list:
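A minimal sketch with an illustrative list; remember that the first item has index 0:

```python
thislist = ["apple", "banana", "cherry"]
second = thislist[1]
print(second)  # banana
```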
Negative Indexing
Negative indexing means start from the end
-1 refers to the last item, -2 refers to the second last item etc.
Example
Print the last item of the list:
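A minimal sketch with an illustrative list:

```python
thislist = ["apple", "banana", "cherry"]
last = thislist[-1]
print(last)  # cherry
```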
Range of Indexes
You can specify a range of indexes by specifying where to start and where to end the
range.
When specifying a range, the return value will be a new list with the specified items.
Example
Return the third, fourth, and fifth item:
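A sketch assuming an illustrative seven-item list:

```python
thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
middle = thislist[2:5]
print(middle)  # ['cherry', 'orange', 'kiwi']
```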
Note: The search will start at index 2 (included) and end at index 5 (not included).
By leaving out the start value, the range will start at the first item:
Example
This example returns the items from the beginning to, but NOT including, "kiwi":
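A sketch assuming an illustrative seven-item list in which "kiwi" sits at index 4:

```python
thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
head = thislist[:4]
print(head)  # ['apple', 'banana', 'cherry', 'orange']
```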
By leaving out the end value, the range will go on to the end of the list:
Example
This example returns the items from "cherry" to the end:
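A sketch assuming an illustrative seven-item list in which "cherry" sits at index 2:

```python
thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
tail = thislist[2:]
print(tail)  # ['cherry', 'orange', 'kiwi', 'melon', 'mango']
```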
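Range of Negative Indexes
Specify negative indexes if you want to start the search from the end of the list:
Example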
This example returns the items from "orange" (-4) to, but NOT including "mango" (-1):
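A sketch assuming an illustrative seven-item list in which "orange" is at index -4 and "mango" at index -1:

```python
thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
part = thislist[-4:-1]
print(part)  # ['orange', 'kiwi', 'melon']
```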
Check if Item Exists
To determine if a specified item is present in a list, use the in keyword:
Example
Check if "apple" is present in the list:
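A minimal sketch with an illustrative list and message:

```python
thislist = ["apple", "banana", "cherry"]
if "apple" in thislist:
    print("Yes, 'apple' is in the fruits list")
```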
Change Item Value
To change the value of a specific item, refer to the index number:
Example
Change the second item:
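A minimal sketch, assuming an illustrative list and replacement value:

```python
thislist = ["apple", "banana", "cherry"]
thislist[1] = "blackcurrant"
print(thislist)  # ['apple', 'blackcurrant', 'cherry']
```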
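Change a Range of Item Values
To change the value of items within a specific range, define a list with the new values, and
refer to the range of index numbers where you want to insert the new values:
Example
Change the values "banana" and "cherry" to the values "blackcurrant" and "watermelon":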
If you insert more items than you replace, the new items will be inserted where you
specified, and the remaining items will move accordingly:
Example
Change the second value by replacing it with two new values:
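A sketch with an illustrative list: one item is replaced by two, so the list grows from three to four items.

```python
thislist = ["apple", "banana", "cherry"]
thislist[1:2] = ["blackcurrant", "watermelon"]
print(thislist)  # ['apple', 'blackcurrant', 'watermelon', 'cherry']
```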
Note: The length of the list will change when the number of items inserted does not
match the number of items replaced.
If you insert fewer items than you replace, the new items will be inserted where you
specified, and the remaining items will move accordingly:
Example
Change the second and third value by replacing it with one value:
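A sketch with an illustrative list: two items are replaced by one, so the list shrinks from three to two items.

```python
thislist = ["apple", "banana", "cherry"]
thislist[1:3] = ["watermelon"]
print(thislist)  # ['apple', 'watermelon']
```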
Insert Items
To insert a new list item, without replacing any of the existing values, we can use
the insert() method.
Example
Insert "watermelon" as the third item:
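A minimal sketch with an illustrative list; index 2 is the third position:

```python
thislist = ["apple", "banana", "cherry"]
thislist.insert(2, "watermelon")
print(thislist)  # ['apple', 'banana', 'watermelon', 'cherry']
```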
Note: As a result of the example above, the list will now contain 4 items.
Append Items
To add an item to the end of the list, use the append() method:
Example
Using the append() method to append an item:
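A minimal sketch, assuming an illustrative list and new item:

```python
thislist = ["apple", "banana", "cherry"]
thislist.append("orange")
print(thislist)  # ['apple', 'banana', 'cherry', 'orange']
```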
Insert Items
To insert a list item at a specified index, use the insert() method.
Example
Insert an item at the second position:
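A minimal sketch, assuming an illustrative list and new item; index 1 is the second position:

```python
thislist = ["apple", "banana", "cherry"]
thislist.insert(1, "orange")
print(thislist)  # ['apple', 'orange', 'banana', 'cherry']
```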
Note: As a result of the examples above, the lists will now contain 4 items.
Extend List
To append elements from another list to the current list, use the extend() method.
Example
Add the elements of tropical to thislist:
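A sketch in which the contents of both lists are illustrative:

```python
thislist = ["apple", "banana", "cherry"]
tropical = ["mango", "pineapple", "papaya"]
thislist.extend(tropical)
print(thislist)
```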
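Add Any Iterable
The extend() method does not have to append lists; you can add any iterable object
(tuples, sets, dictionaries etc.).
Example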
Add elements of a tuple to a list:
thislist = ["apple", "banana", "cherry"]
thistuple = ("kiwi", "orange")
thislist.extend(thistuple)
print(thislist)
Remove Specified Item
The remove() method removes the specified item.
Example
Remove "banana":
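A minimal sketch with an illustrative list:

```python
thislist = ["apple", "banana", "cherry"]
thislist.remove("banana")
print(thislist)  # ['apple', 'cherry']
```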
Remove Specified Index
The pop() method removes the item at the specified index.
Example
Remove the second item:
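A minimal sketch with an illustrative list; pop() also returns the removed item:

```python
thislist = ["apple", "banana", "cherry"]
removed = thislist.pop(1)
print(removed, thislist)  # banana ['apple', 'cherry']
```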
If you do not specify the index, the pop() method removes the last item.
Example
Remove the last item:
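A minimal sketch with an illustrative list:

```python
thislist = ["apple", "banana", "cherry"]
thislist.pop()
print(thislist)  # ['apple', 'banana']
```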
The del keyword also removes the item at the specified index:
Example
Remove the first item:
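A minimal sketch with an illustrative list:

```python
thislist = ["apple", "banana", "cherry"]
del thislist[0]
print(thislist)  # ['banana', 'cherry']
```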
The del keyword can also delete the list completely.
Example
Delete the entire list:
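A sketch with an illustrative list. After del, the name no longer exists, so using it raises a NameError, which is caught here only to keep the snippet running.

```python
thislist = ["apple", "banana", "cherry"]
del thislist

try:
    print(thislist)
    still_defined = True
except NameError:
    still_defined = False
    print("thislist no longer exists")
```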
Clear the List
The clear() method empties the list. The list still remains, but it has no content.
Example
Clear the list content:
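A minimal sketch with an illustrative list:

```python
thislist = ["apple", "banana", "cherry"]
thislist.clear()
print(thislist)  # []
```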
Loop Through a List
You can loop through the list items by using a for loop:
Example
Print all items in the list, one by one:
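A minimal sketch with an illustrative list:

```python
thislist = ["apple", "banana", "cherry"]
for x in thislist:
    print(x)
```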
Learn more about for loops in our Python For Loops Chapter.
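Loop Through the Index Numbers
You can also loop through the list items by referring to their index number.
Use the range() and len() functions to create a suitable iterable.
Example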
Print all items by referring to their index number:
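A minimal sketch with an illustrative list; range(len(thislist)) yields the indexes 0, 1 and 2:

```python
thislist = ["apple", "banana", "cherry"]
for i in range(len(thislist)):
    print(thislist[i])
```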
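Using a While Loop
You can also loop through the list items by using a while loop.
Use the len() function to determine the length of the list, then start at 0 and loop your
way through the list items by referring to their indexes.
Remember to increase the index by 1 after each iteration.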
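A minimal sketch of an index-based while loop over an illustrative list:

```python
thislist = ["apple", "banana", "cherry"]
i = 0
while i < len(thislist):
    print(thislist[i])
    i = i + 1  # move on to the next index
```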
Learn more about while loops in our Python While Loops Chapter.
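Looping Using List Comprehension
List comprehension offers the shortest syntax for looping through lists:
Example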
A short hand for loop that will print all items in a list:
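A sketch with an illustrative list. The comprehension is used here only for its printing side effect, so the list of None values it builds is discarded.

```python
thislist = ["apple", "banana", "cherry"]
[print(x) for x in thislist]
```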
Learn more about list comprehension in the next chapter: List Comprehension.
List Comprehension
List comprehension offers a shorter syntax when you want to create a new list based on
the values of an existing list.
Example:
Based on a list of fruits, you want a new list, containing only the fruits with the letter "a"
in the name.
Without list comprehension you will have to write a for statement with a conditional test
inside:
Example
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []
for x in fruits:
    if "a" in x:
        newlist.append(x)
print(newlist)
With list comprehension you can do all that with only one line of code:
Example
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = [x for x in fruits if "a" in x]
print(newlist)
The Syntax
newlist = [expression for item in iterable if condition == True]
The return value is a new list, leaving the old list unchanged.
Condition
The condition is like a filter that only accepts the items that evaluate to True.
Example
Only accept items that are not "apple":
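A sketch reusing the fruits list from the examples above:

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = [x for x in fruits if x != "apple"]
print(newlist)  # ['banana', 'cherry', 'kiwi', 'mango']
```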
The condition if x != "apple" will return True for all elements other than "apple",
making the new list contain all fruits except "apple".
Iterable
The iterable can be any iterable object, like a list, tuple, set etc.
Example
You can use the range() function to create an iterable:
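A minimal sketch; the upper bound 10 is an illustrative choice:

```python
newlist = [x for x in range(10)]
print(newlist)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```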
The iterable can be combined with a condition:
Example
Accept only numbers lower than 5:
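A minimal sketch; the range of ten numbers is an illustrative choice:

```python
newlist = [x for x in range(10) if x < 5]
print(newlist)  # [0, 1, 2, 3, 4]
```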
Expression
The expression is the current item in the iteration, but it is also the outcome, which you
can manipulate before it ends up like a list item in the new list:
Example
Set the values in the new list to upper case:
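A sketch reusing the fruits list from the examples above:

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = [x.upper() for x in fruits]
print(newlist)  # ['APPLE', 'BANANA', 'CHERRY', 'KIWI', 'MANGO']
```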
You can set the outcome to whatever you like:
Example
Set all values in the new list to 'hello':
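A sketch reusing the fruits list from the examples above; each item is ignored and replaced by the same string:

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = ['hello' for x in fruits]
print(newlist)  # ['hello', 'hello', 'hello', 'hello', 'hello']
```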
The expression can also contain conditions, not like a filter, but as a way to manipulate
the outcome:
Example
Return "orange" instead of "banana":
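A sketch reusing the fruits list from the examples above. The if ... else inside the expression chooses the outcome for each item and filters nothing out.

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = [x if x != "banana" else "orange" for x in fruits]
print(newlist)  # ['apple', 'orange', 'cherry', 'kiwi', 'mango']
```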
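Sort List Alphanumerically
List objects have a sort() method that will sort the list alphanumerically, ascending, by
default:
Example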
Sort the list alphabetically:
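A minimal sketch with an illustrative list of lower-case strings:

```python
thislist = ["orange", "mango", "kiwi", "pineapple", "banana"]
thislist.sort()
print(thislist)  # ['banana', 'kiwi', 'mango', 'orange', 'pineapple']
```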
Example
Sort the list numerically:
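A minimal sketch with an illustrative list of numbers:

```python
thislist = [100, 50, 65, 82, 23]
thislist.sort()
print(thislist)  # [23, 50, 65, 82, 100]
```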
Sort Descending
To sort descending, use the keyword argument reverse = True:
Example
Sort the list descending:
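A minimal sketch with an illustrative list of strings:

```python
thislist = ["orange", "mango", "kiwi", "pineapple", "banana"]
thislist.sort(reverse=True)
print(thislist)  # ['pineapple', 'orange', 'mango', 'kiwi', 'banana']
```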
Example
Sort the list descending:
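A minimal sketch with an illustrative list of numbers:

```python
thislist = [100, 50, 65, 82, 23]
thislist.sort(reverse=True)
print(thislist)  # [100, 82, 65, 50, 23]
```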
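Customize Sort Function
You can also customize the sort order by passing your own function with the keyword
argument key = function.
The function will return a number that will be used to sort the list (the lowest number
first):
Example
Sort the list based on how close the number is to 50:
def myfunc(n):
    return abs(n - 50)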
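A runnable sketch that puts the key function to use. The list of numbers is an assumption made for illustration and is not part of the listing above.

```python
def myfunc(n):
    # Distance from 50; smaller distances sort first.
    return abs(n - 50)

thislist = [100, 50, 65, 82, 23]  # illustrative values
thislist.sort(key=myfunc)
print(thislist)  # [50, 65, 23, 82, 100]
```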
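Case Insensitive Sort
By default the sort() method is case sensitive, resulting in all capital letters being sorted
before lower case letters:
Example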
Case sensitive sorting can give an unexpected result:
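A sketch with an illustrative mixed-case list; the capitalised items end up first:

```python
thislist = ["banana", "Orange", "Kiwi", "cherry"]
thislist.sort()
print(thislist)  # ['Kiwi', 'Orange', 'banana', 'cherry']
```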
Luckily we can use built-in functions as key functions when sorting a list.
Example
Perform a case-insensitive sort of the list:
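A short sketch of the difference, assuming a sample list with mixed capitalization. By default all capital letters sort before lower-case letters; passing str.lower as the key function compares the items as if they were lower case:

```python
thislist = ["banana", "Orange", "Kiwi", "cherry"]

# Default, case-sensitive sort on a copy of the list.
case_sensitive = thislist.copy()
case_sensitive.sort()
print(case_sensitive)  # ['Kiwi', 'Orange', 'banana', 'cherry']

# Case-insensitive sort using a built-in function as the key.
thislist.sort(key=str.lower)
print(thislist)  # ['banana', 'cherry', 'Kiwi', 'Orange']
```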
Reverse Order
What if you want to reverse the order of a list, regardless of the alphabet?
The reverse() method reverses the current sorting order of the elements.
Example
Reverse the order of the list items:
Python - Copy Lists
Copy a List
You cannot copy a list simply by typing list2 = list1, because: list2 will only be
a reference to list1, and changes made in list1 will automatically also be made in list2.
There are ways to make a copy; one way is to use the built-in list method copy().
Example
Make a copy of a list with the copy() method:
Example
Make a copy of a list with the list() constructor:
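The difference between a reference and a copy can be sketched like this (the names alias, copy_a and copy_b are illustrative):

```python
thislist = ["apple", "banana", "cherry"]

alias = thislist           # a second name for the same list
copy_a = thislist.copy()   # a new list with the same items
copy_b = list(thislist)    # another way to build a new list

thislist.append("orange")

print(alias)   # ['apple', 'banana', 'cherry', 'orange']
print(copy_a)  # ['apple', 'banana', 'cherry']
print(copy_b)  # ['apple', 'banana', 'cherry']
```

Only the alias sees the appended item, because it refers to the same list object.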
Example
Join two lists:
Another way to join two lists is by appending all the items from list2 into list1, one by
one:
Example
Append list2 into list1:
list1 = ["a", "b", "c"]
list2 = [1, 2, 3]
for x in list2:
    list1.append(x)
print(list1)
Or you can use the extend() method, whose purpose is to add the elements of one list to
another list:
Example
Use the extend() method to add list2 at the end of list1:
list1 = ["a", "b", "c"]
list2 = [1, 2, 3]
list1.extend(list2)
print(list1)
Python - List Methods
List Methods
Python has a set of built-in methods that you can use on lists.
Method      Description
extend()    Adds the elements of a list (or any iterable) to the end of the current list
index()     Returns the index of the first element with the specified value
Python Tuples
Tuple
Tuples are used to store multiple items in a single variable.
Tuple is one of 4 built-in data types in Python used to store collections of data; the other
3 are List, Set, and Dictionary, all with different qualities and usage.
Example
Create a Tuple:
Tuple Items
Tuple items are ordered, unchangeable, and allow duplicate values.
Tuple items are indexed, the first item has index [0], the second item has index [1] etc.
Ordered
When we say that tuples are ordered, it means that the items have a defined order, and
that order will not change.
Unchangeable
Tuples are unchangeable, meaning that we cannot change, add or remove items after the
tuple has been created.
Allow Duplicates
Since tuples are indexed, they can have items with the same value:
Example
Tuples allow duplicate values:
Tuple Length
To determine how many items a tuple has, use the len() function:
Example
Print the number of items in the tuple:
Example
One item tuple, remember the comma:
thistuple = ("apple",)
print(type(thistuple))
#NOT a tuple
thistuple = ("apple")
print(type(thistuple))
Example
String, int and boolean data types:
Example
A tuple with strings, integers and boolean values:
type()
From Python's perspective, tuples are defined as objects with the data type 'tuple':
<class 'tuple'>
Example
What is the data type of a tuple?
Example
Using the tuple() constructor to make a tuple:
*Set items are unchangeable, but you can remove and/or add items whenever you like.
**As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier,
dictionaries are unordered.
When choosing a collection type, it is useful to understand the properties of that type.
Choosing the right type for a particular data set could mean retention of meaning, and, it
could mean an increase in efficiency or security.
Example
Print the second item in the tuple:
Negative Indexing
Negative indexing means start from the end.
-1 refers to the last item, -2 refers to the second last item etc.
Example
Print the last item of the tuple:
Range of Indexes
You can specify a range of indexes by specifying where to start and where to end the range.
When specifying a range, the return value will be a new tuple with the specified items.
Example
Return the third, fourth, and fifth item:
Note: The search will start at index 2 (included) and end at index 5 (not included).
By leaving out the start value, the range will start at the first item:
Example
This example returns the items from the beginning to, but NOT included, "kiwi":
By leaving out the end value, the range will go on to the end of the tuple:
Example
This example returns the items from "cherry" and to the end:
Example
This example returns the items from index -4 (included) to index -1 (excluded)
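The indexing and slicing rules above can be sketched as follows. The sample tuple and the result names are assumptions chosen to match the captions:

```python
thistuple = ("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")

second = thistuple[1]         # 'banana' (the first item has index 0)
last = thistuple[-1]          # 'mango'

middle = thistuple[2:5]       # ('cherry', 'orange', 'kiwi')
from_start = thistuple[:4]    # up to, but not including, 'kiwi'
to_end = thistuple[2:]        # from 'cherry' to the end
negative = thistuple[-4:-1]   # ('orange', 'kiwi', 'melon')

print(second, last)
print(middle)
print(from_start)
print(to_end)
print(negative)
```

Each slice returns a new tuple; thistuple itself is not changed.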
Example
Check if "apple" is present in the tuple:
Tuples are unchangeable, meaning that you cannot change, add, or remove items once
the tuple is created.
But there is a workaround. You can convert the tuple into a list, change the list, and
convert the list back into a tuple.
Example
Convert the tuple into a list to be able to change it:
print(x)
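A complete version of the conversion workaround might look like the sketch below. The sample values are illustrative, and the same pattern also covers adding and removing items:

```python
x = ("apple", "banana", "cherry")

y = list(x)          # convert the tuple into a list
y[1] = "kiwi"        # change an item
y.append("orange")   # add an item
y.remove("apple")    # remove an item
x = tuple(y)         # convert the list back into a tuple

print(x)  # ('kiwi', 'cherry', 'orange')
```

The name x now refers to a new tuple; the original tuple object was never modified.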
Add Items
Since tuples are immutable, they do not have a built-in append() method, but there are
other ways to add items to a tuple.
1. Convert into a list: Just like the workaround for changing a tuple, you can convert it
into a list, add your item(s), and convert it back into a tuple.
Example
Convert the tuple into a list, add "orange", and convert it back into a tuple:
2. Add tuple to a tuple. You are allowed to add tuples to tuples, so if you want to add
one item, (or many), create a new tuple with the item(s), and add it to the existing tuple:
Example
Create a new tuple with the value "orange", and add that tuple:
print(thistuple)
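Adding one tuple to another can be sketched as follows, with sample values chosen for illustration:

```python
thistuple = ("apple", "banana", "cherry")
y = ("orange",)     # a one-item tuple: note the trailing comma

thistuple += y      # builds a new tuple and rebinds the name

print(thistuple)  # ('apple', 'banana', 'cherry', 'orange')
```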
Note: When creating a tuple with only one item, remember to include a comma after the
item, otherwise it will not be identified as a tuple.
Remove Items
Note: You cannot remove items in a tuple.
Tuples are unchangeable, so you cannot remove items from them, but you can use the
same workaround as we used for changing and adding tuple items:
Example
Convert the tuple into a list, remove "apple", and convert it back into a tuple:
Or you can delete the tuple completely:
Example
The del keyword can delete the tuple completely:
Unpacking a Tuple
When we create a tuple, we normally assign values to it. This is called "packing" a tuple:
Example
Packing a tuple:
But, in Python, we are also allowed to extract the values back into variables. This is called
"unpacking":
Example
Unpacking a tuple:
print(green)
print(yellow)
print(red)
Note: The number of variables must match the number of values in the tuple. If it does
not, you must use an asterisk to collect the remaining values as a list.
Using Asterisk*
If the number of variables is less than the number of values, you can add an * to the
variable name and the values will be assigned to the variable as a list:
Example
Assign the rest of the values as a list called "red":
print(green)
print(yellow)
print(red)
If the asterisk is added to another variable name than the last, Python will assign values
to the variable until the number of values left matches the number of variables left.
Example
Assign a list of values to the "tropic" variable:
print(green)
print(tropic)
print(red)
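Packing, unpacking and the asterisk rules described above can be sketched in one runnable example. The tuples and the names first, second, rest, head and tail are assumptions made for illustration:

```python
# Packing: the values are collected into one tuple.
fruits = ("apple", "banana", "cherry")

# Unpacking: one variable per value.
(green, yellow, red) = fruits
print(green, yellow, red)  # apple banana cherry

# A starred name at the end collects the remaining values as a list.
berries = ("apple", "banana", "cherry", "strawberry", "raspberry")
(first, second, *rest) = berries
print(rest)  # ['cherry', 'strawberry', 'raspberry']

# A starred name in the middle takes whatever the other names leave over.
mixed = ("apple", "mango", "papaya", "pineapple", "cherry")
(head, *tropic, tail) = mixed
print(tropic)  # ['mango', 'papaya', 'pineapple']
```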
Python - Loop Tuples
Example
Iterate through the items and print the values:
Learn more about for loops in our Python For Loops Chapter.
Example
Print all items by referring to their index number:
Use the len() function to determine the length of the tuple, then start at 0 and loop your
way through the tuple items by referring to their indexes.
Example
Print all items, using a while loop to go through all the index numbers:
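The three ways of looping through a tuple described above might look like this, using a sample tuple:

```python
thistuple = ("apple", "banana", "cherry")

# 1. Loop over the items directly.
for x in thistuple:
    print(x)

# 2. Loop over the index numbers with range() and len().
for i in range(len(thistuple)):
    print(thistuple[i])

# 3. Use a while loop: start at 0 and stop when the index reaches the length.
i = 0
while i < len(thistuple):
    print(thistuple[i])
    i = i + 1
```

Each loop prints the same three items; remember to increase the index in the while loop, or it will never end.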
Learn more about while loops in our Python While Loops Chapter.
Example
Join two tuples:
Example
Multiply the fruits tuple by 2:
print(mytuple)
Tuple Methods
Python has two built-in methods that you can use on tuples.
Method      Description
count()     Returns the number of times a specified value occurs in a tuple
index()     Searches the tuple for a specified value and returns the position where it was found
Python Sets
Set
Sets are used to store multiple items in a single variable.
Set is one of 4 built-in data types in Python used to store collections of data; the other 3
are List, Tuple, and Dictionary, all with different qualities and usage.
* Note: Set items are unchangeable, but you can remove items and add new items.
Example
Create a Set:
Note: Sets are unordered, so you cannot be sure in which order the items will appear.
Set Items
Set items are unordered, unchangeable, and do not allow duplicate values.
Unordered
Unordered means that the items in a set do not have a defined order.
Set items can appear in a different order every time you use them, and cannot be
referred to by index or key.
Unchangeable
Set items are unchangeable, meaning that we cannot change the items after the set has
been created.
Once a set is created, you cannot change its items, but you can remove items and add
new items.
Example
Duplicate values will be ignored:
thisset = {"apple", "banana", "cherry", "apple"}
print(thisset)
Example
Get the number of items in a set:
thisset = {"apple", "banana", "cherry"}
print(len(thisset))
Example
String, int and boolean data types:
Example
A set with strings, integers and boolean values:
type()
From Python's perspective, sets are defined as objects with the data type 'set':
<class 'set'>
Example
What is the data type of a set?
Example
Using the set() constructor to make a set:
Access Items
You cannot access items in a set by referring to an index or a key.
But you can loop through the set items using a for loop, or ask if a specified value is
present in a set, by using the in keyword.
Example
Loop through the set, and print the values:
thisset = {"apple", "banana", "cherry"}
for x in thisset:
    print(x)
Example
Check if "banana" is present in the set:
print("banana" in thisset)
Change Items
Once a set is created, you cannot change its items, but you can add new items.
Add Items
Once a set is created, you cannot change its items, but you can add new items.
Example
Add an item to a set, using the add() method:
thisset.add("orange")
print(thisset)
Add Sets
To add items from another set into the current set, use the update() method.
Example
Add elements from tropical into thisset:
thisset = {"apple", "banana", "cherry"}
tropical = {"pineapple", "mango", "papaya"}
thisset.update(tropical)
print(thisset)
Example
Add elements of a list to a set:
mylist = ["kiwi", "orange"]
thisset.update(mylist)
print(thisset)
Python - Remove Set Items
Remove Item
To remove an item in a set, use the remove(), or the discard() method.
Example
Remove "banana" by using the remove() method:
thisset.remove("banana")
print(thisset)
Note: If the item to remove does not exist, remove() will raise an error.
Example
Remove "banana" by using the discard() method:
thisset.discard("banana")
print(thisset)
Note: If the item to remove does not exist, discard() will NOT raise an error.
You can also use the pop() method to remove an item, but this method removes an
arbitrary item. Because sets are unordered, you cannot know in advance which item
gets removed. The return value of pop() is the removed item.
Example
Remove an arbitrary item by using the pop() method:
x = thisset.pop()
print(x)
print(thisset)
Note: Sets are unordered, so when using the pop() method, you do not know which item
gets removed.
Example
The clear() method empties the set:
thisset.clear()
print(thisset)
Example
The del keyword will delete the set completely:
del thisset
print(thisset) #this will raise an error because the set no longer exists
Loop Items
You can loop through the set items by using a for loop:
Example
Loop through the set, and print the values:
You can use the union() method that returns a new set containing all items from both
sets, or the update() method that inserts all the items from one set into another:
Example
The union() method returns a new set with all items from both sets:
set3 = set1.union(set2)
print(set3)
Example
The update() method inserts the items in set2 into set1:
set1.update(set2)
print(set1)
Note: Both union() and update() will exclude any duplicate items.
Keep ONLY the Duplicates
The intersection_update() method will keep only the items that are present in both sets.
Example
Keep the items that exist in both set x, and set y:
x.intersection_update(y)
print(x)
The intersection() method will return a new set, that only contains the items that are
present in both sets.
Example
Return a set that contains the items that exist in both set x, and set y:
z = x.intersection(y)
print(z)
Example
Keep the items that are not present in both sets:
x.symmetric_difference_update(y)
print(x)
The symmetric_difference() method will return a new set, that contains only the elements
that are NOT present in both sets.
Example
Return a set that contains all items from both sets, except items that are present in both:
z = x.symmetric_difference(y)
print(z)
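The set definitions for the examples above are not shown, so here is a self-contained sketch with two sample sets x and y chosen for illustration. It contrasts the methods that return a new set with the one that changes the set in place:

```python
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}

# These methods return a new set and leave x and y unchanged.
joined = x.union(y)                    # every item from both sets
common = x.intersection(y)             # items present in both sets
not_shared = x.symmetric_difference(y) # items present in exactly one set

print(common)  # {'apple'}

# The *_update() methods change the set they are called on.
x.intersection_update(y)
print(x)  # {'apple'}
```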
Set Methods
Python has a set of built-in methods that you can use on sets.
Python Dictionaries
Dictionary
Dictionaries are used to store data values in key:value pairs.
As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier, dictionaries
are unordered.
Dictionaries are written with curly brackets, and have keys and values:
Example
Create and print a dictionary:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(thisdict)
Dictionary Items
Dictionary items are ordered, changeable, and do not allow duplicate keys.
Dictionary items are presented in key:value pairs, and can be referred to by using the
key name.
Example
Print the "brand" value of the dictionary:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(thisdict["brand"])
Ordered or Unordered?
As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier, dictionaries
are unordered.
When we say that dictionaries are ordered, it means that the items have a defined order,
and that order will not change.
Unordered means that the items do not have a defined order, so you cannot refer to an
item by using an index.
Changeable
Dictionaries are changeable, meaning that we can change, add or remove items after the
dictionary has been created.
Example
Dictionaries cannot have two items with the same key. A duplicate key will overwrite the existing value:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964,
"year": 2020
}
print(thisdict)
Dictionary Length
To determine how many items a dictionary has, use the len() function:
Example
Print the number of items in the dictionary:
print(len(thisdict))
Example
String, int, boolean, and list data types:
thisdict = {
"brand": "Ford",
"electric": False,
"year": 1964,
"colors": ["red", "white", "blue"]
}
type()
From Python's perspective, dictionaries are defined as objects with the data type 'dict':
<class 'dict'>
Example
Print the data type of a dictionary:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(type(thisdict))
Python - Access Dictionary Items
Accessing Items
You can access the items of a dictionary by referring to its key name, inside square
brackets:
Example
Get the value of the "model" key:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
x = thisdict["model"]
There is also a method called get() that will give you the same result:
Example
Get the value of the "model" key:
x = thisdict.get("model")
Get Keys
The keys() method will return a view object that lists all the keys in the dictionary.
Example
Get a list of the keys:
x = thisdict.keys()
The list of the keys is a view of the dictionary, meaning that any changes done to the
dictionary will be reflected in the keys list.
Example
Add a new item to the original dictionary, and see that the keys list gets updated as well:
car = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
x = car.keys()
print(x) #before the change
car["color"] = "white"
print(x) #after the change
Get Values
The values() method will return a view object that lists all the values in the dictionary.
Example
Get a list of the values:
x = thisdict.values()
The list of the values is a view of the dictionary, meaning that any changes done to the
dictionary will be reflected in the values list.
Example
Make a change in the original dictionary, and see that the values list gets updated as
well:
car = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
x = car.values()
print(x) #before the change
car["year"] = 2020
print(x) #after the change
Example
Add a new item to the original dictionary, and see that the values list gets updated as
well:
car = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
x = car.values()
print(x) #before the change
car["color"] = "red"
print(x) #after the change
Get Items
The items() method will return a view object that lists each item in the dictionary as a (key, value) tuple.
Example
Get a list of the key:value pairs
x = thisdict.items()
The returned list is a view of the items of the dictionary, meaning that any changes done
to the dictionary will be reflected in the items list.
Example
Make a change in the original dictionary, and see that the items list gets updated as well:
car = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
x = car.items()
print(x) #before the change
car["year"] = 2020
print(x) #after the change
Example
Add a new item to the original dictionary, and see that the items list gets updated as
well:
car = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
x = car.items()
print(x) #before the change
car["color"] = "red"
print(x) #after the change
Example
Check if "model" is present in the dictionary:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
if "model" in thisdict:
    print("Yes, 'model' is one of the keys in the thisdict dictionary")
Change Values
You can change the value of a specific item by referring to its key name:
Example
Change the "year" to 2018:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["year"] = 2018
Update Dictionary
The update() method will update the dictionary with the items from the given argument.
Example
Update the "year" of the car by using the update() method:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.update({"year": 2020})
Adding Items
Adding an item to the dictionary is done by using a new index key and assigning a value
to it:
Example
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["color"] = "red"
print(thisdict)
Update Dictionary
The update() method will update the dictionary with the items from a given argument. If
the item does not exist, the item will be added.
Example
Add a color item to the dictionary by using the update() method:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.update({"color": "red"})
Removing Items
There are several methods to remove items from a dictionary:
Example
The pop() method removes the item with the specified key name:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.pop("model")
print(thisdict)
Example
The popitem() method removes the last inserted item (in versions before 3.7, a random
item is removed instead):
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.popitem()
print(thisdict)
Example
The del keyword removes the item with the specified key name:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
del thisdict["model"]
print(thisdict)
Example
The del keyword can also delete the dictionary completely:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
del thisdict
print(thisdict) #this will cause an error because "thisdict" no longer exists.
Example
The clear() method empties the dictionary:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.clear()
print(thisdict)
When looping through a dictionary, the loop variable takes on the keys of the dictionary,
but there are methods to return the values as well.
Example
Print all key names in the dictionary, one by one:
for x in thisdict:
    print(x)
Example
Print all values in the dictionary, one by one:
for x in thisdict:
    print(thisdict[x])
Example
You can also use the values() method to return values of a dictionary:
for x in thisdict.values():
    print(x)
Example
You can use the keys() method to return the keys of a dictionary:
for x in thisdict.keys():
    print(x)
Example
Loop through both keys and values, by using the items() method:
for x, y in thisdict.items():
    print(x, y)
Copy a Dictionary
You cannot copy a dictionary simply by typing dict2 = dict1, because: dict2 will only be
a reference to dict1, and changes made in dict1 will automatically also be made in dict2.
There are ways to make a copy; one way is to use the built-in dictionary method copy().
Example
Make a copy of a dictionary with the copy() method:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
mydict = thisdict.copy()
print(mydict)
Example
Make a copy of a dictionary with the dict() function:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
mydict = dict(thisdict)
print(mydict)
Nested Dictionaries
A dictionary can contain dictionaries; this is called nested dictionaries.
Example
Create a dictionary that contains three dictionaries:
myfamily = {
    "child1" : {
        "name" : "Emil",
        "year" : 2004
    },
    "child2" : {
        "name" : "Tobias",
        "year" : 2007
    },
    "child3" : {
        "name" : "Linus",
        "year" : 2011
    }
}
Example
Create three dictionaries, then create one dictionary that will contain the other three
dictionaries:
child1 = {
    "name" : "Emil",
    "year" : 2004
}
child2 = {
    "name" : "Tobias",
    "year" : 2007
}
child3 = {
    "name" : "Linus",
    "year" : 2011
}
myfamily = {
    "child1" : child1,
    "child2" : child2,
    "child3" : child3
}
Python Dictionary Methods
Dictionary Methods
Python has a set of built-in methods that you can use on dictionaries.
• Equals: a == b
• Not Equals: a != b
• Less than: a < b
• Less than or equal to: a <= b
• Greater than: a > b
• Greater than or equal to: a >= b
These conditions can be used in several ways, most commonly in "if statements" and
loops.
Example
If statement:
a = 33
b = 200
if b > a:
    print("b is greater than a")
In this example we use two variables, a and b, which are used as part of the if statement
to test whether b is greater than a. As a is 33, and b is 200, we know that 200 is greater
than 33, and so we print to screen that "b is greater than a".
Indentation
Python relies on indentation (whitespace at the beginning of a line) to define scope in the
code. Other programming languages often use curly-brackets for this purpose.
Example
If statement, without indentation (will raise an error):
a = 33
b = 200
if b > a:
print("b is greater than a") # you will get an error
Elif
The elif keyword is Python's way of saying "if the previous conditions were not true, then
try this condition".
Example
a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
In this example a is equal to b, so the first condition is not true, but the elif condition is
true, so we print to screen that "a and b are equal".
Else
The else keyword catches anything which isn't caught by the preceding conditions.
Example
a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")
In this example a is greater than b, so the first condition is not true, also
the elif condition is not true, so we go to the else condition and print to screen that "a
is greater than b".
Example
a = 200
b = 33
if b > a:
    print("b is greater than a")
else:
    print("b is not greater than a")
Short Hand If
If you have only one statement to execute for if, and one for else, you can put it all on
the same line:
Example
One line if else statement:
a = 2
b = 330
print("A") if a > b else print("B")
You can also chain several conditions on the same line:
Example
One line if else statement, with 3 conditions:
a = 330
b = 330
print("A") if a > b else print("=") if a == b else print("B")
And
The and keyword is a logical operator, and is used to combine conditional statements:
Example
Test if a is greater than b, AND if c is greater than a:
a = 200
b = 33
c = 500
if a > b and c > a:
    print("Both conditions are True")
Or
The or keyword is a logical operator, and is used to combine conditional statements:
Example
Test if a is greater than b, OR if a is greater than c:
a = 200
b = 33
c = 500
if a > b or a > c:
    print("At least one of the conditions is True")
Nested If
You can have if statements inside if statements; this is called nested if statements.
Example
x = 41
if x > 10:
    print("Above ten,")
    if x > 20:
        print("and also above 20!")
    else:
        print("but not above 20.")
Example
a = 33
b = 200
if b > a:
    pass
Python Loops
Python has two primitive loop commands:
• while loops
• for loops
Example
Print i as long as i is less than 6:
i = 1
while i < 6:
    print(i)
    i += 1
Example
Exit the loop when i is 3:
i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1
Example
Continue to the next iteration if i is 3:
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue
    print(i)
Example
Print a message once the condition is false:
i = 1
while i < 6:
    print(i)
    i += 1
else:
    print("i is no longer less than 6")
Python For Loops
The Python for loop is less like the for keyword in other programming languages, and works
more like an iterator method as found in other object-oriented programming languages.
With the for loop we can execute a set of statements, once for each item in a list, tuple,
set etc.
Example
Print each fruit in a fruit list:
The for loop does not require an indexing variable to be set beforehand.
Example
Loop through the letters in the word "banana":
for x in "banana":
    print(x)
Example
Exit the loop when x is "banana":
Example
Exit the loop when x is "banana", but this time the break comes before the print:
Example
Do not print banana:
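The three captions above describe break and continue inside a for loop. A sketch, assuming a sample fruits list (the kept list is only there to make the result easy to inspect):

```python
fruits = ["apple", "banana", "cherry"]

# break after printing: "banana" is still printed before the loop stops.
for x in fruits:
    print(x)
    if x == "banana":
        break

# break before printing: the loop stops before "banana" is printed.
for x in fruits:
    if x == "banana":
        break
    print(x)

# continue: skip "banana" and carry on with the next item.
kept = []
for x in fruits:
    if x == "banana":
        continue
    kept.append(x)
print(kept)  # ['apple', 'cherry']
```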
Example
Using the range() function:
for x in range(6):
    print(x)
Example
Using the start parameter:
Example
Increment the sequence with 3 (default is 1):
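The start and step parameters of range() can be sketched as follows. The specific numbers are sample values, and list() is used only to show the whole sequence at once:

```python
default = list(range(6))          # starts at 0, stops before 6
with_start = list(range(2, 6))    # starts at 2, stops before 6
with_step = list(range(2, 30, 3)) # starts at 2, increments by 3

print(default)     # [0, 1, 2, 3, 4, 5]
print(with_start)  # [2, 3, 4, 5]
print(with_step)   # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
```

In every case the end value itself is not included.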
Example
Print all numbers from 0 to 5, and print a message when the loop has ended:
for x in range(6):
    print(x)
else:
    print("Finally finished!")
Note: The else block will NOT be executed if the loop is stopped by a break statement.
Example
Break the loop when x is 3, and see what happens with the else block:
for x in range(6):
    if x == 3: break
    print(x)
else:
    print("Finally finished!")
Nested Loops
A nested loop is a loop inside a loop.
The "inner loop" will be executed one time for each iteration of the "outer loop":
Example
Print each adjective for every fruit:
adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]
for x in adj:
    for y in fruits:
        print(x, y)
Example
for x in [0, 1, 2]:
    pass
Python Functions
Creating a Function
In Python a function is defined using the def keyword:
Example
def my_function():
    print("Hello from a function")
Calling a Function
To call a function, use the function name followed by parentheses:
Example
def my_function():
    print("Hello from a function")
my_function()
Arguments
Information can be passed into functions as arguments.
Arguments are specified after the function name, inside the parentheses. You can add as
many arguments as you want, just separate them with a comma.
The following example has a function with one argument (fname). When the function is
called, we pass along a first name, which is used inside the function to print the full
name:
Example
def my_function(fname):
    print(fname + " Refsnes")
my_function("Emil")
my_function("Tobias")
my_function("Linus")
Parameters or Arguments?
The terms parameter and argument can be used for the same thing: information that is
passed into a function.
A parameter is the variable listed inside the parentheses in the function definition.
An argument is the value that is sent to the function when it is called.
Example
This function expects 2 arguments, and gets 2 arguments:
my_function("Emil", "Refsnes")
If you try to call the function with 1 or 3 arguments, you will get an error:
Example
This function expects 2 arguments, but gets only 1:
my_function("Emil")
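The function definition for these two calls is not shown above, so the sketch below assumes a two-parameter my_function. It shows a correct call and the TypeError raised when an argument is missing; the try...except block and the error_caught flag are added here only so the example runs to the end:

```python
def my_function(fname, lname):
    print(fname + " " + lname)

# Two parameters, two arguments: this works.
my_function("Emil", "Refsnes")

# Two parameters, one argument: Python raises a TypeError.
error_caught = False
try:
    my_function("Emil")
except TypeError as error:
    error_caught = True
    print("TypeError:", error)
```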
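If you do not know how many arguments will be passed into your function, add a * before
the parameter name in the function definition.
This way the function will receive a tuple of arguments, and can access the items
accordingly:
Example
If the number of arguments is unknown, add a * before the parameter name:
def my_function(*kids):
    print("The youngest child is " + kids[2])
my_function("Emil", "Tobias", "Linus")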
Example
def my_function(child3, child2, child1):
    print("The youngest child is " + child3)
my_function(child1 = "Emil", child2 = "Tobias", child3 = "Linus")
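The phrase Keyword Arguments is often shortened to kwargs in Python documentation.
If you do not know how many keyword arguments will be passed into your function, add
two asterisks (**) before the parameter name in the function definition.
This way the function will receive a dictionary of arguments, and can access the items
accordingly:
Example
If the number of keyword arguments is unknown, add a double ** before the parameter
name:
def my_function(**kid):
    print("His last name is " + kid["lname"])
my_function(fname = "Tobias", lname = "Refsnes")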
my_function("Sweden")
my_function("India")
my_function()
my_function("Brazil")
E.g. if you send a List as an argument, it will still be a List when it reaches the function:
Example
def my_function(food):
    for x in food:
        print(x)
fruits = ["apple", "banana", "cherry"]
my_function(fruits)
Return Values
To let a function return a value, use the return statement:
Example
def my_function(x):
    return 5 * x
print(my_function(3))
print(my_function(5))
print(my_function(9))
The pass Statement
Function definitions cannot be empty, but if you for some reason have a function definition
with no content, put in the pass statement to avoid getting an error.
Example
def myfunction():
    pass
Recursion
Python also accepts function recursion, which means a defined function can call itself.
Be careful with recursion: it is easy to write a function that never terminates, or one
that uses excessive memory or processor time. Written correctly, however, recursion can
be a mathematically elegant approach to programming.
In the example below, tri_recursion() calls itself with k - 1 until k reaches 0, and adds
k to the result on the way back. It can take a new developer some time to work out exactly
how this works; the best way to find out is to test and modify it.
Example
def tri_recursion(k):
    if k > 0:
        result = k + tri_recursion(k - 1)
        print(result)
    else:
        result = 0
    return result
print("Recursion Example Results:")
tri_recursion(6)
Python Lambda
A lambda function can take any number of arguments, but can only have one
expression.
Syntax
lambda arguments : expression
Example
Add 10 to argument a, and return the result:
x = lambda a : a + 10
print(x(5))
Try it Yourself »
Example
Multiply argument a with argument b and return the result:
x = lambda a, b : a * b
print(x(5, 6))
Try it Yourself »
Example
Sum arguments a, b, and c and return the result:
x = lambda a, b, c : a + b + c
print(x(5, 6, 2))
Try it Yourself »
Say you have a function definition that takes one argument, and that argument will be
multiplied with an unknown number:
def myfunc(n):
    return lambda a : a * n
Use that function definition to make a function that always doubles the number you send
in:
Example
def myfunc(n):
    return lambda a : a * n
mydoubler = myfunc(2)
print(mydoubler(11))
Or, use the same function definition to make a function that always triples the number
you send in:
Example
def myfunc(n):
    return lambda a : a * n
mytripler = myfunc(3)
print(mytripler(11))
Or, use the same function definition to make both functions, in the same program:
Example
def myfunc(n):
    return lambda a : a * n
mydoubler = myfunc(2)
mytripler = myfunc(3)
print(mydoubler(11))
print(mytripler(11))
Use lambda functions when an anonymous function is required for a short period of time.
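One common case of a short-lived anonymous function is a sort key. The sketch below is an illustration only; the people list and its values are made up for the example.

```python
# Each entry is a (name, age) tuple; the data is illustrative.
people = [("John", 36), ("Anna", 29), ("Mike", 41)]

# The lambda is needed only for this one call, so it is never given a name.
by_age = sorted(people, key=lambda person: person[1])
print(by_age)
```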
Python Arrays
Note: Python does not have built-in support for Arrays, but Python Lists can be used
instead.
Arrays
Note: This page shows you how to use LISTS as ARRAYS. To work with true arrays
in Python you will have to import a library, like the NumPy library.
Arrays are used to store multiple values in one single variable:
Example
Create an array containing car names:
cars = ["Ford", "Volvo", "BMW"]
What is an Array?
An array is a special variable, which can hold more than one value at a time.
If you have a list of items (a list of car names, for example), storing the cars in single
variables could look like this:
car1 = "Ford"
car2 = "Volvo"
car3 = "BMW"
However, what if you want to loop through the cars and find a specific one? And what if
you had not 3 cars, but 300?
An array can hold many values under a single name, and you can access the values by
referring to an index number.
Access the Elements of an Array
You refer to an array element by referring to the index number.
Example
Get the value of the first array item:
x = cars[0]
Try it Yourself »
Example
Modify the value of the first array item:
cars[0] = "Toyota"
Try it Yourself »
Example
Return the number of elements in the cars array:
x = len(cars)
Try it Yourself »
Note: The length of an array is always one more than the highest array index.
Example
Print each item in the cars array:
for x in cars:
    print(x)
Example
Add one more element to the cars array:
cars.append("Honda")
Try it Yourself »
Example
Delete the second element of the cars array:
cars.pop(1)
Try it Yourself »
You can also use the remove() method to remove an element from the array.
Example
Delete the element that has the value "Volvo":
cars.remove("Volvo")
Try it Yourself »
Note: The list's remove() method only removes the first occurrence of the specified value.
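A short sketch of that behaviour, assuming a cars list in which "Volvo" appears twice (the duplicate is added here for illustration):

```python
cars = ["Ford", "Volvo", "BMW", "Volvo"]

cars.remove("Volvo")   # removes only the first "Volvo"
print(cars)

try:
    cars.remove("Tesla")   # value not in the list
except ValueError:
    print("remove() raises ValueError when the value is not found")
```

If every occurrence should go, a list comprehension such as [c for c in cars if c != "Volvo"] is one option.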
Array Methods
Python has a set of built-in methods that you can use on lists/arrays.
Method      Description
extend()    Adds the elements of a list (or any iterable) to the end of the current list
index()     Returns the index of the first element with the specified value
Python Classes/Objects
Python is an object oriented programming language.
Create a Class
To create a class, use the keyword class:
Example
Create a class named MyClass, with a property named x:
class MyClass:
    x = 5
Create Object
Now we can use the class named MyClass to create objects:
Example
Create an object named p1, and print the value of x:
p1 = MyClass()
print(p1.x)
Try it Yourself »
The __init__() Function
The examples above are classes and objects in their simplest form, and are not really
useful in real life applications.
All classes have a function called __init__(), which is always executed when an object is
created from the class.
Use the __init__() function to assign values to object properties, or other operations that
are necessary to do when the object is being created:
Example
Create a class named Person, use the __init__() function to assign values for name and
age:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
p1 = Person("John", 36)
print(p1.name)
print(p1.age)
Note: The __init__() function is called automatically every time the class is being used to
create a new object.
Object Methods
Objects can also contain methods. Methods in objects are functions that belong to the
object.
Example
Insert a function that prints a greeting, and execute it on the p1 object:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def myfunc(self):
        print("Hello my name is " + self.name)
p1 = Person("John", 36)
p1.myfunc()
Note: The self parameter is a reference to the current instance of the class, and is used
to access variables that belong to the class.
It does not have to be named self; you can call it whatever you like, but it has to be the
first parameter of any method in the class:
Example
Use the words mysillyobject and abc instead of self:
class Person:
    def __init__(mysillyobject, name, age):
        mysillyobject.name = name
        mysillyobject.age = age
    def myfunc(abc):
        print("Hello my name is " + abc.name)
p1 = Person("John", 36)
p1.myfunc()
Example
Set the age of p1 to 40:
p1.age = 40
Try it Yourself »
Example
Delete the age property from the p1 object:
del p1.age
Try it Yourself »
Delete Objects
You can delete objects by using the del keyword:
Example
Delete the p1 object:
del p1
Try it Yourself »
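The pass Statement
Class definitions cannot be empty, but if you for some reason have a class definition
with no content, put in the pass statement to avoid getting an error.
Example
class Person:
    pass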
Python Inheritance
Inheritance allows us to define a class that inherits all the methods and properties from
another class.
Parent class is the class being inherited from, also called base class.
Child class is the class that inherits from another class, also called derived class.
Example
Create a class named Person, with firstname and lastname properties, and
a printname method:
class Person:
    def __init__(self, fname, lname):
        self.firstname = fname
        self.lastname = lname
    def printname(self):
        print(self.firstname, self.lastname)
# Use the Person class to create an object, and then execute the printname method:
x = Person("John", "Doe")
x.printname()
Example
Create a class named Student, which will inherit the properties and methods from
the Person class:
class Student(Person):
    pass
Note: Use the pass keyword when you do not want to add any other properties or
methods to the class.
Now the Student class has the same properties and methods as the Person class.
Example
Use the Student class to create an object, and then execute the printname method:
x = Student("Mike", "Olsen")
x.printname()
Try it Yourself »
We want to add the __init__() function to the child class (instead of the pass keyword).
Note: The __init__() function is called automatically every time the class is being used to
create a new object.
Example
Add the __init__() function to the Student class:
class Student(Person):
    def __init__(self, fname, lname):
        # add properties etc.
        pass
When you add the __init__() function, the child class will no longer inherit the
parent's __init__() function.
To keep the inheritance of the parent's __init__() function, add a call to the
parent's __init__() function:
Example
class Student(Person):
    def __init__(self, fname, lname):
        Person.__init__(self, fname, lname)
Now we have successfully added the __init__() function, and kept the inheritance of the
parent class, and we are ready to add functionality in the __init__() function.
Python also has a super() function that gives the child class access to the methods and
properties of its parent:
Example
class Student(Person):
    def __init__(self, fname, lname):
        super().__init__(fname, lname)
By using the super() function, you do not have to use the name of the parent class; the
call is resolved to the parent automatically.
Add Properties
Example
Add a property called graduationyear to the Student class:
class Student(Person):
    def __init__(self, fname, lname):
        super().__init__(fname, lname)
        self.graduationyear = 2019
In the example above, the year 2019 should be a variable, and passed into
the Student class when creating student objects. To do so, add another parameter in the
__init__() function:
Example
Add a year parameter, and pass the correct year when creating objects:
class Student(Person):
    def __init__(self, fname, lname, year):
        super().__init__(fname, lname)
        self.graduationyear = year
x = Student("Mike", "Olsen", 2019)
Add Methods
Example
Add a method called welcome to the Student class:
class Student(Person):
    def __init__(self, fname, lname, year):
        super().__init__(fname, lname)
        self.graduationyear = year
    def welcome(self):
        print("Welcome", self.firstname, self.lastname, "to the class of", self.graduationyear)
If you add a method in the child class with the same name as a function in the parent
class, the inheritance of the parent method will be overridden.
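A minimal sketch of overriding, under the assumption of a hypothetical describe() method that exists in both classes (it is not part of the examples above). The child version replaces the parent version for Student objects, while super() still reaches the parent implementation.

```python
class Person:
    def __init__(self, fname, lname):
        self.firstname = fname
        self.lastname = lname

    def describe(self):
        return self.firstname + " " + self.lastname


class Student(Person):
    def describe(self):
        # Same name as the parent method, so this version is used for Student objects.
        # super().describe() calls the parent implementation explicitly.
        return super().describe() + " (student)"


print(Person("John", "Doe").describe())
print(Student("Mike", "Olsen").describe())
```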
Python Iterators
An iterator is an object that contains a countable number of values.
An iterator is an object that can be iterated upon, meaning that you can traverse through
all the values.
Iterator vs Iterable
Lists, tuples, dictionaries, and sets are all iterable objects. They are
iterable containers which you can get an iterator from.
All these objects can be passed to the built-in iter() function, which returns an iterator:
Example
Return an iterator from a tuple, and print each value:
mytuple = ("apple", "banana", "cherry")
myit = iter(mytuple)
print(next(myit))
print(next(myit))
print(next(myit))
Example
Strings are also iterable objects, containing a sequence of characters:
mystr = "banana"
myit = iter(mystr)
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))
Try it Yourself »
Example
Iterate the values of a tuple:
mytuple = ("apple", "banana", "cherry")
for x in mytuple:
    print(x)
Example
Iterate the characters of a string:
mystr = "banana"
for x in mystr:
    print(x)
The for loop actually creates an iterator object and executes the next() method for each
loop.
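What the for loop does behind the scenes can be approximated by hand. This is a rough sketch of the idea, not the interpreter's actual implementation; the tuple values and the collected list are illustrative.

```python
mytuple = ("apple", "banana", "cherry")
collected = []

myit = iter(mytuple)          # the loop first asks the iterable for an iterator
while True:
    try:
        x = next(myit)        # then fetches one item per pass
    except StopIteration:     # the iterator signals that it is exhausted
        break
    collected.append(x)
    print(x)
```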
Create an Iterator
To create an object/class as an iterator you have to implement the
methods __iter__() and __next__() to your object.
As you have learned in the Python Classes/Objects chapter, all classes have a function
called __init__(), which allows you to do some initializing when the object is being
created.
The __iter__() method acts similarly: you can do operations (initializing etc.), but it must
always return the iterator object itself.
The __next__() method also allows you to do operations, and must return the next item in
the sequence.
Example
Create an iterator that returns numbers, starting with 1, and each sequence will increase
by one (returning 1,2,3,4,5 etc.):
class MyNumbers:
    def __iter__(self):
        self.a = 1
        return self
    def __next__(self):
        x = self.a
        self.a += 1
        return x
myclass = MyNumbers()
myiter = iter(myclass)
print(next(myiter))
print(next(myiter))
print(next(myiter))
print(next(myiter))
print(next(myiter))
StopIteration
The example above would continue forever if you had enough next() statements, or if it
was used in a for loop.
To prevent the iteration from going on forever, we can raise the StopIteration exception.
In the __next__() method, we can add a terminating condition that raises StopIteration
once the iteration has been done a specified number of times:
Example
Stop after 20 iterations:
class MyNumbers:
    def __iter__(self):
        self.a = 1
        return self
    def __next__(self):
        if self.a <= 20:
            x = self.a
            self.a += 1
            return x
        else:
            raise StopIteration
myclass = MyNumbers()
myiter = iter(myclass)
for x in myiter:
    print(x)
Python Scope
A variable is only available inside the region in which it is created. This is called scope.
Local Scope
A variable created inside a function belongs to the local scope of that function, and can
only be used inside that function.
Example
A variable created inside a function is available inside that function:
def myfunc():
    x = 300
    print(x)
myfunc()
Example
The local variable can be accessed from a function within the function:
def myfunc():
    x = 300
    def myinnerfunc():
        print(x)
    myinnerfunc()
myfunc()
Global Scope
A variable created in the main body of the Python code is a global variable and belongs to
the global scope.
Global variables are available from within any scope, global and local.
Example
A variable created outside of a function is global and can be used by anyone:
x = 300
def myfunc():
    print(x)
myfunc()
print(x)
Naming Variables
If you operate with the same variable name inside and outside of a function, Python will
treat them as two separate variables, one available in the global scope (outside the
function) and one available in the local scope (inside the function):
Example
The function will print the local x, and then the code will print the global x:
x = 300
def myfunc():
    x = 200
    print(x)
myfunc()
print(x)
Global Keyword
If you need to create a global variable, but are stuck in the local scope, you can use
the global keyword.
Example
If you use the global keyword, the variable belongs to the global scope:
def myfunc():
    global x
    x = 300
myfunc()
print(x)
Also, use the global keyword if you want to make a change to a global variable inside a
function.
Example
To change the value of a global variable inside a function, refer to the variable by using
the global keyword:
x = 300
def myfunc():
    global x
    x = 200
myfunc()
print(x)
Python Modules
What is a Module?
Consider a module to be the same as a code library.
Create a Module
To create a module just save the code you want in a file with the file extension .py:
Example
Save this code in a file named mymodule.py
def greeting(name):
    print("Hello, " + name)
Use a Module
Now we can use the module we just created, by using the import statement:
Example
Import the module named mymodule, and call the greeting function:
import mymodule
mymodule.greeting("Jonathan")
Run Example »
Variables in Module
The module can contain functions, as already described, but also variables of all types
(arrays, dictionaries, objects etc):
Example
Save this code in the file mymodule.py
person1 = {
    "name": "John",
    "age": 36,
    "country": "Norway"
}
Example
Import the module named mymodule, and access the person1 dictionary:
import mymodule
a = mymodule.person1["age"]
print(a)
Run Example »
Naming a Module
You can name the module file whatever you like, but it must have the file extension .py
Re-naming a Module
You can create an alias when you import a module, by using the as keyword:
Example
Create an alias for mymodule called mx:
import mymodule as mx
a = mx.person1["age"]
print(a)
Run Example »
Built-in Modules
There are several built-in modules in Python, which you can import whenever you like.
Example
Import and use the platform module:
import platform
x = platform.system()
print(x)
Try it Yourself »
Example
List all the defined names belonging to the platform module:
import platform
x = dir(platform)
print(x)
Try it Yourself »
Note: The dir() function can be used on all modules, also the ones you create yourself.
Import From Module
You can choose to import only parts from a module, by using the from keyword.
Example
The module named mymodule has one function and one dictionary:
def greeting(name):
    print("Hello, " + name)
person1 = {
    "name": "John",
    "age": 36,
    "country": "Norway"
}
Example
Import only the person1 dictionary from the module:
from mymodule import person1
print(person1["age"])
Note: When importing using the from keyword, do not use the module name when
referring to elements in the module. Example: person1["age"], not mymodule.person1["age"]
Python Datetime
Python Dates
A date in Python is not a data type of its own, but we can import a module
named datetime to work with dates as date objects.
Example
Import the datetime module and display the current date:
import datetime
x = datetime.datetime.now()
print(x)
Try it Yourself »
Date Output
When we execute the code from the example above the result will be:
2022-08-15 18:16:08.789842
The date contains year, month, day, hour, minute, second, and microsecond.
The datetime module has many methods to return information about the date object.
Here are a few examples, you will learn more about them later in this chapter:
Example
Return the year and name of weekday:
import datetime
x = datetime.datetime.now()
print(x.year)
print(x.strftime("%A"))
Try it Yourself »
The datetime() class requires three parameters to create a date: year, month, day.
Example
Create a date object:
import datetime
x = datetime.datetime(2020, 5, 17)
print(x)
Try it Yourself »
The datetime() class also takes parameters for time and timezone (hour, minute, second,
microsecond, tzinfo), but they are optional and have a default value of 0 (None for
timezone).
The datetime object has a method for formatting date objects into readable strings.
The method is called strftime(), and takes one parameter, format, to specify the format of
the returned string:
Example
Display the name of the month:
import datetime
x = datetime.datetime(2018, 6, 1)
print(x.strftime("%B"))
Try it Yourself »
Directive   Description     Example
%p          AM/PM           PM
%Z          Timezone        CST
%C          Century         20
%%          A % character   %
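Several format codes can be combined in one strftime() call. The sketch below uses a fixed date and time chosen for illustration so the results do not depend on when it is run; only numeric codes are used because names such as weekdays depend on the locale.

```python
import datetime

# A fixed moment: 1 June 2018, 14:30 (values chosen for the example).
x = datetime.datetime(2018, 6, 1, 14, 30)

iso_date = x.strftime("%Y-%m-%d")   # year-month-day
clock = x.strftime("%H:%M")         # 24-hour clock
day_of_year = x.strftime("%j")      # day number within the year, zero-padded
literal = x.strftime("100%%")       # %% produces a single % character

print(iso_date, clock, day_of_year, literal)
```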
Python Math
Python has a set of built-in math functions, including an extensive math module, that
allows you to perform mathematical tasks on numbers.
The min() and max() functions can be used to find the lowest or highest value in an iterable:
Example
x = min(5, 10, 25)
y = max(5, 10, 25)
print(x)
print(y)
Try it Yourself »
The abs() function returns the absolute (positive) value of the specified number:
Example
x = abs(-7.25)
print(x)
Try it Yourself »
Example
Return the value of 4 to the power of 3 (same as 4 * 4 * 4):
x = pow(4, 3)
print(x)
Try it Yourself »
The Math Module
Python also has a built-in module called math, which extends the list of mathematical
functions.
To use it, you must import the math module:
import math
When you have imported the math module, you can start using methods and constants of
the module.
The math.sqrt() method for example, returns the square root of a number:
Example
import math
x = math.sqrt(64)
print(x)
Try it Yourself »
The math.ceil() method rounds a number upwards to its nearest integer, and
the math.floor() method rounds a number downwards to its nearest integer:
Example
import math
x = math.ceil(1.4)
y = math.floor(1.4)
print(x) # prints 2
print(y) # prints 1
The math.pi constant returns the value of PI (3.14...):
Example
import math
x = math.pi
print(x)
Try it Yourself »
Complete Math Module Reference
In our Math Module Reference you will find a complete reference of all methods and
constants that belong to the math module.
Python JSON
JSON in Python
Python has a built-in package called json, which can be used to work with JSON data.
Example
Import the json module:
import json
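If you have a JSON string, you can parse it by using the json.loads() method. The result
will be a Python dictionary.
Example
Convert from JSON to Python:
import json
# some JSON:
x = '{ "name":"John", "age":30, "city":"New York"}'
# parse x:
y = json.loads(x)
# the result is a Python dictionary:
print(y["age"])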
Example
Convert from Python to JSON:
import json
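A minimal sketch of the Python-to-JSON direction, using json.dumps() on a small dictionary whose contents are assumed for illustration:

```python
import json

# A Python object (dict); the values are illustrative.
x = {"name": "John", "age": 30, "city": "New York"}

# Convert into a JSON string.
y = json.dumps(x)

print(y)
print(type(y))
```

The result is an ordinary Python str holding JSON text, which can be written to a file or sent over a network.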
You can convert Python objects of the following types, into JSON strings:
• dict
• list
• tuple
• str
• int
• float
• True
• False
• None
Example
Convert Python objects into JSON strings, and print the values:
import json
When you convert from Python to JSON, Python objects are converted into the JSON
(JavaScript) equivalent:
Python JSON
dict Object
list Array
tuple Array
str String
int Number
float Number
True true
False false
None null
Example
Convert a Python object containing all the legal data types:
import json
x = {
"name": "John",
"age": 30,
"married": True,
"divorced": False,
"children": ("Ann","Billy"),
"pets": None,
"cars": [
{"model": "BMW 230", "mpg": 27.5},
{"model": "Ford Edge", "mpg": 24.1}
]
}
print(json.dumps(x))
Try it Yourself »
The json.dumps() method has parameters to make it easier to read the result:
Example
Use the indent parameter to define the number of spaces used for each indentation level:
json.dumps(x, indent=4)
Try it Yourself »
You can also define the separators, default value is (", ", ": "), which means using a
comma and a space to separate each object, and a colon and a space to separate keys
from values:
Example
Use the separators parameter to change the default separator:
Try it Yourself »
Example
Use the sort_keys parameter to specify if the result should be sorted or not:
Try it Yourself »
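The separators and sort_keys parameters described above might be used like this. The dictionary is an assumed example; the compact separator pair is just one possible choice.

```python
import json

x = {"name": "John", "age": 30, "city": "New York"}  # illustrative data

# separators=(item_separator, key_separator): drop the spaces for compact output.
compact = json.dumps(x, separators=(",", ":"))

# sort_keys=True orders the keys alphabetically in the output.
ordered = json.dumps(x, sort_keys=True)

print(compact)
print(ordered)
```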
Python RegEx
RegEx can be used to check if a string contains the specified search pattern.
RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions.
import re
RegEx in Python
When you have imported the re module, you can start using regular expressions:
Example
Search the string to see if it starts with "The" and ends with "Spain":
import re
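One way to write that search, assuming the sample string "The rain in Spain" used elsewhere on this page: ^ anchors the pattern at the start, $ at the end, and .* matches anything in between.

```python
import re

txt = "The rain in Spain"

# ^The  : string starts with "The"
# .*    : any characters in between
# Spain$: string ends with "Spain"
x = re.search("^The.*Spain$", txt)

if x:
    print("YES! We have a match!")
else:
    print("No match")
```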
RegEx Functions
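The re module offers a set of functions that allow us to search a string for a match:
Function    Description
findall     Returns a list containing all matches
search      Returns a Match object if there is a match anywhere in the string
split       Returns a list where the string has been split at each match
sub         Replaces one or many matches with a string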
Special Sequences
A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:
Sets
A set is a set of characters inside a pair of square brackets [] with a special meaning:
Example
Print a list of all matches:
import re
The list contains the matches in the order they are found.
Example
Return an empty list if no match was found:
import re
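A short sketch of both findall() cases, assuming the sample string "The rain in Spain" and the search patterns "ai" and "Portugal":

```python
import re

txt = "The rain in Spain"

matches = re.findall("ai", txt)            # every non-overlapping match, in order
no_matches = re.findall("Portugal", txt)   # nothing found: an empty list

print(matches)
print(no_matches)
```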
If there is more than one match, only the first occurrence of the match will be returned:
Example
Search for the first white-space character in the string:
import re
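A minimal sketch, assuming the same sample string: search() returns a Match object for the first white-space character, and start() gives its position.

```python
import re

txt = "The rain in Spain"

# \s matches a white-space character; a raw string keeps the backslash intact.
x = re.search(r"\s", txt)
first_space = x.start()

print("The first white-space character is located in position:", first_space)
```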
Example
Make a search that returns no match:
import re
txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
Try it Yourself »
Example
Split at each white-space character:
import re
You can control the number of occurrences by specifying the maxsplit parameter:
Example
Split the string only at the first occurrence:
import re
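Both forms of split() can be sketched as follows, again assuming the sample string "The rain in Spain":

```python
import re

txt = "The rain in Spain"

all_parts = re.split(r"\s", txt)               # split at every white-space character
first_only = re.split(r"\s", txt, maxsplit=1)  # split at the first occurrence only

print(all_parts)
print(first_only)
```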
Example
Replace every white-space character with the number 9:
import re
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)
Try it Yourself »
You can control the number of replacements by specifying the count parameter:
Example
Replace the first 2 occurrences:
import re
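A sketch of the count parameter, assuming the same string and replacement character as in the previous example:

```python
import re

txt = "The rain in Spain"

# count=2 limits the substitution to the first two matches.
x = re.sub(r"\s", "9", txt, count=2)
print(x)
```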
Match Object
A Match Object is an object containing information about the search and the result.
Note: If there is no match, the value None will be returned, instead of the Match Object.
Example
Do a search that will return a Match Object:
import re
The Match object has properties and methods used to retrieve information about the search, and the result:
.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed into the function
.group() returns the part of the string where there was a match
Example
Print the position (start- and end-position) of the first match occurrence.
The regular expression looks for any word that starts with an upper case "S":
import re
txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())
Try it Yourself »
Example
Print the string passed into the function:
import re
Example
Print the part of the string where there was a match.
The regular expression looks for any word that starts with an upper case "S":
import re
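The .string property and .group() method can be sketched together, assuming the sample string and pattern from the span() example:

```python
import re

txt = "The rain in Spain"

# \b is a word boundary, S a literal upper case S, \w+ the rest of the word.
x = re.search(r"\bS\w+", txt)

searched = x.string    # the string passed into the function
matched = x.group()    # the part of the string where there was a match

print(searched)
print(matched)
```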
Python PIP
What is PIP?
PIP is a package manager for Python packages, or modules if you like.
Note: If you have Python version 3.4 or later, PIP is included by default.
What is a Package?
A package contains all the files you need for a module.
Modules are Python code libraries you can include in your project.
Example
Check PIP version:
pip --version
Install PIP
If you do not have PIP installed, you can download and install it from this
page: https://pypi.org/project/pip/
Download a Package
Downloading a package is very easy.
Open the command line interface and tell PIP to download the package you want.
Navigate your command line to the location of Python's script directory, and type the
following:
Example
Download a package named "camelcase":
pip install camelcase
Example
Import and use "camelcase":
import camelcase
c = camelcase.CamelCase()
txt = "hello world"  # sample text (illustrative)
print(c.hump(txt))
Find Packages
Find more packages at https://pypi.org/.
Remove a Package
Use the uninstall command to remove a package:
Example
Uninstall the package named "camelcase":
pip uninstall camelcase
The PIP Package Manager will ask you to confirm that you want to remove the camelcase
package:
Uninstalling camelcase-02.1:
Would remove:
c:\users\Your Name\appdata\local\programs\python\python36-32\lib\site-packages\camecase-0.2-py3.6.egg-info
c:\users\Your Name\appdata\local\programs\python\python36-32\lib\site-packages\camecase\*
Proceed (y/n)?
List Packages
Use the list command to list all the packages installed on your system:
Example
List installed packages:
pip list
Result:
Package Version
-----------------------
camelcase 0.2
mysql-connector 2.1.6
pip 18.1
pymongo 3.6.1
setuptools 39.0.1
The try block lets you test a block of code for errors.
The except block lets you handle the error.
The else block lets you execute code when there is no error.
The finally block lets you execute code, regardless of the result of the try and
except blocks.
Exception Handling
When an error occurs, or exception as we call it, Python will normally stop and generate
an error message.
These exceptions can be handled using the try statement:
Example
The try block will generate an exception, because x is not defined:
try:
    print(x)
except:
    print("An exception occurred")
Since the try block raises an error, the except block will be executed.
Without the try block, the program will crash and raise an error:
Example
This statement will raise an error, because x is not defined:
print(x)
Try it Yourself »
Many Exceptions
You can define as many except blocks as you want, e.g. if you want to execute a
special block of code for a special kind of error:
Example
Print one message if the try block raises a NameError and another for other errors:
try:
    print(x)
except NameError:
    print("Variable x is not defined")
except:
    print("Something else went wrong")
Else
You can use the else keyword to define a block of code to be executed if no errors were
raised:
Example
In this example, the try block does not generate any error:
try:
    print("Hello")
except:
    print("Something went wrong")
else:
    print("Nothing went wrong")
Finally
The finally block, if specified, will be executed regardless of whether the try block
raises an error or not.
Example
try:
    print(x)
except:
    print("Something went wrong")
finally:
    print("The 'try except' is finished")
This can be useful to close objects and clean up resources:
Example
Try to open and write to a file that is not writable:
try:
    f = open("demofile.txt")
    try:
        f.write("Lorum Ipsum")
    except:
        print("Something went wrong when writing to the file")
    finally:
        f.close()
except:
    print("Something went wrong when opening the file")
The program can continue, without leaving the file object open.
Raise an exception
As a Python developer you can choose to throw an exception if a condition occurs.
To throw (or raise) an exception, use the raise keyword.
Example
Raise an error and stop the program if x is lower than 0:
x = -1
if x < 0:
    raise Exception("Sorry, no numbers below zero")
You can define what kind of error to raise, and the text to print to the user.
Example
Raise a TypeError if x is not an integer:
x = "hello"
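A minimal sketch of that check. The helper name require_int and the error text are assumptions for illustration; the try block is there only so the sketch runs to the end instead of stopping at the raised error.

```python
def require_int(x):
    # isinstance() checks the type; anything that is not an int is rejected.
    if not isinstance(x, int):
        raise TypeError("Only integers are allowed")
    return x

try:
    require_int("hello")
except TypeError as err:
    message = str(err)
    print(message)
```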
Python User Input
User Input
Python allows for user input.
The following example asks for the username, and once you have entered it, the username
is printed on the screen:
Python 3.6
username = input("Enter username:")
print("Username is: " + username)
Run Example »
Python 2.7
username = raw_input("Enter username:")
print("Username is: " + username)
Run Example »
Python stops executing when it comes to the input() function, and continues when the
user has given some input.
Python String Formatting
To make sure a string will display as expected, we can format the result with
the format() method.
String format()
The format() method allows you to format selected parts of a string.
Sometimes there are parts of a text that you do not control, maybe they come from a
database, or user input?
To control such values, add placeholders (curly brackets {}) in the text, and run the
values through the format() method:
Example
Add a placeholder where you want to display the price:
price = 49
txt = "The price is {} dollars"
print(txt.format(price))
Try it Yourself »
You can add parameters inside the curly brackets to specify how to convert the value:
Example
Format the price to be displayed as a number with two decimals:
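A sketch of such a placeholder, assuming the price variable from the previous example: the :.2f inside the curly brackets asks for a fixed-point number with two decimals.

```python
price = 49
txt = "The price is {:.2f} dollars"

formatted = txt.format(price)
print(formatted)
```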
Multiple Values
If you want to use more values, just add more values to the format() method:
print(txt.format(price, itemno, count))
And add more placeholders:
Example
quantity = 3
itemno = 567
price = 49
myorder = "I want {} pieces of item number {} for {:.2f} dollars."
print(myorder.format(quantity, itemno, price))
Try it Yourself »
Index Numbers
You can use index numbers (a number inside the curly brackets {0}) to be sure the values
are placed in the correct placeholders:
Example
quantity = 3
itemno = 567
price = 49
myorder = "I want {0} pieces of item number {1} for {2:.2f} dollars."
print(myorder.format(quantity, itemno, price))
Try it Yourself »
Also, if you want to refer to the same value more than once, use the index number:
Example
age = 36
name = "John"
txt = "His name is {1}. {1} is {0} years old."
print(txt.format(age, name))
Try it Yourself »
Named Indexes
You can also use named indexes by entering a name inside the curly brackets {carname},
but then you must use names when you pass the parameter values txt.format(carname =
"Ford"):
Example
myorder = "I have a {carname}, it is a {model}."
print(myorder.format(carname = "Ford", model = "Mustang"))
Try it Yourself »
Python File Open
Python has several functions for creating, reading, updating, and deleting files.
File Handling
The key function for working with files in Python is the open() function.
The open() function takes two parameters: filename and mode.
There are four different methods (modes) for opening a file:
"r" - Read - Default value. Opens a file for reading, error if the file does not exist
"a" - Append - Opens a file for appending, creates the file if it does not exist
"w" - Write - Opens a file for writing, creates the file if it does not exist
"x" - Create - Creates the specified file, returns an error if the file exists
In addition you can specify if the file should be handled in binary or text mode:
"t" - Text - Default value. Text mode
"b" - Binary - Binary mode (e.g. images)
Syntax
To open a file for reading it is enough to specify the name of the file:
f = open("demofile.txt")
The code above is the same as:
f = open("demofile.txt", "rt")
Because "r" for read, and "t" for text are the default values, you do not need to specify
them.
Note: Make sure the file exists, or else you will get an error.
demofile.txt
The open() function returns a file object, which has a read() method for reading the
content of the file:
Example
f = open("demofile.txt", "r")
print(f.read())
If the file is located in a different location, you will have to specify the file path, like this:
Example
Open a file on a different location:
f = open("D:\\myfiles\\welcome.txt", "r")
print(f.read())
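By default the read() method returns the whole text, but you can also specify how many
characters you want to return:
Example
Return the first 5 characters of the file: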
f = open("demofile.txt", "r")
print(f.read(5))
Read Lines
You can return one line by using the readline() method:
Example
Read one line of the file:
f = open("demofile.txt", "r")
print(f.readline())
By calling readline() two times, you can read the first two lines:
Example
Read two lines of the file:
f = open("demofile.txt", "r")
print(f.readline())
print(f.readline())
By looping through the lines of the file, you can read the whole file, line by line:
Example
Loop through the file line by line:
f = open("demofile.txt", "r")
for x in f:
    print(x)
Close Files
It is a good practice to always close the file when you are done with it.
Example
Close the file when you are finished with it:
f = open("demofile.txt", "r")
print(f.readline())
f.close()
Note: You should always close your files. In some cases, due to buffering, changes made
to a file may not show until you close the file.
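One common way to make sure a file is always closed is the with statement, which closes the file automatically when the block ends, even if an error occurs inside it. The following is a minimal sketch; the scratch file path is an assumption made so that the code does not depend on demofile.txt already existing.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch
path = os.path.join(tempfile.mkdtemp(), "demofile.txt")

# Write one line; the file is closed as soon as the with block ends
with open(path, "w") as f:
    f.write("Hello! Welcome to demofile.txt\n")

# Read the line back; again no explicit f.close() is needed
with open(path, "r") as f:
    first_line = f.readline()

print(first_line)
print(f.closed)  # the file object reports that it has been closed
```

With this pattern there is no f.close() call to forget.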
Example
Open the file "demofile2.txt" and append content to the file:
f = open("demofile2.txt", "a")
f.write("Now the file has more content!")
f.close()
Example
Open the file "demofile3.txt" and overwrite the content:
f = open("demofile3.txt", "w")
f.write("Woops! I have deleted the content!")
f.close()
To create a new file, use the open() function with one of the following modes:
"x" - Create - will create a file, raises an error if the file exists
"a" - Append - will create a file if the specified file does not exist
"w" - Write - will create a file if the specified file does not exist
Example
Create a file called "myfile.txt":
f = open("myfile.txt", "x")
Example
Create a new file if it does not exist:
f = open("myfile.txt", "w")
Python Delete File
Delete a File
To delete a file, you must import the os module and run its os.remove() function:
Example
Remove the file "demofile.txt":
import os
os.remove("demofile.txt")
Example
Check if file exists, then delete it:
import os
if os.path.exists("demofile.txt"):
    os.remove("demofile.txt")
else:
    print("The file does not exist")
Delete Folder
To delete an entire folder, use the os.rmdir() method:
Example
Remove the folder "myfolder":
import os
os.rmdir("myfolder")
Note: You can only remove empty folders.
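To see what this note means in practice, the sketch below creates a folder that contains one file, shows that os.rmdir() refuses to remove it, and then removes the folder together with its content using shutil.rmtree() from the standard library. The temporary location and the file name note.txt are assumptions made for the illustration.

```python
import os
import shutil
import tempfile

# Hypothetical folder with one file in it, created in a temporary location
base = tempfile.mkdtemp()
folder = os.path.join(base, "myfolder")
os.mkdir(folder)
with open(os.path.join(folder, "note.txt"), "w") as f:
    f.write("not empty")

# os.rmdir() raises OSError because the folder is not empty
try:
    os.rmdir(folder)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# shutil.rmtree() deletes the folder and everything inside it
shutil.rmtree(folder)
folder_exists = os.path.exists(folder)

print(rmdir_failed, folder_exists)
```

Because shutil.rmtree() deletes everything below the given path without asking, it should be used with care.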
Matplotlib Tutorial
What is Matplotlib?
Matplotlib is a low level graph plotting library in Python that serves as a visualization
utility.
Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and
JavaScript for platform compatibility.
Where is the Matplotlib Codebase?
The source code for Matplotlib is located at this GitHub
repository: https://github.com/matplotlib/matplotlib
Installation of Matplotlib
If you have Python and PIP already installed on a system, then installation of Matplotlib is
very easy.
Install it using this command:
pip install matplotlib
If this command fails, then use a Python distribution that already has Matplotlib
installed, like Anaconda, Spyder etc.
Import Matplotlib
Once Matplotlib is installed, import it in your applications by adding
the import module statement:
import matplotlib
print(matplotlib.__version__)
Matplotlib Pyplot
Pyplot
Most of the Matplotlib utilities lie under the pyplot submodule, which is usually imported
under the plt alias:
import matplotlib.pyplot as plt
Example
Draw a line in a diagram from position (0,0) to position (6,250):
import matplotlib.pyplot as plt
xpoints, ypoints = [0, 6], [0, 250]
plt.plot(xpoints, ypoints)
plt.show()
You will learn more about drawing (plotting) in the next chapters.
Matplotlib Plotting
If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and [3,
10] to the plot function.
Example
Draw a line in a diagram from position (1, 3) to position (8, 10):
import matplotlib.pyplot as plt
xpoints, ypoints = [1, 8], [3, 10]
plt.plot(xpoints, ypoints)
plt.show()
Example
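Draw two points in the diagram, one at position (1, 3) and one at position (8, 10), using
the shortcut string notation 'o' to plot markers without a line:
import matplotlib.pyplot as plt
xpoints, ypoints = [1, 8], [3, 10]
plt.plot(xpoints, ypoints, 'o')
plt.show()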
Multiple Points
You can plot as many points as you like, just make sure you have the same number of
points on both axes.
Example
Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to
position (8, 10):
import matplotlib.pyplot as plt
xpoints, ypoints = [1, 2, 6, 8], [3, 8, 1, 10]
plt.plot(xpoints, ypoints)
plt.show()
Default X-Points
If we do not specify the points on the x-axis, they will get the default values 0, 1, 2, 3
etc., depending on the length of the y-points.
So, if we take the same example as above, and leave out the x-points, the diagram will
look like this:
Example
Plotting without x-points:
import matplotlib.pyplot as plt
ypoints = [3, 8, 1, 10]
plt.plot(ypoints)
plt.show()
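The default x-values can also be inspected directly. plot() returns a list of line objects, and each line exposes the x-data it was drawn with through get_xdata(). The sketch below is only an illustration; the "Agg" backend line is an assumption added so that it runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]

# plot() returns a list with one line object per plotted line
(line,) = plt.plot(ypoints)

# With no x-points given, the line uses 0, 1, 2, ... as x-values
default_x = [int(v) for v in line.get_xdata()]
print(default_x)
```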
Matplotlib Markers
Markers
You can use the keyword argument marker to emphasize each point with a specified
marker:
Example
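Mark each point with a circle:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o')
plt.show()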
Example
Mark each point with a star:
...
plt.plot(ypoints, marker = '*')
...
Marker Reference
You can choose any of these markers:
Marker  Description
'x'     X
You can also use the shortcut string notation parameter to specify the marker.
This parameter is also called fmt, and is written with this syntax:
marker|line|color
Example
Mark each point with a circle:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, 'o:r')
plt.show()
The marker value can be anything from the Marker Reference above.
Line Reference
Line Syntax  Description
'-'          Solid line
':'          Dotted line
'--'         Dashed line
'-.'         Dashed/dotted line
Note: If you leave out the line value in the fmt parameter, no line will be plotted.
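The effect of leaving out the line part can be checked on the line objects that plot() returns. In this sketch the first call uses the full fmt string 'o:r' and the second uses 'or' with no line value. The "Agg" backend line is an assumption added so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

ypoints = [3, 8, 1, 10]

# marker 'o', dotted line ':', color red 'r'
(with_line,) = plt.plot(ypoints, 'o:r')

# marker and color only: no line value is given
(markers_only,) = plt.plot(ypoints, 'or')

is_red = to_rgba(with_line.get_color()) == to_rgba("red")
print(with_line.get_marker(), with_line.get_linestyle(), is_red)
print(markers_only.get_linestyle())  # the string 'None' means that no line is drawn
```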
Color Reference
Color Syntax  Description
'r'           Red
'g'           Green
'b'           Blue
'c'           Cyan
'm'           Magenta
'y'           Yellow
'k'           Black
'w'           White
Marker Size
You can use the keyword argument markersize or the shorter version, ms to set the size of
the markers:
Example
Set the size of the markers to 20:
plt.plot(ypoints, marker = 'o', ms = 20)
Marker Color
You can use the keyword argument markeredgecolor or the shorter mec to set the color of
the edge of the markers:
Example
Set the EDGE color to red:
plt.plot(ypoints, marker = 'o', ms = 20, mec = 'r')
You can use the keyword argument markerfacecolor or the shorter mfc to set the color
inside the edge of the markers:
Example
Set the FACE color to red:
plt.plot(ypoints, marker = 'o', ms = 20, mfc = 'r')
Use both the mec and mfc arguments to color the entire marker:
Example
Set the color of both the edge and the face to red:
plt.plot(ypoints, marker = 'o', ms = 20, mec = 'r', mfc = 'r')
Example
Mark each point with a beautiful green color:
...
plt.plot(ypoints, marker = 'o', ms = 20, mec = '#4CAF50', mfc = '#4CAF50')
...
Example
Mark each point with the color named "hotpink":
...
plt.plot(ypoints, marker = 'o', ms = 20, mec = 'hotpink', mfc = 'hotpink')
...
Matplotlib Line
Linestyle
You can use the keyword argument linestyle, or shorter ls, to change the style of the
plotted line:
Example
Use a dotted line:
plt.plot(ypoints, linestyle = 'dotted')
Example
Use a dashed line:
plt.plot(ypoints, linestyle = 'dashed')
Shorter Syntax
The line style can be written in a shorter syntax:
Example
Shorter syntax:
plt.plot(ypoints, ls = ':')
Line Styles
You can choose any of these styles:
Style              Or
'solid' (default)  '-'
'dotted'           ':'
'dashed'           '--'
'dashdot'          '-.'
'None'             '' or ' '
Line Color
You can use the keyword argument color or the shorter c to set the color of the line:
Example
Set the line color to red:
plt.plot(ypoints, color = 'r')
Example
Plot with a beautiful green line:
...
plt.plot(ypoints, c = '#4CAF50')
...
Example
Plot with the color named "hotpink":
...
plt.plot(ypoints, c = 'hotpink')
...
Line Width
You can use the keyword argument linewidth or the shorter lw to change the width of the
line.
Example
Plot with a 20.5pt wide line:
plt.plot(ypoints, linewidth = 20.5)
Multiple Lines
You can plot as many lines as you like by simply adding more plt.plot() functions:
Example
Draw two lines by specifying a plt.plot() function for each line:
import matplotlib.pyplot as plt
import numpy as np
y1 = np.array([3, 8, 1, 10])
y2 = np.array([6, 2, 7, 11])
plt.plot(y1)
plt.plot(y2)
plt.show()
You can also plot many lines by adding the points for the x- and y-axis for each line in
the same plt.plot() function.
(In the examples above we only specified the points on the y-axis, meaning that the
points on the x-axis got the default values (0, 1, 2, 3).)
Example
Draw two lines by specifying the x- and y-point values for both lines:
import matplotlib.pyplot as plt
import numpy as np
x1 = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])
x2 = np.array([0, 1, 2, 3])
y2 = np.array([6, 2, 7, 11])
plt.plot(x1, y1, x2, y2)
plt.show()
Example
Add labels to the x- and y-axis:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()
Example
Add a plot title and labels for the x- and y-axis:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()
Example
Set font properties for the title and labels:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
font1 = {'family':'serif','color':'blue','size':20}
font2 = {'family':'serif','color':'darkred','size':15}
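plt.title("Sports Watch Data", fontdict = font1)
plt.xlabel("Average Pulse", fontdict = font2)
plt.ylabel("Calorie Burnage", fontdict = font2)
plt.plot(x, y)
plt.show()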
Position the Title
You can use the loc parameter in title() to position the title.
Legal values are: 'left', 'right', and 'center'. Default value is 'center'.
Example
Position the title to the left:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.title("Sports Watch Data", loc = 'left')
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.plot(x, y)
plt.show()
Matplotlib Adding Grid Lines
Example
Add grid lines to the plot:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid()
plt.show()
You can use the axis parameter in the grid() function to specify which grid lines to display.
Legal values are: 'x', 'y', and 'both'. Default value is 'both'.
Example
Display only grid lines for the x-axis:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid(axis = 'x')
plt.show()
Example
Display only grid lines for the y-axis:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid(axis = 'y')
plt.show()
Example
Set the line properties of the grid:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.plot(x, y)
plt.grid(color = 'green', linestyle = '--', linewidth = 0.5)
plt.show()
Matplotlib Subplot
Example
Draw 2 plots:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.show()
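The subplot() function takes three arguments that describe the layout of the figure.
The layout is organized in rows and columns, which are represented by the first and second arguments.
The third argument represents the index of the current plot.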
plt.subplot(1, 2, 1)
#the figure has 1 row, 2 columns, and this plot is the first plot.
plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.
So, if we want a figure with 2 rows and 1 column (meaning that the two plots will be displayed on top of each
other instead of side-by-side), we can write the syntax like this:
Example
Draw 2 plots on top of each other:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 1, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 1, 2)
plt.plot(x,y)
plt.show()
You can draw as many plots as you like on one figure, just describe the number of rows, columns, and the index of
the plot.
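When the number of plots grows, the index can be computed instead of typed by hand. The sketch below fills a grid of 2 rows and 3 columns in a loop. The y-values are made up for the illustration, and the "Agg" backend line is an assumption added so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt

nrows, ncols = 2, 3
x = [0, 1, 2, 3]

fig = plt.figure()
indexes = []
for row in range(nrows):
    for col in range(ncols):
        # subplot indexes start at 1 and are counted row by row
        index = row * ncols + col + 1
        indexes.append(index)
        plt.subplot(nrows, ncols, index)
        plt.plot(x, [value * index for value in x])  # made-up data: a steeper line per plot
        plt.title(f"plot {index}")

axes_count = len(fig.axes)
print(indexes, axes_count)
```

The index runs from 1 in the upper left corner to rows * columns in the lower right corner.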
Example
Draw 6 plots:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 1)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 2)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 3)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 4)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 5)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 6)
plt.plot(x,y)
plt.show()
Title
You can add a title to each plot with the title() function:
Example
2 plots, with titles:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.show()
Super Title
You can add a title to the entire figure with the suptitle() function:
Example
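Add a title for the entire figure:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)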
plt.plot(x,y)
plt.title("SALES")
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.suptitle("MY SHOP")
plt.show()
Matplotlib Scatter
The scatter() function plots one dot for each observation. It needs two arrays of the same
length, one for the values of the x-axis, and one for values on the y-axis:
Example
A simple scatter plot:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()
The observation in the example above is the result of 13 cars passing by.
The x-axis shows how old the car is.
The y-axis shows the speed of the car when it passes.
It seems that the newer the car, the faster it drives, but that could be a coincidence; after
all, we only registered 13 cars.
Compare Plots
In the example above, there seems to be a relationship between speed and age, but what
if we plot the observations from another day as well? Will the scatter plot tell us
something else?
Example
Draw two plots on the same figure:
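import matplotlib.pyplot as plt
import numpy as np
#day one, the age and speed of 13 cars:
plt.scatter(np.array([5,7,8,7,2,17,2,9,4,11,12,9,6]), np.array([99,86,87,88,111,86,103,87,94,78,77,85,86]))
#day two, the age and speed of 15 cars:
plt.scatter(np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12]), np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85]))
plt.show()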
Note: The two plots are plotted with two different colors, by default blue and orange. You
will learn how to change colors later in this chapter.
By comparing the two plots, it seems safe to say that they both give us the same
conclusion: the newer the car, the faster it drives.
Colors
You can set your own color for each scatter plot with the color or the c argument:
Example
Set your own color of the markers:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y, color = 'hotpink')
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y, color = '#88c999')
plt.show()
Color Each Dot
You can even set a specific color for each dot by using an array of colors as value for
the c argument:
Note: You cannot use the color argument for this, only the c argument.
Example
Set your own color of the markers:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige","brown","gray","cyan","magenta"])
plt.scatter(x, y, c=colors)
plt.show()
ColorMap
The Matplotlib module has a number of available colormaps.
A colormap is like a list of colors, where each color has a value that ranges from 0 to 100.
This colormap is called 'viridis' and as you can see it ranges from 0, which is a purple
color, and up to 100, which is a yellow color.
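You can specify the colormap with the keyword argument cmap and the name of the
colormap, in this case 'viridis', which is one of the built-in colormaps in Matplotlib.
In addition you have to create an array with values (from 0 to 100), one value for each
point in the scatter plot: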
Example
Create a color array, and specify a colormap in the scatter plot:
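import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.show()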
You can include the colormap in the drawing by including the plt.colorbar() statement:
Example
Include the actual colormap:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()
plt.show()
Available ColorMaps
You can choose any of the built-in colormaps. Each one also has a reversed version, whose
name ends in _r (for example 'viridis' and 'viridis_r').
Size
You can change the size of the dots with the s argument.
Just like colors, make sure the array for sizes has the same length as the arrays for the
x- and y-axis:
Example
Set your own size for the markers:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])
plt.scatter(x, y, s=sizes)
plt.show()
Alpha
You can adjust the transparency of the dots with the alpha argument, a value between 0
(fully transparent) and 1 (fully opaque).
Example
Set the transparency of the markers:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])
plt.scatter(x, y, s=sizes, alpha=0.5)
plt.show()
Example
Create random arrays with 100 values for x-points, y-points, colors and sizes:
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randint(100, size=(100))
y = np.random.randint(100, size=(100))
colors = np.random.randint(100, size=(100))
sizes = 10 * np.random.randint(100, size=(100))
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='nipy_spectral')
plt.colorbar()
plt.show()
Matplotlib Bars
Creating Bars
With Pyplot, you can use the bar() function to draw bar graphs:
Example
Draw 4 bars:
import matplotlib.pyplot as plt
x, y = ["A", "B", "C", "D"], [3, 8, 1, 10]  # illustrative categories and values
plt.bar(x, y)
plt.show()
The bar() function takes arguments that describe the layout of the bars.
The categories and their values are represented by the first and second arguments as arrays.
Example
import matplotlib.pyplot as plt
x = ["APPLES", "BANANAS"]
y = [400, 350]
plt.bar(x, y)
plt.show()
Horizontal Bars
If you want the bars to be displayed horizontally instead of vertically, use
the barh() function:
Example
Draw 4 horizontal bars:
import matplotlib.pyplot as plt
x, y = ["A", "B", "C", "D"], [3, 8, 1, 10]  # illustrative categories and values
plt.barh(x, y)
plt.show()
Bar Color
The bar() and barh() functions take the keyword argument color to set the color of the bars:
Example
Draw 4 red bars:
plt.bar(x, y, color = "red")
Color Names
You can use any of the 140 supported color names.
Example
Draw 4 "hot pink" bars:
plt.bar(x, y, color = "hotpink")
Color Hex
Or you can use Hexadecimal color values:
Example
Draw 4 bars with a beautiful green color:
plt.bar(x, y, color = "#4CAF50")
Bar Width
The bar() function takes the keyword argument width to set the width of the bars:
Example
Draw 4 very thin bars:
plt.bar(x, y, width = 0.1)
Bar Height
The barh() function takes the keyword argument height to set the height of the bars:
Example
Draw 4 very thin bars:
plt.barh(x, y, height = 0.1)
Matplotlib Histograms
Histogram
A histogram is a graph showing frequency distributions.
Example: Say you ask for the height of 250 people, you might end up with a histogram
like this:
You can read from the histogram that there are approximately:
Create Histogram
In Matplotlib, we use the hist() function to create histograms.
The hist() function takes an array of numbers as an argument and uses it to create
a histogram.
For simplicity we use NumPy to randomly generate an array with 250 values, where the
values will concentrate around 170, and the standard deviation is 10. Learn more
about Normal Data Distribution in our Machine Learning Tutorial.
Example
A Normal Data Distribution by NumPy:
import numpy as np
x = np.random.normal(170, 10, 250)
print(x)
Result:
This will generate a random result, so the printed values differ on every run.
The hist() function will read the array and produce a histogram:
Example
A simple histogram:
import matplotlib.pyplot as plt
import numpy as np
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
Example
A simple pie chart:
import matplotlib.pyplot as plt
y = [35, 25, 25, 15]
plt.pie(y)
plt.show()
As you can see the pie chart draws one piece (called a wedge) for each value in the array
(in this case [35, 25, 25, 15]).
By default the plotting of the first wedge starts from the x-axis and
moves counterclockwise.
Note: The size of each wedge is determined by comparing the value with all the other
values, by using this formula:
The value divided by the sum of all values: x/sum(x)
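As a small worked illustration of this calculation (plain Python, no plotting), the sketch below computes the share of each value in the array [35, 25, 25, 15] from the example above and the angle that each wedge covers out of the full 360 degrees.

```python
y = [35, 25, 25, 15]
total = sum(y)

# Share of the whole pie for each value: value / sum of all values
fractions = [value / total for value in y]

# The same share expressed as the angle of the wedge in degrees
angles = [value * 360 / total for value in y]

print(fractions)
print(angles)
```

Because the shares always add up to 1, the values do not have to sum to 100 for this to work.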
Labels
Add labels to the pie chart with the labels parameter.
The labels parameter must be an array with one label for each wedge:
Example
A simple pie chart:
Start Angle
As mentioned the default start angle is at the x-axis, but you can change the start angle
by specifying a startangle parameter.
Explode
Maybe you want one of the wedges to stand out? The explode parameter allows you to do
that.
The explode parameter, if specified, and not None, must be an array with one value for
each wedge.
Each value represents how far from the center each wedge is displayed:
Example
Pull the "Apples" wedge 0.2 from the center of the pie:
Shadow
Add a shadow to the pie chart by setting the shadow parameter to True:
Example
Add a shadow:
Colors
You can set the color of each wedge with the colors parameter.
The colors parameter, if specified, must be an array with one value for each wedge:
Example
Specify a new color for each wedge:
You can use Hexadecimal color values, any of the 140 supported color names, or one of
these shortcuts:
'r' - Red
'g' - Green
'b' - Blue
'c' - Cyan
'm' - Magenta
'y' - Yellow
'k' - Black
'w' - White
Legend
To add a list of explanations for each wedge, use the legend() function:
Example
Add a legend:
import matplotlib.pyplot as plt
import numpy as np
Example
Add a legend with a header:
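The sketch below combines the pie chart options described in this chapter: labels, startangle, explode, shadow, and a legend with a header. Only the label "Apples" appears in the text above. The other three labels and the legend header "Four Fruits:" are made up for the illustration, and the "Agg" backend line is an assumption added so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Wedge

y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]  # hypothetical labels, one per wedge
myexplode = [0.2, 0, 0, 0]  # pull the first wedge 0.2 away from the center

fig = plt.figure()
plt.pie(y, labels=mylabels, startangle=90, explode=myexplode, shadow=True)

# The title argument of legend() adds a header above the legend entries
legend = plt.legend(title="Four Fruits:")
legend_title = legend.get_title().get_text()

# Collect the wedges from the axes to inspect where the first one starts
wedges = [patch for patch in plt.gca().patches if isinstance(patch, Wedge)]
first_wedge_start = wedges[0].theta1

print(len(wedges), first_wedge_start, legend_title)
```

With startangle set to 90 the first wedge starts at the top of the pie instead of at the x-axis.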
Machine Learning
Machine Learning is making the computer learn from studying data and statistics.
Machine Learning is a program that analyses data and learns to predict the outcome.
Where To Start?
In this tutorial we will go back to mathematics and study statistics, and how to calculate
important numbers based on data sets.
We will also learn how to use various Python modules to get the answers we need.
And we will learn how to make functions that are able to predict the outcome based on
what we have learned.
Data Set
In the mind of a computer, a data set is any collection of data. It can be anything from
an array to a complete database.
Example of an array:
[99,86,87,88,111,86,103,87,94,78,77,85,86]
Example of a database:
Carname  Color  Age  Speed  AutoPass
BMW      red    5    99     Y
Volvo    black  7    86     Y
VW       gray   8    87     N
VW       white  7    88     Y
VW       white  17   86     Y
BMW      black  9    87     Y
Volvo    gray   4    94     N
Ford     white  11   78     N
Toyota   gray   12   77     N
VW       white  9    85     N
Toyota   blue   6    86     Y
By looking at the array, we can guess that the average value is probably around 80 or
90, and we are also able to determine the highest value and the lowest value, but what
else can we do?
And by looking at the database we can see that the most popular color is white, and the
oldest car is 17 years, but what if we could predict if a car had an AutoPass, just by
looking at the other values?
That is what Machine Learning is for! Analyzing data and predicting the outcome!
In Machine Learning it is common to work with very large data sets. In this tutorial we
will try to make it as easy as possible to understand the different concepts of machine
learning, and we will work with small easy-to-understand data sets.
Data Types
To analyze data, it is important to know what type of data we are dealing with.
We can split the data types into three main categories:
• Numerical
• Categorical
• Ordinal
Numerical data are numbers, and can be split into two numerical categories:
• Discrete Data
- numbers that are limited to integers. Example: The number of cars passing by.
• Continuous Data
- numbers that can take any value within a range. Example: The price of an item, or the size of
an item
Categorical data are values that cannot be measured up against each other. Example: a
color value, or any yes/no values.
Ordinal data are like categorical data, but can be measured up against each other.
Example: school grades where A is better than B and so on.
By knowing the data type of your data source, you will be able to know what technique to
use when analyzing them.
You will learn more about statistics and analyzing data in the next chapters.
Machine Learning - Mean Median Mode
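In Machine Learning (and in mathematics) there are often three values that interest us:
• Mean - The average value
• Median - The middle value
• Mode - The most common value
Example: We have registered the speed of 13 cars: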
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
What is the average, the middle, or the most common speed value?
Mean
The mean value is the average value.
To calculate the mean, find the sum of all values, and divide the sum by the number of
values:
(99+86+87+88+111+86+103+87+94+78+77+85+86) / 13 = 89.77
The NumPy module has a method for this. Learn about the NumPy module in our NumPy
Tutorial.
Example
Use the NumPy mean() method to find the average speed:
import numpy
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = numpy.mean(speed)
print(x)
Median
The median value is the value in the middle, after you have sorted all the values:
77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103, 111
It is important that the numbers are sorted before you can find the median.
Example
Use the NumPy median() method to find the middle value:
import numpy
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = numpy.median(speed)
print(x)
If there are two numbers in the middle, divide the sum of those numbers by two.
77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103
(86 + 87) / 2 = 86.5
Example
Using the NumPy module:
import numpy
speed = [99,86,87,88,86,103,87,94,78,77,85,86]
x = numpy.median(speed)
print(x)
Mode
The mode value is the value that appears the most times:
99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86 = 86
The SciPy module has a method for this. Learn about the SciPy module in our SciPy
Tutorial.
Example
Use the SciPy mode() method to find the number that appears the most:
from scipy import stats
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = stats.mode(speed)
print(x)
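The three values can also be cross-checked without NumPy or SciPy. The sketch below uses the statistics module from the Python standard library, which is not mentioned in the text above, on the same speed values, including the list with 12 values that has two numbers in the middle.

```python
import statistics

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

mean = statistics.mean(speed)      # sum of all values divided by the number of values
median = statistics.median(speed)  # middle value of the sorted list
mode = statistics.mode(speed)      # value that appears most often

# With an even number of values, the median is the average of the two middle values
even_median = statistics.median([99,86,87,88,86,103,87,94,78,77,85,86])

print(round(mean, 2), median, mode, even_median)
```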
Chapter Summary
The Mean, Median, and Mode are techniques that are often used in Machine Learning, so
it is important to understand the concept behind them.
Machine Learning - Standard Deviation
Standard deviation is a number that describes how spread out the values are.
A low standard deviation means that most of the numbers are close to the mean
(average) value.
A high standard deviation means that the values are spread out over a wider range.
Example: This time we have registered the speed of 7 cars:
speed = [86,87,88,86,87,85,86]
The standard deviation is:
0.9
Meaning that most of the values are within the range of 0.9 from the mean value, which
is 86.4.
Let us do the same with a selection of numbers with a wider range:
speed = [32,111,138,28,59,77,97]
The standard deviation is:
37.85
Meaning that most of the values are within the range of 37.85 from the mean value,
which is 77.4.
As you can see, a higher standard deviation indicates that the values are spread out over
a wider range.
Example
Use the NumPy std() method to find the standard deviation:
import numpy
speed = [86,87,88,86,87,85,86]
x = numpy.std(speed)
print(x)
Example
import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.std(speed)
print(x)
Variance
Variance is another number that indicates how spread out the values are.
In fact, if you take the square root of the variance, you get the standard deviation!
Or the other way around, if you multiply the standard deviation by itself, you get the
variance!
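To calculate the variance you have to do as follows:
1. Find the mean:
(32+111+138+28+59+77+97) / 7 = 77.4
2. For each value, find the difference from the mean:
32 - 77.4 = -45.4
111 - 77.4 = 33.6
138 - 77.4 = 60.6
28 - 77.4 = -49.4
59 - 77.4 = -18.4
77 - 77.4 = -0.4
97 - 77.4 = 19.6
3. For each difference, find the square value:
(-45.4)² = 2061.16
(33.6)² = 1128.96
(60.6)² = 3672.36
(-49.4)² = 2440.36
(-18.4)² = 338.56
(-0.4)² = 0.16
(19.6)² = 384.16
4. The variance is the average of these squared differences:
(2061.16+1128.96+3672.36+2440.36+338.56+0.16+384.16) / 7 = 1432.25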
Example
Use the NumPy var() method to find the variance:
import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.var(speed)
print(x)
Standard Deviation
As we have learned, the formula to find the standard deviation is the square root of the
variance:
√1432.25 = 37.85
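The calculation can be followed step by step in plain Python. This is a sketch of the population variance and standard deviation (dividing by the number of values), checked against statistics.pvariance() from the standard library, which is not part of the original text.

```python
import math
import statistics

speed = [32,111,138,28,59,77,97]

# Find the mean
mean = sum(speed) / len(speed)

# For each value, find the difference from the mean
differences = [value - mean for value in speed]

# For each difference, find the square value
squares = [difference ** 2 for difference in differences]

# The variance is the average of the squared differences
variance = sum(squares) / len(squares)

# The standard deviation is the square root of the variance
std = math.sqrt(variance)

print(round(variance, 2), round(std, 2))
```

The hand calculation rounds the mean to 77.4 before squaring, so its variance differs slightly in the second decimal from the value computed here with the unrounded mean. The standard deviation rounds to 37.85 either way.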
Or, as in the example from before, use NumPy to calculate the standard deviation:
Example
Use the NumPy std() method to find the standard deviation:
import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.std(speed)
print(x)
Symbols
Standard Deviation is often represented by the symbol Sigma: σ
Variance is often represented by the symbol Sigma Squared: σ²
Chapter Summary
The Standard Deviation and Variance are terms that are often used in Machine Learning,
so it is important to understand how to get them, and the concept behind them.
Example: Let's say we have an array of the ages of all the people that live on a street.
ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
What is the 75th percentile? The answer is 43, meaning that 75% of the people are 43 or
younger.
The NumPy module has a method for finding the specified percentile:
Example
Use the NumPy percentile() method to find the percentiles:
import numpy
ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
x = numpy.percentile(ages, 75)
print(x)
Example
What is the age that 90% of the people are younger than?
import numpy
ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
x = numpy.percentile(ages, 90)
print(x)
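To show what happens behind such a call, here is a minimal pure-Python sketch of a percentile with linear interpolation between the two nearest sorted values, which is the default method of numpy.percentile(). The helper name percentile is made up for this illustration.

```python
def percentile(values, percent):
    """Return the given percentile using linear interpolation (hypothetical helper)."""
    ordered = sorted(values)
    # Position of the percentile on the sorted list, counted from index 0
    position = (len(ordered) - 1) * percent / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    # Interpolate between the two neighboring values
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

p75 = percentile(ages, 75)
p90 = percentile(ages, 90)
print(p75, p90)
```

With 21 ages, the 75th and 90th percentiles fall exactly on sorted positions 15 and 18, so no interpolation is needed for these two.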
Data Distribution
Earlier in this tutorial we have worked with very small amounts of data in our examples,
just to understand the different concepts.
In the real world, the data sets are much bigger, but it can be difficult to gather real
world data, at least at an early stage of a project.
Example
Create an array containing 250 random floats between 0 and 5:
import numpy
x = numpy.random.uniform(0.0, 5.0, 250)
print(x)
Histogram
To visualize the data set we can draw a histogram with the data we collected.
Example
Draw a histogram:
import numpy
import matplotlib.pyplot as plt
x = numpy.random.uniform(0.0, 5.0, 250)
plt.hist(x, 5)
plt.show()
Histogram Explained
We use the array from the example above to draw a histogram with 5 bars.
The first bar represents how many values in the array are between 0 and 1.
The second bar represents how many values are between 1 and 2.
Etc.
Note: The array values are random numbers and will not show the exact same result on
your computer.
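The numbers behind the five bars can be computed without drawing anything by using numpy.histogram(). In this sketch the seeded generator numpy.random.default_rng(0) is an assumption added to make repeated runs give the same values. It is not used in the examples above.

```python
import numpy

rng = numpy.random.default_rng(0)  # assumption: fixed seed so repeated runs match
x = rng.uniform(0.0, 5.0, 250)

# Count how many values fall in each of 5 equally wide bins between 0 and 5
counts, edges = numpy.histogram(x, bins=5, range=(0.0, 5.0))

print(edges)   # the borders of the bins: 0, 1, 2, 3, 4, 5
print(counts)  # the height of each bar
```

The counts are close to 50 each, since the values are spread evenly over the range.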
Example
Create an array with 100000 random numbers, and display them using a histogram with
100 bars:
import numpy
import matplotlib.pyplot as plt
x = numpy.random.uniform(0.0, 5.0, 100000)
plt.hist(x, 100)
plt.show()
Machine Learning - Normal Data Distribution
In this chapter we will learn how to create an array where the values are concentrated
around a given value.
In probability theory this kind of data distribution is known as the normal data
distribution, or the Gaussian data distribution, after the mathematician Carl Friedrich
Gauss who came up with the formula of this data distribution.
Example
A typical normal data distribution:
import numpy
import matplotlib.pyplot as plt
x = numpy.random.normal(5.0, 1.0, 100000)
plt.hist(x, 100)
plt.show()
Note: A normal distribution graph is also known as the bell curve because of its
characteristic bell shape.
Histogram Explained
We use the array from the numpy.random.normal() method, with 100000 values, to draw a
histogram with 100 bars.
We specify that the mean value is 5.0, and the standard deviation is 1.0.
This means that the values are concentrated around 5.0, and roughly two thirds of them
lie within 1.0 of the mean.
And as you can see from the histogram, most values are between 4.0 and 6.0, with a top
at approximately 5.0.
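To check how concentrated the values really are, we can count them. The sketch below is an illustration that uses random.gauss() from the standard library instead of numpy.random.normal(), with a fixed seed chosen only for repeatability. For a normal distribution, roughly 68% of the values are expected within one standard deviation of the mean and roughly 95% within two.

```python
import random

random.seed(2)  # fixed seed, only so the output is repeatable
mean, std_dev = 5.0, 1.0
x = [random.gauss(mean, std_dev) for _ in range(100000)]

# Fraction of values no further than 1 (or 2) standard deviations from the mean
within_one = sum(1 for v in x if abs(v - mean) <= std_dev) / len(x)
within_two = sum(1 for v in x if abs(v - mean) <= 2 * std_dev) / len(x)

print(f"within 1 standard deviation: {within_one:.1%}")
print(f"within 2 standard deviations: {within_two:.1%}")
```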
Machine Learning - Scatter Plot
Scatter Plot
A scatter plot is a diagram where each value in the data set is represented by a dot.
The Matplotlib module has a method for drawing scatter plots. It needs two arrays of the
same length, one for the values of the x-axis, and one for the values of the y-axis:
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
The x array represents the age of each car, and the y array represents the speed of each car.
Example
Use the scatter() method to draw a scatter plot diagram:
import matplotlib.pyplot as plt
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()
What we can read from the diagram is that the two fastest cars were both 2 years old,
and the slowest car was 12 years old.
Note: It seems that the newer the car, the faster it drives, but that could be a
coincidence, after all we only registered 13 cars.
Random Data Distributions
In Machine Learning the data sets can contain thousands, or even millions, of values.
You might not have real world data when you are testing an algorithm, so you might have to
use randomly generated values.
As we have learned in the previous chapter, the NumPy module can help us with that!
Let us create two arrays that are both filled with 1000 random numbers from a normal
data distribution.
The first array will have the mean set to 5.0 with a standard deviation of 1.0.
The second array will have the mean set to 10.0 with a standard deviation of 2.0:
Example
A scatter plot with 1000 dots:
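import numpy
import matplotlib.pyplot as plt
x = numpy.random.normal(5.0, 1.0, 1000)
y = numpy.random.normal(10.0, 2.0, 1000)
plt.scatter(x, y)
plt.show()
Scatter Plot Explained
We can see that the dots are concentrated around the value 5 on the x-axis, and 10 on the y-axis.
We can also see that the spread is wider on the y-axis than on the x-axis.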
Machine Learning - Linear Regression
Regression
The term regression is used when you try to find the relationship between variables.
In Machine Learning, and in statistical modeling, that relationship is used to predict the outcome of future events.
Linear Regression
Linear regression uses the relationship between the data points to draw the straight line that fits them best.
In the example below, the x-axis represents age, and the y-axis represents speed. We have registered the age and
speed of 13 cars as they were passing a tollbooth. Let us see if the data we collected could be used in a linear
regression:
Example
Start by drawing a scatter plot:
import matplotlib.pyplot as plt
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()
Example
Import scipy and draw the line of Linear Regression:
import matplotlib.pyplot as plt
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
  return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Example Explained
Import the modules you need.
You can learn about the Matplotlib module in our Matplotlib Tutorial.
You can learn about the SciPy module in our SciPy Tutorial.
import matplotlib.pyplot as plt
from scipy import stats
Create the arrays that represent the values of the x and y axis:
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
Execute a method that returns some important key values of Linear Regression:
slope, intercept, r, p, std_err = stats.linregress(x, y)
Create a function that uses the slope and intercept values to return a new value. This new value represents
where on the y-axis the corresponding x value will be placed:
def myfunc(x):
  return slope * x + intercept
Run each value of the x array through the function. This will result in a new array with new values for the y-axis:
mymodel = list(map(myfunc, x))
Draw the original scatter plot:
plt.scatter(x, y)
Draw the line of linear regression:
plt.plot(x, mymodel)
Display the diagram:
plt.show()
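The slope and intercept used by myfunc() come from a least-squares fit. As a rough illustration of what such a fit computes, the sketch below derives both values by hand with plain Python. The helper name least_squares() is hypothetical, and this is the textbook formula rather than a description of how SciPy implements it internally.

```python
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

def least_squares(xs, ys):
    """Return (slope, intercept) of the best-fitting straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares(x, y)
print(slope, intercept)

# Predicted speed of a car that is 10 years old
print(slope * 10 + intercept)
```

The slope is negative, which matches the downward trend in the scatter plot: older cars tend to be slower.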
R for Relationship
It is important to know how strong the relationship between the values of the x-axis and the values of the y-axis is.
If there is no relationship, linear regression cannot be used to predict anything.
This relationship, the coefficient of correlation, is called r.
The r value ranges from -1 to 1, where 0 means no linear relationship, and 1 (and -1) means 100% related.
Python and the SciPy module will compute this value for you, all you have to do is feed it with the x and y
values.
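For readers who want to see what is behind the r value, here is a minimal sketch of the Pearson correlation formula in plain Python. The helper name pearson_r() is hypothetical; stats.linregress() reports the same quantity as its r value.

```python
import math

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)
print(r)
```

A value close to -1 or 1 means the points lie near a straight line; the sign tells you whether the line slopes down or up.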
Example
How well does my data fit in a linear regression?
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
print(r)
Note: The result -0.76 shows that there is a relationship, not perfect, but it indicates that we could use linear
regression in future predictions.
To predict a value, we need the same myfunc() function from the example above:
def myfunc(x):
  return slope * x + intercept
Example
Predict the speed of a 10-year-old car:
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
  return slope * x + intercept
speed = myfunc(10)
print(speed)
The example predicted a speed of 85.6, which we could also read from the diagram.
Bad Fit?
Let us create an example where linear regression would not be the best method to predict future values.
Example
These values for the x- and y-axis should result in a very bad fit for linear regression:
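import matplotlib.pyplot as plt
from scipy import stats
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
  return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()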
Example
You should get a very low r value.
import numpy
from scipy import stats
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
slope, intercept, r, p, std_err = stats.linregress(x, y)
print(r)
The result 0.013 indicates a very weak relationship, and tells us that this data set is not suitable for linear
regression.
Polynomial Regression
If your data points clearly will not fit a linear regression (a straight line through all data
points), it might be ideal for polynomial regression.
Polynomial regression, like linear regression, uses the relationship between the variables
x and y to find the best way to draw a line through the data points.
How Does it Work?
Python has methods for finding a relationship between data points and for drawing a line of
polynomial regression. We will show you how to use these methods instead of going
through the mathematical formula.
In the example below, we have registered 18 cars as they were passing a certain
tollbooth.
We have registered the car's speed, and the time of day (hour) the passing occurred.
The x-axis represents the hours of the day and the y-axis represents the speed:
Example
Start by drawing a scatter plot:
import matplotlib.pyplot as plt
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
plt.scatter(x, y)
plt.show()
Example
Import numpy and matplotlib, then draw the line of Polynomial Regression:
import numpy
import matplotlib.pyplot as plt
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
myline = numpy.linspace(1, 22, 100)
plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()
Example Explained
Import the modules you need.
You can learn about the NumPy module in our NumPy Tutorial.
You can learn about the Matplotlib module in our Matplotlib Tutorial.
import numpy
import matplotlib.pyplot as plt
Create the arrays that represent the values of the x and y axis:
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
NumPy has a method that lets us make a polynomial model:
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
Then specify how the line will display, we start at position 1, and end at position 22:
myline = numpy.linspace(1, 22, 100)
Draw the original scatter plot:
plt.scatter(x, y)
Draw the line of polynomial regression:
plt.plot(myline, mymodel(myline))
Display the diagram:
plt.show()
R-Squared
It is important to know how strong the relationship between the values of the x- and y-axis
is. If there is no relationship, polynomial regression cannot be used to predict
anything.
The relationship is measured with a value called the r-squared.
The r-squared value ranges from 0 to 1, where 0 means no relationship, and 1 means
100% related.
Python and the Sklearn module will compute this value for you, all you have to do is feed
it with the x and y arrays:
Example
How well does my data fit in a polynomial regression?
import numpy
from sklearn.metrics import r2_score
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
print(r2_score(y, mymodel(x)))
Note: The result 0.94 shows that there is a very good relationship, and we can use
polynomial regression in future predictions.
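To make the r-squared score less of a black box, the sketch below computes it by hand: 1 minus the ratio between the squared prediction errors and the squared distances from the mean. The helper name r_squared() and the tiny data set are hypothetical and chosen so the results are easy to check; r2_score() in sklearn uses the same definition.

```python
def r_squared(actual, predicted):
    """1 - (sum of squared errors) / (sum of squared distances from the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [3.0, 5.0, 7.0, 9.0]      # hypothetical measured values

perfect = [3.0, 5.0, 7.0, 9.0]     # predictions that hit every value
mean_only = [6.0, 6.0, 6.0, 6.0]   # predictions that always guess the mean
reversed_ = [9.0, 7.0, 5.0, 3.0]   # predictions that get the trend backwards

print(r_squared(actual, perfect))    # 1.0
print(r_squared(actual, mean_only))  # 0.0
print(r_squared(actual, reversed_))  # -3.0
```

The last line shows that a model that does worse than simply guessing the mean gets a negative score, so a value below 0 is a sign that the model does not suit the data at all.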
Example: Let us try to predict the speed of a car that passes the tollbooth at around
17:00.
To do so, we need the same mymodel model from the example above:
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
Example
Predict the speed of a car passing at 17:00:
import numpy
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
speed = mymodel(17)
print(speed)
The example predicted a speed of 88.87, which we could also read from the diagram.
Bad Fit?
Let us create an example where polynomial regression would not be the best method to
predict future values.
Example
These values for the x- and y-axis should result in a very bad fit for polynomial
regression:
import numpy
import matplotlib.pyplot as plt
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
myline = numpy.linspace(2, 95, 100)
plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()
Example
You should get a very low r-squared value.
import numpy
from sklearn.metrics import r2_score
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
print(r2_score(y, mymodel(x)))
The result 0.00995 indicates a very weak relationship, and tells us that this data set is not
suitable for polynomial regression.
Multiple Regression
Multiple regression is like linear regression, but with more than one independent value,
meaning that we try to predict a value based on two or more variables.
Take a look at the data set below, it contains some information about cars.
We can predict the CO2 emission of a car based on the size of the engine, but with
multiple regression we can throw in more variables, like the weight of the car, to make
the prediction more accurate.
import pandas
Learn about the Pandas module in our Pandas Tutorial.
The Pandas module allows us to read csv files and return a DataFrame object.
The file is meant for testing purposes only, you can download it here: cars.csv
df = pandas.read_csv("cars.csv")
Then make a list of the independent values and call this variable X.
X = df[['Weight', 'Volume']]
y = df['CO2']
Tip: It is common to name the list of independent values with an upper case X, and the
list of dependent values with a lower case y.
We will use some methods from the sklearn module, so we will have to import that
module as well:
from sklearn import linear_model
From the sklearn module we will use the LinearRegression() method to create a linear
regression object.
This object has a method called fit() that takes the independent and dependent values
as parameters and fills the regression object with data that describes the relationship:
regr = linear_model.LinearRegression()
regr.fit(X, y)
Now we have a regression object that is ready to predict CO2 values based on a car's
weight and volume:
#predict the CO2 emission of a car where the weight is 2300kg, and the volume is 1300cm3:
predictedCO2 = regr.predict([[2300, 1300]])
Example
See the whole example in action:
import pandas
from sklearn import linear_model
df = pandas.read_csv("cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
#predict the CO2 emission of a car where the weight is 2300kg, and the volume is 1300cm3:
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
Result:
[107.2087328]
We have predicted that a car with 1.3 liter engine, and a weight of 2300 kg, will release
approximately 107 grams of CO2 for every kilometer it drives.
Coefficient
The coefficient is a factor that describes the relationship with an unknown variable.
Example: if x is a variable, then 2x is x two times. x is the unknown variable, and the
number 2 is the coefficient.
In this case, we can ask for the coefficient value of weight against CO2, and for volume
against CO2. The answer(s) we get tells us what would happen if we increase, or
decrease, one of the independent values.
Example
Print the coefficient values of the regression object:
import pandas
from sklearn import linear_model
df = pandas.read_csv("cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
print(regr.coef_)
Result:
[0.00755095 0.00780526]
Result Explained
The result array represents the coefficient values of weight and volume.
Weight: 0.00755095
Volume: 0.00780526
These values tell us that if the weight increases by 1 kg, the CO2 emission increases by
0.00755095 g.
And if the engine size (Volume) increases by 1 cm3, the CO2 emission increases by
0.00780526 g.
We have already predicted that if a car with a 1300cm3 engine weighs 2300kg, the CO2
emission will be approximately 107g.
Example
Copy the example from before, but change the weight from 2300 to 3300:
import pandas
from sklearn import linear_model
df = pandas.read_csv("cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedCO2 = regr.predict([[3300, 1300]])
print(predictedCO2)
Result:
[114.75968007]
We have predicted that a car with 1.3 liter engine, and a weight of 3300 kg, will release
approximately 115 grams of CO2 for every kilometer it drives.
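The two predictions can be used to double-check the weight coefficient. The short sketch below only reuses numbers printed in the examples above (the variable names are chosen for illustration): adding 1000 kg should raise the prediction by 1000 times the coefficient.

```python
co2_at_2300kg = 107.2087328      # prediction for 2300 kg, from the earlier example
co2_at_3300kg = 114.75968007     # prediction for 3300 kg, from the example above
weight_coefficient = 0.00755095  # first value of regr.coef_

# 1000 kg more weight should add 1000 * coefficient grams of CO2
expected = co2_at_2300kg + 1000 * weight_coefficient
difference = abs(expected - co2_at_3300kg)

print(expected)
print(difference)
```

The difference is tiny and comes only from the rounding of the printed values, which shows that the coefficient describes the model's behavior correctly.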
Scale Features
When your data has different values, and even different measurement units, it can be
difficult to compare them. What is kilograms compared to meters? Or altitude compared
to time?
The answer to this problem is scaling. We can scale data into new values that are easier
to compare.
Take a look at the table below, it is the same data set that we used in the multiple
regression chapter, but this time the volume column contains values in liters instead
of cm3 (1.0 instead of 1000).
The file is meant for testing purposes only, you can download it here: cars2.csv
It can be difficult to compare the volume 1.0 with the weight 790, but if we scale them
both into comparable values, we can easily see how much one value is compared to the
other.
There are different methods for scaling data, in this tutorial we will use a method called
standardization.
z = (x - u) / s
Where z is the new value, x is the original value, u is the mean and s is the standard
deviation.
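If you take the weight column from the data set above, the first value is 790, and the
scaled value will be:
(790 - u) / s = -2.1
where u and s are the mean and the standard deviation of the weight column.
If you take the volume column from the data set above, the first value is 1.0, and the
scaled value will be:
(1.0 - u) / s = -1.59
where u and s are the mean and the standard deviation of the volume column.
Now you can compare -2.1 with -1.59 instead of comparing 790 with 1.0.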
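The standardization formula z = (x - u) / s is simple enough to try by hand. The sketch below applies it to a small, made-up list of weights (not the values from the file) using Python's statistics module. It uses the population standard deviation, pstdev(), which is also what StandardScaler uses; the helper name standardize() is hypothetical.

```python
import statistics

def standardize(values):
    """Scale values so that they get mean 0 and standard deviation 1."""
    u = statistics.mean(values)
    s = statistics.pstdev(values)  # population standard deviation
    return [(x - u) / s for x in values]

weights = [800, 1000, 1200, 1400, 1600]  # hypothetical weights in kg
z = standardize(weights)

print(z)
print(statistics.mean(z), statistics.pstdev(z))
```

After scaling, the values are centered on 0, and a value of about -1.41 simply means "1.41 standard deviations below the average", no matter what the original unit was.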
You do not have to do this manually, the Python sklearn module has a method
called StandardScaler() which returns a Scaler object with methods for transforming data
sets.
Example
Scale all values in the Weight and Volume columns:
import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
df = pandas.read_csv("cars2.csv")
X = df[['Weight', 'Volume']]
scaledX = scale.fit_transform(X)
print(scaledX)
Result:
Note that the first two values are -2.1 and -1.59, which correspond to our calculations:
[[-2.10389253 -1.59336644]
[-0.55407235 -1.07190106]
[-1.52166278 -1.59336644]
[-1.78973979 -1.85409913]
[-0.63784641 -0.28970299]
[-1.52166278 -1.59336644]
[-0.76769621 -0.55043568]
[ 0.3046118 -0.28970299]
[-0.7551301 -0.28970299]
[-0.59595938 -0.0289703 ]
[-1.30803892 -1.33263375]
[-1.26615189 -0.81116837]
[-0.7551301 -1.59336644]
[-0.16871166 -0.0289703 ]
[ 0.14125238 -0.0289703 ]
[ 0.15800719 -0.0289703 ]
[ 0.3046118 -0.0289703 ]
[-0.05142797 1.53542584]
[-0.72580918 -0.0289703 ]
[ 0.14962979 1.01396046]
[ 1.2219378 -0.0289703 ]
[ 0.5685001 1.01396046]
[ 0.3046118 1.27469315]
[ 0.51404696 -0.0289703 ]
[ 0.51404696 1.01396046]
[ 0.72348212 -0.28970299]
[ 0.8281997 1.01396046]
[ 1.81254495 1.01396046]
[ 0.96642691 -0.0289703 ]
[ 1.72877089 1.01396046]
[ 1.30990057 1.27469315]
[ 1.90050772 1.01396046]
[-0.23991961 -0.0289703 ]
[ 0.40932938 -0.0289703 ]
[ 0.47215993 -0.0289703 ]
[ 0.4302729 2.31762392]]
When the data set is scaled, you will have to use the scale when you predict values:
Example
Predict the CO2 emission from a 1.3 liter car that weighs 2300 kilograms:
import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
df = pandas.read_csv("cars2.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
scaledX = scale.fit_transform(X)
regr = linear_model.LinearRegression()
regr.fit(scaledX, y)
scaled = scale.transform([[2300, 1.3]])
predictedCO2 = regr.predict([scaled[0]])
print(predictedCO2)
Result:
[107.2087328]
To measure if the model is good enough, we can use a method called Train/Test.
What is Train/Test
Train/Test is a method to measure the accuracy of your model.
It is called Train/Test because you split the data set into two sets: a training set and
a testing set.
Our data set illustrates 100 customers in a shop, and their shopping habits.
Example
import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
plt.scatter(x, y)
plt.show()
The x axis represents the number of minutes before making a purchase.
The y axis represents the amount of money spent on the purchase.
Split Into Train/Test
The training set should be a random selection of 80% of the original data.
The testing set should be the remaining 20%.
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
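Taking the first 80 values works here because the data set was generated randomly, so its order carries no meaning. For real data that may be sorted in some way, the rows should be shuffled before splitting. The sketch below shows one way to do that with the standard library; the helper name split_indices() and the seed are assumptions made for this illustration.

```python
import random

def split_indices(n, train_fraction, seed):
    """Return two lists of row indices: a random training part and the rest."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle a copy of 0..n-1
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, test_idx = split_indices(100, 0.8, seed=2)

print(len(train_idx), len(test_idx))
print(train_idx[:5])
```

With NumPy arrays such as x and y above, the split would then be taken as x[train_idx] and x[test_idx], so that every customer ends up in exactly one of the two sets.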
Example
Display a scatter plot of the training set:
plt.scatter(train_x, train_y)
plt.show()
It looks like the original data set, so it seems to be a fair selection.
Example
Display a scatter plot of the testing set:
plt.scatter(test_x, test_y)
plt.show()
The testing set also looks like the original data set.
To draw a line through the data points, we use the plot() method of the matplotlib
module:
Example
Draw a polynomial regression line through the data points:
import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
myline = numpy.linspace(0, 6, 100)
plt.scatter(train_x, train_y)
plt.plot(myline, mymodel(myline))
plt.show()
The result can back my suggestion of the data set fitting a polynomial regression, even
though it would give us some weird results if we try to predict values outside of the data
set. Example: the line indicates that a customer spending 6 minutes in the shop would
make a purchase worth 200. That is probably a sign of overfitting.
But what about the R-squared score? The R-squared score is a good indicator of how well
my data set is fitting the model.
R2
Remember R2, also known as R-squared?
It measures the relationship between the x axis and the y axis, and the value ranges
from 0 to 1, where 0 means no relationship, and 1 means totally related.
The sklearn module has a method called r2_score() that will help us find this
relationship.
In this case we would like to measure the relationship between the minutes a customer
stays in the shop and how much money they spend.
Example
How well does my training data fit in a polynomial regression?
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
r2 = r2_score(train_y, mymodel(train_x))
print(r2)
Now we want to test the model with the testing data as well, to see if it gives us the same
result.
Example
Let us find the R2 score when using testing data:
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
r2 = r2_score(test_y, mymodel(test_x))
print(r2)
Note: The result 0.809 shows that the model fits the testing set as well, and we are
confident that we can use the model to predict future values.
Predict Values
Now that we have established that our model is OK, we can start predicting new values.
Example
How much money will a buying customer spend, if she or he stays in the shop for 5
minutes?
print(mymodel(5))
The example predicted the customer to spend 22.88 dollars, which seems to correspond to
the diagram.
Machine Learning - Decision Tree
Decision Tree
In this chapter we will show you how to make a "Decision Tree". A Decision Tree is a Flow
Chart, and can help you make decisions based on previous experience.
In the example, a person will try to decide if he/she should go to a comedy show or not.
Luckily our example person has registered every time there was a comedy show in town,
and registered some information about the comedian, and also registered if he/she went
or not.
Age  Experience  Rank  Nationality  Go
36   10          9     UK           NO
42   12          4     USA          NO
23   4           6     N            NO
52   4           4     USA          NO
43   21          8     USA          YES
44   14          5     UK           NO
66   3           7     N            YES
35   14          9     UK           YES
52   13          7     N            YES
35   5           9     N            YES
24   3           5     USA          NO
18   3           7     UK           YES
45   9           9     UK           YES
Now, based on this data set, Python can create a decision tree that can be used to decide
if any new shows are worth attending.
Example
Read and print the data set:
import pandas
from sklearn import tree
import pydotplus
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import matplotlib.image as pltimg
df = pandas.read_csv("shows.csv")
print(df)
We have to convert the non numerical columns 'Nationality' and 'Go' into numerical
values.
Pandas has a map() method that takes a dictionary with information on how to convert
the values.
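Example
Change string values into numerical values:
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1, 'NO': 0}
df['Go'] = df['Go'].map(d)
print(df)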
Then we have to separate the feature columns from the target column.
The feature columns are the columns that we try to predict from, and the target column
is the column with the values we try to predict.
Example
X is the feature columns, y is the target column:
features = ['Age', 'Experience', 'Rank', 'Nationality']
X = df[features]
y = df['Go']
print(X)
print(y)
Now we can create the actual decision tree, fit it with our details, and save a .png file on
the computer:
Example
Create a Decision Tree, save it as an image, and show the image:
dtree = DecisionTreeClassifier()
dtree = dtree.fit(X, y)
data = tree.export_graphviz(dtree, out_file=None, feature_names=features)
graph = pydotplus.graph_from_dot_data(data)
graph.write_png('mydecisiontree.png')
img=pltimg.imread('mydecisiontree.png')
imgplot = plt.imshow(img)
plt.show()
Result Explained
The decision tree uses your earlier decisions to calculate the odds of you wanting to
go see a comedian or not.
Rank
Rank <= 6.5 means that every comedian with a rank of 6.5 or lower will follow
the True arrow (to the left), and the rest will follow the False arrow (to the right).
gini = 0.497 refers to the quality of the split, and is always a number between 0.0 and
0.5, where 0.0 would mean all of the samples got the same result, and 0.5 would mean
that the split is done exactly in the middle.
samples = 13 means that there are 13 comedians left at this point in the decision, which
is all of them since this is the first step.
value = [6, 7] means that of these 13 comedians, 6 will get a "NO", and 7 will get a
"GO".
Gini
There are many ways to split the samples, we use the GINI method in this tutorial.
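The gini values shown in the tree can be reproduced from the value arrays. For a node with class counts such as [6, 7], the Gini impurity is 1 minus the sum of the squared class fractions. The sketch below is a hand calculation for illustration, with a hypothetical helper called gini(); it is not how you would normally obtain these numbers, since the tree reports them itself.

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples per class."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# value arrays of the nodes described in this chapter
for value in ([6, 7], [5, 0], [1, 7], [1, 3], [1, 1]):
    print(value, f"{gini(value):.5f}")
```

A node where all samples belong to one class, such as [5, 0], gets 0.0, and a two-class node that is split exactly in the middle, such as [1, 1], gets 0.5.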
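True - 5 Comedians End Here:
gini = 0.0 means all of the samples got the same result.
samples = 5 means that there are 5 comedians left in this branch (5 comedians with a
Rank of 6.5 or lower).
value = [5, 0] means that 5 will get a "NO" and 0 will get a "GO".
False - 8 Comedians Continue:
Nationality
Nationality <= 0.5 means that the comedians with a nationality value of less than 0.5 (the
ones from the UK) will follow the arrow to the left, and the rest will follow the arrow to
the right.
gini = 0.219 is close to 0.0, which means that most of the samples in this branch got the
same result.
samples = 8 means that there are 8 comedians left in this branch (8 comedians with a
Rank higher than 6.5).
value = [1, 7] means that of these 8 comedians, 1 will get a "NO" and 7 will get a
"GO".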
True - 4 Comedians Continue:
Age
Age <= 35.5 means that comedians at the age of 35.5 or younger will follow the arrow to
the left, and the rest will follow the arrow to the right.
gini = 0.375 means that the branch is mixed: most, but not all, of the samples got the same result.
samples = 4 means that there are 4 comedians left in this branch (4 comedians from the
UK).
value = [1, 3] means that of these 4 comedians, 1 will get a "NO" and 3 will get a
"GO".
False - 4 Comedians End Here:
gini = 0.0 means all of the samples got the same result.
samples = 4 means that there are 4 comedians left in this branch (4 comedians not from
the UK).
value = [0, 4] means that of these 4 comedians, 0 will get a "NO" and 4 will get a
"GO".
True - 2 Comedians End Here:
gini = 0.0 means all of the samples got the same result.
samples = 2 means that there are 2 comedians left in this branch (2 comedians at the
age 35.5 or younger).
value = [0, 2] means that of these 2 comedians, 0 will get a "NO" and 2 will get a
"GO".
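False - 2 Comedians Continue:
Experience
Experience <= 9.5 means that comedians with 9.5 years of experience or less will follow the
arrow to the left, and the rest will follow the arrow to the right.
gini = 0.5 means that the samples are split exactly in the middle.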
samples = 2 means that there are 2 comedians left in this branch (2 comedians older
than 35.5).
value = [1, 1] means that of these 2 comedians, 1 will get a "NO" and 1 will get a
"GO".
True - 1 Comedian Ends Here:
gini = 0.0 means all of the samples got the same result.
samples = 1 means that there is 1 comedian left in this branch (1 comedian with 9.5
years of experience or less).
value = [0, 1] means that 0 will get a "NO" and 1 will get a "GO".
False - 1 Comedian Ends Here:
gini = 0.0 means all of the samples got the same result.
samples = 1 means that there is 1 comedian left in this branch (1 comedian with more
than 9.5 years of experience).
value = [1, 0] means that 1 will get a "NO" and 0 will get a "GO".
Predict Values
We can use the Decision Tree to predict new values.
Example: Should I go see a show starring a 40-year-old American comedian, with 10
years of experience, and a comedy ranking of 7?
Example
Use predict() method to predict new values:
print(dtree.predict([[40, 10, 7, 1]]))
Example
What would the answer be if the comedy rank was 6?
print(dtree.predict([[40, 10, 6, 1]]))
Different Results
You will see that the Decision Tree gives you different results if you run it enough times,
even if you feed it with the same data.
That is because the tree is built with some randomness: when several splits are equally
good, one of them is picked at random, so the tree and its answers can vary between runs.
On this page, W3schools.com collaborates with NYC Data Science Academy, to deliver
digital training content to our students.
The rows represent the actual classes the outcomes should have been. While the columns
represent the predictions we have made. Using this table it is easy to see which
predictions are wrong.
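import numpy
Next we will need to generate the numbers for "actual" and "predicted" values. The
probability 0.9 and the size 1000 below are illustrative values:
actual = numpy.random.binomial(1, 0.9, size = 1000)
predicted = numpy.random.binomial(1, 0.9, size = 1000)
In order to create the confusion matrix we need to import metrics from the sklearn
module.
from sklearn import metrics
Once metrics is imported we can use the confusion matrix function on our actual and
predicted values.
confusion_matrix = metrics.confusion_matrix(actual, predicted)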
To create a more interpretable visual display we need to convert the table into a
confusion matrix display.
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix =
confusion_matrix, display_labels = [False, True])
Finally to display the plot we can use the functions plot() and show() from pyplot.
cm_display.plot()
plt.show()
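Example
import matplotlib.pyplot as plt
import numpy
from sklearn import metrics
# 0.9 and 1000 are illustrative values for the generated data
actual = numpy.random.binomial(1, 0.9, size = 1000)
predicted = numpy.random.binomial(1, 0.9, size = 1000)
confusion_matrix = metrics.confusion_matrix(actual, predicted)
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix =
confusion_matrix, display_labels = [False, True])
cm_display.plot()
plt.show()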
Results Explained
The Confusion Matrix created has four different quadrants:
True Negative (Top-Left Quadrant)
False Positive (Top-Right Quadrant)
False Negative (Bottom-Left Quadrant)
True Positive (Bottom-Right Quadrant)
True means that the values were accurately predicted, False means that there was an
error or wrong prediction.
Now that we have made a Confusion Matrix, we can calculate different measures to
quantify the quality of the model. First, let's look at Accuracy.
Created Metrics
The matrix provides us with many useful metrics that help us to evaluate our
classification model.
The different measures include: Accuracy, Precision, Sensitivity (Recall), Specificity, and
the F-score, explained below.
Accuracy
Accuracy measures how often the model is correct.
How to Calculate
(True Positive + True Negative) / Total Predictions
Example
Accuracy = metrics.accuracy_score(actual, predicted)
Precision
Of the positives predicted, what percentage is truly positive?
How to Calculate
True Positive / (True Positive + False Positive)
Example
Precision = metrics.precision_score(actual, predicted)
Sensitivity (Recall)
Of all the positive cases, what percentage are predicted positive?
Sensitivity (sometimes called Recall) measures how good the model is at predicting
positives.
This means it looks at true positives and false negatives (which are positives that have
been incorrectly predicted as negative).
How to Calculate
True Positive / (True Positive + False Negative)
Sensitivity is useful for understanding how well the model identifies positive cases:
Example
Sensitivity_recall = metrics.recall_score(actual, predicted)
Specificity
How good is the model at predicting negative results?
Specificity is similar to sensitivity, but looks at it from the perspective of negative results.
How to Calculate
True Negative / (True Negative + False Positive)
Since it is recall calculated for the negative class, we use the recall_score function with
the opposite positive label:
Example
Specificity = metrics.recall_score(actual, predicted, pos_label=0)
F-score
F-score is the "harmonic mean" of precision and sensitivity.
It considers both false positive and false negative cases and is good for imbalanced
datasets.
How to Calculate
2 * ((Precision * Sensitivity) / (Precision + Sensitivity))
This score does not take into consideration the True Negative values:
Example
F1_score = metrics.f1_score(actual, predicted)
Example
#metrics
print({"Accuracy":Accuracy,"Precision":Precision,"Sensitivity_recall":Sensitivity_recall,
"Specificity":Specificity,"F1_score":F1_score})
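All of the measures above follow directly from the four quadrants of the confusion matrix. As a cross-check of the "How to Calculate" formulas, the sketch below computes them from raw counts in plain Python. The helper name classification_metrics() and the counts are made up for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the measures from this chapter out of the four quadrant counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * (precision * sensitivity) / (precision + sensitivity)
    return {
        "Accuracy": accuracy,
        "Precision": precision,
        "Sensitivity_recall": sensitivity,
        "Specificity": specificity,
        "F1_score": f1,
    }

# hypothetical counts: 40 true positives, 45 true negatives,
# 5 false positives and 10 false negatives
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(name, round(value, 3))
```

Note that the formulas divide by counts such as tp + fp, so a model that never predicts a positive would need special handling before these calculations can be used.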
Machine Learning - Hierarchical Clustering
Hierarchical Clustering
Hierarchical clustering is an unsupervised learning method for clustering data points. The
algorithm builds clusters by measuring the dissimilarities between data. Unsupervised
learning means that the model is not trained on known answers, so we do not need a "target"
variable. This method can be used on any data to visualize and interpret the relationship
between individual data points.
Here we will use hierarchical clustering to group data points and visualize the clusters
using both a dendrogram and scatter plot.
Example
Start by visualizing some data points:
import numpy as np
import matplotlib.pyplot as plt
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
plt.scatter(x, y)
plt.show()
Now we compute the ward linkage using euclidean distance, and visualize it using a
dendrogram:
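Example
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))
linkage_data = linkage(data, method='ward', metric='euclidean')
dendrogram(linkage_data)
plt.show()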
Here, we do the same thing with Python's scikit-learn library. Then, visualize on a
2-dimensional plot:
Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))
# ward linkage uses euclidean distance, which is the default
hierarchical_cluster = AgglomerativeClustering(n_clusters=2, linkage='ward')
labels = hierarchical_cluster.fit_predict(data)
plt.scatter(x, y, c=labels)
plt.show()
Example Explained
Import the modules you need.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
You can learn about the Matplotlib module in our Matplotlib Tutorial.
You can learn about the SciPy module in our SciPy Tutorial.
NumPy is a library for working with arrays and matrices in Python. You can learn about
the NumPy module in our NumPy Tutorial.
Create arrays that resemble two variables in a dataset. Note that while we only use two
variables here, this method will work with any number of variables:
Result:
[(4, 21), (5, 19), (10, 24), (4, 17), (3, 16), (11, 25), (14, 24), (6, 22),
(10, 21), (12, 21)]
Compute the linkage between all of the different points. Here we use a simple Euclidean
distance measure and Ward's linkage, which seeks to minimize the variance within
clusters.
Finally, plot the results in a dendrogram. This plot will show us the hierarchy of clusters
from the bottom (individual points) to the top (a single cluster consisting of all data
points).
plt.show() lets us visualize the dendrogram instead of just the raw linkage data.
dendrogram(linkage_data)
plt.show()
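The linkage step described above can be sketched without drawing anything, which makes it easier to inspect what is computed. The x and y values are taken from the data points printed earlier on this page; the use of no_plot=True and fcluster to cut the tree into two flat clusters are assumptions added for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = np.array(list(zip(x, y)))

# Ward linkage on Euclidean distances; each row of the result describes one merge
linkage_data = linkage(data, method='ward', metric='euclidean')

# no_plot=True returns the dendrogram layout as a dictionary instead of drawing it
layout = dendrogram(linkage_data, no_plot=True)

# Cut the tree so that at most two flat clusters remain
labels = fcluster(linkage_data, t=2, criterion='maxclust')

print(linkage_data.shape)
print(labels)
```

With 10 data points there are 9 merges, so the linkage array has 9 rows. Note that fcluster numbers its clusters from 1, so the labels will not match scikit-learn's numbering even when the grouping is the same.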
Result:
The .fit_predict method can be called on our data to compute the clusters using the
defined parameters across our chosen number of clusters.
Result:
[0 0 1 0 0 1 1 0 1 1]
Finally, if we plot the same data and color the points using the labels assigned to each
index by the hierarchical clustering method, we can see the cluster each point was
assigned to:
plt.scatter(x, y, c=labels)
plt.show()
Result:
Machine Learning - Logistic Regression
Logistic Regression
Logistic regression aims to solve classification problems. It does this by predicting
categorical outcomes, unlike linear regression that predicts a continuous outcome.
In the simplest case there are two outcomes, which is called binomial. An example is
predicting whether a tumor is malignant or benign. Other cases have more than two
outcomes to classify, which is called multinomial. A common example of multinomial
logistic regression is predicting the class of an iris flower among 3 different species.
Here we will be using basic logistic regression to predict a binomial variable. This means
it has only two possible outcomes.
import numpy
We will use a method from the sklearn module, so we will have to import that module as
well:
From the sklearn module we will use the LogisticRegression() method to create a logistic
regression object.
This object has a method called fit() that takes the independent and dependent values
as parameters and fills the regression object with data that describes the relationship:
logr = linear_model.LogisticRegression()
logr.fit(X,y)
Now we have a logistic regression object that is ready to predict whether a tumor is
cancerous based on the tumor size:
Example
See the whole example in action:
import numpy
from sklearn import linear_model
logr = linear_model.LogisticRegression()
logr.fit(X,y)
Result
[0]
We have predicted that a tumor with a size of 3.46mm will not be cancerous.
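The example above is missing its data and the prediction call, so here is a minimal sketch of the full flow. The X and y arrays are the ones used in the probability example later on this page; the exact form of the predict call for a size of 3.46 is an assumption reconstructed from the sentence above.

```python
import numpy
from sklearn import linear_model

# Tumor sizes, reshaped into a column because scikit-learn expects 2-D input
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
# Class labels: 0 for not cancerous, 1 for cancerous
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# Predict the class for a single tumor of size 3.46 (assumed query value)
predicted = logr.predict(numpy.array([3.46]).reshape(-1, 1))
print(predicted)
```

The reshape(-1, 1) call matters for the query as well as the training data: predict() expects one row per observation and one column per feature.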
Coefficient
In logistic regression the coefficient is the expected change in log-odds of having the
outcome per unit change in X.
This is not very intuitive, so let's use it to create something that makes more sense:
odds.
Example
See the whole example in action:
import numpy
from sklearn import linear_model
logr = linear_model.LogisticRegression()
logr.fit(X,y)
log_odds = logr.coef_
odds = numpy.exp(log_odds)
print(odds)
Result
[4.03541657]
This tells us that as the size of a tumor increases by 1mm the odds of it being a
cancerous tumor increase by 4x.
Probability
The coefficient and intercept values can be used to find the probability that each tumor is
cancerous.
Create a function that uses the model's coefficient and intercept values to return a new
value. This new value represents the probability that the given observation is a cancerous
tumor:
def logit2prob(logr, x):
    log_odds = logr.coef_ * x + logr.intercept_
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability
Function Explained
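To find the log-odds for each observation, we must first create a formula that looks
similar to the one from linear regression, extracting the coefficient and the intercept.
log_odds = logr.coef_ * x + logr.intercept_
To convert the log-odds to odds, we exponentiate the log-odds.
odds = numpy.exp(log_odds)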
Now that we have the odds, we can convert it to probability by dividing it by 1 plus the
odds.
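The conversion from log-odds to probability can be checked by hand with two easy values. This is a small standalone sketch; the helper name log_odds_to_probability and the sample values are assumptions chosen for illustration.

```python
import math

def log_odds_to_probability(log_odds):
    """Exponentiate log-odds to get odds, then divide by 1 plus the odds."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# Log-odds of 0 means odds of 1 (even odds), which is a probability of 0.5
even = log_odds_to_probability(0.0)

# Odds of 3 (three to one) correspond to a probability of 3 / (1 + 3)
three_to_one = log_odds_to_probability(math.log(3))

print(even, three_to_one)
```

Negative log-odds give probabilities below 0.5 and positive log-odds give probabilities above 0.5, which is why the sign of the log-odds decides the predicted class at the usual 0.5 cutoff.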
Let us now use the function with what we have learned to find out the probability that
each tumor is cancerous.
Example
See the whole example in action:
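import numpy
from sklearn import linear_model
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X,y)
def logit2prob(logr, x):
    log_odds = logr.coef_ * x + logr.intercept_
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability
print(logit2prob(logr, X))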
Result
[[0.60749955]
[0.19268876]
[0.12775886]
[0.00955221]
[0.08038616]
[0.07345637]
[0.88362743]
[0.77901378]
[0.88924409]
[0.81293497]
[0.57719129]
[0.96664243]]
Results Explained
3.78 0.61 The probability that a tumor with the size 3.78cm is cancerous is 61%.
2.44 0.19 The probability that a tumor with the size 2.44cm is cancerous is 19%.
2.09 0.13 The probability that a tumor with the size 2.09cm is cancerous is 13%.
Machine Learning - Grid Search
Grid Search
The majority of machine learning models contain parameters that can be adjusted to vary
how the model learns. For example, the logistic regression model, from sklearn, has a
parameter C that controls regularization, which affects the complexity of the model.
How do we pick the best value for C? The best value is dependent on the data used to
train the model.
Before we get into the example it is good to know what the parameter we are changing
does. Higher values of C tell the model that the training data resembles real-world
information, so it should place a greater weight on the training data. Lower values of C
do the opposite.
To get started we must first load in the dataset we will be working with.
from sklearn import datasets
iris = datasets.load_iris()
X = iris['data']
y = iris['target']
Now we will load the logistic model for classifying the iris flowers.
Creating the model, setting max_iter to a higher value to ensure that the model finds a
result.
Keep in mind that the default value for C in a logistic regression model is 1. We will
compare against this later.
In the example below, we look at the iris data set and try to train a model with varying
values for C in logistic regression.
After we create the model, we must fit the model to the data.
print(logit.fit(X,y))
print(logit.score(X,y))
Example
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
iris = datasets.load_iris()
X = iris['data']
y = iris['target']
print(logit.fit(X,y))
print(logit.score(X,y))
Let's see if we can improve on a score of 0.973 by implementing a grid search with
different values of C.
Implementing Grid Search
We will follow the same steps as before, except this time we will set a range of values
for C.
Knowing which values to set for the searched parameters will take a combination of
domain knowledge and practice.
Since the default value for C is 1, we will set a range of values surrounding it.
Next we will create a for loop to change out the values of C and evaluate the model with
each change.
scores = []
To change the values of C we must loop over the range of values and update the
parameter each time.
for choice in C:
    logit.set_params(C=choice)
    logit.fit(X, y)
    scores.append(logit.score(X, y))
With the scores stored in a list, we can evaluate what the best choice of C is.
print(scores)
Example
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
iris = datasets.load_iris()
X = iris['data']
y = iris['target']
scores = []
for choice in C:
    logit.set_params(C=choice)
    logit.fit(X, y)
    scores.append(logit.score(X, y))
print(scores)
Results Explained
We can see that the lower values of C performed worse than the base parameter of 1.
However, as we increased the value of C to 1.75 the model experienced increased
accuracy.
It seems that increasing C beyond this amount does not help increase model accuracy.
To avoid being misled by the scores on the training data, we can put aside a portion of
our data and use it specifically for the purpose of testing the model. Refer to the lecture
on train/test splitting to avoid being misled and overfitting.
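The loop described in this chapter can be put together as the following sketch. The model definition and the list of C values are not shown in the text above, so max_iter=10000 and the eight candidate values (chosen around the default of 1 and including the 1.75 mentioned in the results) are assumptions for illustration.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris['data']
y = iris['target']

# max_iter=10000 is an assumed value, set high so the solver has room to converge
logit = LogisticRegression(max_iter=10000)

# Assumed candidate values surrounding the default C = 1
C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

scores = []
for choice in C:
    logit.set_params(C=choice)
    logit.fit(X, y)
    # Accuracy on the training data itself, as in the chapter
    scores.append(logit.score(X, y))

# Pair each score with its C value; ties are resolved in favor of the larger C
best_score, best_C = max(zip(scores, C))
print(scores)
print(best_C, best_score)
```

Since every score here is measured on the same data the model was trained on, the best C found this way is only a starting point until it is confirmed on held-out data.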
Categorical Data
When your data has categories represented by strings, it will be difficult to use them to
train machine learning models, which often only accept numeric data.
Instead of ignoring the categorical data and excluding the information from our model,
you can transform the data so it can be used in your models.
Take a look at the table below. It is the same data set that we used in the multiple
regression chapter.
Example
import pandas as pd
cars = pd.read_csv('data.csv')
print(cars.to_string())
Result
Car Model Volume Weight CO2
0 Toyoty Aygo 1000 790 99
1 Mitsubishi Space Star 1200 1160 95
2 Skoda Citigo 1000 929 95
3 Fiat 500 900 865 90
4 Mini Cooper 1500 1140 105
5 VW Up! 1000 929 105
6 Skoda Fabia 1400 1109 90
7 Mercedes A-Class 1500 1365 92
8 Ford Fiesta 1500 1112 98
9 Audi A1 1600 1150 99
10 Hyundai I20 1100 980 99
11 Suzuki Swift 1300 990 101
12 Ford Fiesta 1000 1112 99
13 Honda Civic 1600 1252 94
14 Hundai I30 1600 1326 97
15 Opel Astra 1600 1330 97
16 BMW 1 1600 1365 99
17 Mazda 3 2200 1280 104
18 Skoda Rapid 1600 1119 104
19 Ford Focus 2000 1328 105
20 Ford Mondeo 1600 1584 94
21 Opel Insignia 2000 1428 99
22 Mercedes C-Class 2100 1365 99
23 Skoda Octavia 1600 1415 99
24 Volvo S60 2000 1415 99
25 Mercedes CLA 1500 1465 102
26 Audi A4 2000 1490 104
27 Audi A6 2000 1725 114
28 Volvo V70 1600 1523 109
29 BMW 5 2000 1705 114
30 Mercedes E-Class 2100 1605 115
31 Volvo XC70 2000 1746 117
32 Ford B-Max 1600 1235 104
33 BMW 216 1600 1390 108
34 Opel Zafira 1600 1405 109
35 Mercedes SLK 2500 1395 120
In the multiple regression chapter, we tried to predict the CO2 emitted based on the
volume of the engine and the weight of the car but we excluded information about the
car brand and model.
The information about the car brand or the car model might help us make a better
prediction of the CO2 emitted.
To fix this issue, we must have a numeric representation of the categorical variable. One
way to do this is to have a column representing each group in the category.
For each column, the values will be 1 or 0 where 1 represents the inclusion of the group
and 0 represents the exclusion. This transformation is called one hot encoding.
You do not have to do this manually. The Python Pandas module has a function called
get_dummies() which does one hot encoding.
Example
One Hot Encode the Car column:
import pandas as pd
cars = pd.read_csv('data.csv')
ohe_cars = pd.get_dummies(cars[['Car']])
print(ohe_cars.to_string())
Result
Car_Audi Car_BMW Car_Fiat Car_Ford Car_Honda Car_Hundai Car_Hyundai Car_Mazda Car_Mercedes Car_Mini Car_Mitsubishi Car_Opel Car_Skoda Car_Suzuki Car_Toyoty Car_VW Car_Volvo
0  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
1  0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
2  0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
3  0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
5  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
6  0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
7  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
8  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
9  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
11  0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
12  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
13  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
14  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
15  0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
16  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
18  0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
19  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
20  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
21  0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
22  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
23  0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
24  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
25  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
26  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
29  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
31  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
32  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
33  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34  0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
35  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Results
A column was created for every car brand in the Car column.
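A smaller version of the same transformation is easier to read than the full table. The sketch below uses a few hypothetical brand names in place of data.csv, and passes dtype=int (an addition to the original call) so that the columns hold 0 and 1.

```python
import pandas as pd

# A few hypothetical rows standing in for the Car column of data.csv
cars = pd.DataFrame({'Car': ['Toyota', 'Skoda', 'Ford', 'Skoda']})

# dtype=int asks for 0/1 columns; newer pandas versions otherwise return True/False
ohe_cars = pd.get_dummies(cars[['Car']], dtype=int)
print(ohe_cars.to_string())
```

Each row contains exactly one 1, in the column of that row's brand, and the columns are named after the original column plus the category value.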
Predict CO2
We can use this additional information alongside the volume and weight to predict CO2.
To combine the information, we can use the concat() function from pandas.
import pandas
The pandas module allows us to read csv files and manipulate DataFrame objects:
cars = pandas.read_csv("data.csv")
ohe_cars = pandas.get_dummies(cars[['Car']])
Then we must select the independent variables (X) and add the dummy variables
columnwise.
regr = linear_model.LinearRegression()
regr.fit(X,y)
Finally we can predict the CO2 emissions based on the car's weight, volume, and
manufacturer.
# Predict the CO2 emission of a VW where the weight is 2300kg and the volume is 1300cm3.
# The single 1 sits in the 16th dummy column, which is Car_VW in the column list above.
predictedCO2 = regr.predict([[2300, 1300,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0]])
Example
import pandas
from sklearn import linear_model
cars = pandas.read_csv("data.csv")
ohe_cars = pandas.get_dummies(cars[['Car']])
regr = linear_model.LinearRegression()
regr.fit(X,y)
# Predict the CO2 emission of a VW where the weight is 2300kg and the volume is 1300cm3.
# The single 1 sits in the 16th dummy column, which is Car_VW in the column list above.
predictedCO2 = regr.predict([[2300, 1300,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0]])
print(predictedCO2)
Result
[122.45153299]
We now have a coefficient for the volume, the weight, and each car brand in the data set
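The step that builds X and y is not shown above, so the following sketch fills it in under stated assumptions: a small DataFrame with a few rows copied from the table stands in for data.csv, the features are ordered Volume, Weight, then the dummy columns, and the query row is built by column name instead of by counting positions.

```python
import pandas
from sklearn import linear_model

# Stand-in for data.csv, using a handful of rows from the table in this chapter
cars = pandas.DataFrame({
    'Car': ['Fiat', 'Skoda', 'Ford', 'Skoda', 'Ford', 'Volvo', 'Volvo'],
    'Volume': [900, 1000, 1500, 1400, 1000, 2000, 1600],
    'Weight': [865, 929, 1112, 1109, 1112, 1415, 1523],
    'CO2': [90, 95, 98, 90, 99, 99, 109],
})
ohe_cars = pandas.get_dummies(cars[['Car']], dtype=int)

# Assumed column order: Volume, Weight, then one dummy column per brand
X = pandas.concat([cars[['Volume', 'Weight']], ohe_cars], axis=1)
y = cars['CO2']

regr = linear_model.LinearRegression()
regr.fit(X, y)

# Start from a row of zeros, then set values by name so the 1 lands in the right column
new_car = pandas.DataFrame(0, index=[0], columns=X.columns)
new_car.loc[0, ['Volume', 'Weight', 'Car_Volvo']] = [1300, 2300, 1]

predictedCO2 = regr.predict(new_car)
print(predictedCO2)
```

Setting the brand by column name avoids the easy mistake of putting the 1 in a neighboring position when the dummy vector is typed out by hand.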
Dummifying
It is not necessary to create one column for each group in your category. The information
can be retained using 1 column less than the number of groups you have.
For example, you have a column representing colors and in that column, you have two
colors, red and blue.
Example
import pandas as pd
colors = pd.DataFrame({'color': ['blue', 'red']})
print(colors)
Result
color
0 blue
1 red
You can create 1 column called red where 1 represents red and 0 represents not red,
which means it is blue.
To do this, we can use the same function that we used for one hot encoding,
get_dummies, and then drop one of the columns. There is an argument, drop_first, which
allows us to exclude the first column from the resulting table.
Example
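import pandas as pd
colors = pd.DataFrame({'color': ['blue', 'red']})
dummies = pd.get_dummies(colors, drop_first=True)
print(dummies)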
Result
color_red
0 0
1 1
What if you have more than 2 groups? How can the multiple groups be represented by 1
less column?
Let's say we have three colors this time, red, blue and green. When we get_dummies
while dropping the first column, we get the following table.
Example
import pandas as pd
print(dummies)
Result
color_green color_red color
0 0 0 blue
1 0 1 red
2 1 0 green
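The table above can be reproduced with the sketch below. The construction of the colors DataFrame and the extra color column are reconstructed from the printed result, and dtype=int is an added assumption so the dummy columns show 0 and 1.

```python
import pandas as pd

colors = pd.DataFrame({'color': ['blue', 'red', 'green']})

# drop_first=True drops the first category in sorted order, which is blue here
dummies = pd.get_dummies(colors, drop_first=True, dtype=int)

# Keep the original label next to the dummy columns for comparison
dummies['color'] = colors['color']
print(dummies)
```

Blue has no column of its own: it is the row where both color_green and color_red are 0, which is how three groups fit into two columns.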
K-means
K-means is an unsupervised learning method for clustering data points. The algorithm
iteratively divides data points into K clusters by minimizing the variance in each cluster.
Here, we will show you how to estimate the best value for K using the elbow method,
then use K-means clustering to group the data points into clusters.
K-means clustering requires us to select K, the number of clusters we want to group the
data into. The elbow method lets us graph the inertia (a distance-based metric) and
visualize the point at which it starts decreasing linearly. This point is referred to as the
"elbow" and is a good estimate for the best value for K based on our data.
Example
Start by visualizing some data points:
plt.scatter(x, y)
plt.show()
Now we utilize the elbow method to visualize the inertia for different values of K:
Example
from sklearn.cluster import KMeans
for i in range(1,11):
    kmeans = KMeans(n_clusters=i)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)
The elbow method shows that 2 is a good value for K, so we retrain and visualize the
result:
Example
kmeans = KMeans(n_clusters=2)
kmeans.fit(data)
plt.scatter(x, y, c=kmeans.labels_)
plt.show()
Example Explained
Import the modules you need.
You can learn about the Matplotlib module in our Matplotlib Tutorial.
Create arrays that resemble two variables in a dataset. Note that while we only use two
variables here, this method will work with any number of variables:
Result:
[(4, 21), (5, 19), (10, 24), (4, 17), (3, 16), (11, 25), (14, 24), (6, 22),
(10, 21), (12, 21)]
In order to find the best value for K, we need to run K-means across our data for a range
of possible values. We only have 10 data points, so the maximum number of clusters is
10. So for each value K in range(1,11), we train a K-means model and plot the inertia at
that number of clusters:
inertias = []
for i in range(1,11):
    kmeans = KMeans(n_clusters=i)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)
Result:
We can see that the "elbow" on the graph above (where the inertia becomes more linear)
is at K=2. We can then fit our K-means algorithm one more time and plot the different
clusters assigned to the data:
kmeans = KMeans(n_clusters=2)
kmeans.fit(data)
plt.scatter(x, y, c=kmeans.labels_)
plt.show()
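The elbow search and the final fit can be run end to end without plotting, as in this sketch. The x and y values come from the data points printed above; n_init=10 and random_state=0 are assumptions added so that repeated runs give the same numbers.

```python
from sklearn.cluster import KMeans

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

inertias = []
for i in range(1, 11):
    # n_init and random_state are fixed here (an assumption) to make runs repeatable
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=0)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)

# Refit with the value of K chosen from the elbow
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(data)

print(inertias)
print(kmeans.labels_)
```

Printing the inertias shows the same information as the elbow plot: a large drop from K=1 to K=2, followed by much smaller improvements.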
Result:
Machine Learning - Bootstrap Aggregation (Bagging)
Bagging
Methods such as Decision Trees can be prone to overfitting on the training set, which can
lead to wrong predictions on new data.
We will be looking to identify different classes of wines found in Sklearn's wine dataset.
Next we need to load in the data and store it into X (input features) and y (target). The
parameter as_frame is set equal to True so we do not lose the feature names when
loading the data. (sklearn version older than 0.23 must skip the as_frame argument as it is
not supported)
data = datasets.load_wine(as_frame = True)
X = data.data
y = data.target
In order to properly evaluate our model on unseen data, we need to split X and y into
train and test sets. For information on splitting data, see the Train/Test lesson.
With our data prepared, we can now instantiate a base classifier and fit it to the training
data.
Result:
DecisionTreeClassifier(random_state=22)
We can now predict the class of wine in the unseen test set and evaluate the model
performance.
y_pred = dtree.predict(X_test)
Result:
Example
Import the necessary data and evaluate base classifier performance.
X = data.data
y = data.target
The base classifier performs reasonably well on the dataset achieving 82% accuracy on
the test dataset with the current parameters (Different results may occur if you do not
have the random_state parameter set).
Now that we have a baseline accuracy for the test dataset, we can see how the Bagging
Classifier outperforms a single Decision Tree Classifier.
For this sample dataset the number of estimators is relatively low; it is often the case
that much larger ranges are explored. Hyperparameter tuning is usually done with a grid
search, but for now we will use a select set of values for the number of estimators.
Now let's create a range of values that represent the number of estimators we want to
use in each ensemble.
estimator_range = [2,4,6,8,10,12,14,16]
To see how the Bagging Classifier performs with differing values of n_estimators we need
a way to iterate over the range of values and store the results from each ensemble. To do
this we will create a for loop, storing the models and scores in separate lists for later
visualizations.
models = []
scores = []
With the models and scores stored, we can now visualize the improvement in model
performance.
# Visualize plot
plt.show()
Example
Import the necessary data and evaluate the BaggingClassifier performance.
X = data.data
y = data.target
estimator_range = [2,4,6,8,10,12,14,16]
models = []
scores = []
for n_estimators in estimator_range:
# Visualize plot
plt.show()
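The loop body is missing from the example above, so here is a minimal sketch of the whole comparison without the plot. The split settings (test_size=0.25, random_state=22) and the use of accuracy_score are assumptions; random_state=22 mirrors the value shown for the base classifier earlier in this chapter.

```python
from sklearn import datasets
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

data = datasets.load_wine(as_frame=True)
X = data.data
y = data.target

# test_size and random_state are assumed values for this sketch
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)

estimator_range = [2, 4, 6, 8, 10, 12, 14, 16]
models = []
scores = []

for n_estimators in estimator_range:
    # BaggingClassifier uses a decision tree as its base estimator by default
    clf = BaggingClassifier(n_estimators=n_estimators, random_state=22)
    clf.fit(X_train, y_train)
    models.append(clf)
    scores.append(accuracy_score(y_true=y_test, y_pred=clf.predict(X_test)))

print(dict(zip(estimator_range, scores)))
```

Each fitted ensemble keeps its individual trees in the estimators_ attribute, which is what makes it possible to inspect or plot a single tree later on.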
Results Explained
By iterating through different values for the number of estimators we can see an increase
in model performance from 82.2% to 95.5%. After 14 estimators the accuracy begins to
drop. Again, if you set a different random_state the values you see will vary. That is why
it is best practice to use cross validation to ensure stable results.
In this case, we see a 13.3 percentage point increase in accuracy when it comes to
identifying the type of the wine.
We saw in the last exercise that 12 estimators yielded the highest accuracy, so we will
use that to create our model, this time setting the parameter oob_score to True to
evaluate the model with the out-of-bag score.
Example
Create a model with out-of-bag metric.
X = data.data
y = data.target
oob_model.fit(X_train, y_train)
print(oob_model.oob_score_)
Since the samples used in OOB and the test set are different, and the dataset is relatively
small, there is a difference in the accuracy. It is rare that they would be exactly the
same. Again, OOB should be used as a quick means for estimating error, but it is not the
only evaluation metric.
Note: This is only functional with smaller datasets, where the trees are relatively shallow
and narrow making them easy to visualize.
We will need to import plot_tree function from sklearn.tree. The different trees can be
graphed by changing the estimator you wish to visualize.
Example
Generate Decision Trees from Bagging Classifier
X = data.data
y = data.target
clf.fit(X_train, y_train)
plt.figure(figsize=(30, 20))
Here we can see just the first decision tree that was used to vote on the final prediction.
Again, by changing the index of the classifier you can see each of the trees that have
been aggregated.
Cross Validation
When adjusting models we are aiming to increase overall model performance on unseen
data. Hyperparameter tuning can lead to much better performance on test sets. However,
optimizing parameters to the test set can lead to information leakage, causing the model
to perform worse on unseen data. To correct for this we can perform cross validation.
To better understand CV, we will be performing different methods on the iris dataset. Let
us first load in and separate the data.
X, y = datasets.load_iris(return_X_y=True)
There are many methods of cross validation. We will start by looking at k-fold cross
validation.
K-Fold
The training data used in the model is split into k smaller sets, to be used to
validate the model. The model is then trained on k-1 folds of the training set. The
remaining fold is then used as a validation set to evaluate the model.
As we will be trying to classify different species of iris flowers we will need to import a
classifier model. For this exercise we will be using a DecisionTreeClassifier. We will also
need to import CV modules from sklearn.
With the data loaded we can now create and fit a model for evaluation.
clf = DecisionTreeClassifier(random_state=42)
Now let's evaluate our model and see how it performs on each k-fold.
k_folds = KFold(n_splits = 5)
It is also good practice to see how CV performed overall by averaging the scores for all
folds.
Example
Run k-fold CV:
X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)
k_folds = KFold(n_splits = 5)
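The k-fold example above stops before the scores are computed. A minimal completion might look like the sketch below; the use of cross_val_score and the printed labels are assumptions based on the imports shown in the next example.

```python
from sklearn import datasets
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=42)
k_folds = KFold(n_splits=5)

# One score per fold: the model is trained on 4 folds and evaluated on the fifth
scores = cross_val_score(clf, X, y, cv=k_folds)

print("Cross Validation Scores: ", scores)
print("Average CV Score: ", scores.mean())
print("Number of CV Scores used in Average: ", len(scores))
```

Passing the KFold object as cv lets cross_val_score handle the splitting, fitting, and scoring for every fold in one call.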
Example
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)
sk_folds = StratifiedKFold(n_splits = 5)
While the number of folds is the same, the average CV score increases from the basic
k-fold when making sure the classes are stratified.
Leave-One-Out (LOO)
Instead of selecting the number of splits in the training data set like k-fold,
LeaveOneOut utilizes 1 observation to validate and n-1 observations to train. This method
is an exhaustive technique.
Example
Run LOO CV:
X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)
loo = LeaveOneOut()
We can observe that the number of cross validation scores performed is equal to the
number of observations in the dataset. In this case there are 150 observations in the iris
dataset.
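The claim about the number of scores is easy to verify with a short sketch. It assumes cross_val_score is used for scoring, as in the other examples in this chapter.

```python
from sklearn import datasets
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=42)
loo = LeaveOneOut()

# Each split validates on a single observation, so every score is either 0 or 1
scores = cross_val_score(clf, X, y, cv=loo)

print("Average CV Score: ", scores.mean())
print("Number of CV Scores used in Average: ", len(scores))
```

Because each validation set holds only one observation, the average of the scores is simply the fraction of observations that were classified correctly when left out.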
Leave-P-Out (LPO)
Leave-P-Out is simply a nuanced difference to the Leave-One-Out idea, in that we can
select the number of observations, p, to use in our validation set.
Example
Run LPO CV:
X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)
lpo = LeavePOut(p=2)
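Before running Leave-P-Out it helps to know how many models it will train. This sketch only counts the splits instead of fitting them; comparing against math.comb is an added check, not part of the original example.

```python
from math import comb

from sklearn import datasets
from sklearn.model_selection import LeavePOut

X, y = datasets.load_iris(return_X_y=True)
lpo = LeavePOut(p=2)

# Every distinct pair of observations forms one validation set
n_splits = lpo.get_n_splits(X)

print(n_splits)
print(comb(len(X), 2))
```

The count grows quickly with both the dataset size and p, which is why this method is usually reserved for small datasets.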
As we can see, this is an exhaustive method, with many more scores being calculated than
Leave-One-Out, even with p = 2, yet it achieves roughly the same average CV score.
Shuffle Split
Unlike KFold, ShuffleSplit leaves out a percentage of the data, not to be used in the train
or validation sets. To do so we must decide what the train and test sizes are, as well as
the number of splits.
Example
Run Shuffle Split CV:
X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)
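The ShuffleSplit object itself is missing from the example above. The sketch below shows one way to complete it; the sizes (60% train, 30% test), the 5 splits, and the random_state are assumed values for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)

# Assumed sizes: 60% for training and 30% for testing leaves 10% unused in each split
ss = ShuffleSplit(train_size=0.6, test_size=0.3, n_splits=5, random_state=42)

scores = cross_val_score(clf, X, y, cv=ss)

# Look at the first split to confirm that some observations are left out
train_idx, test_idx = next(ss.split(X))

print("Average CV Score: ", scores.mean())
print(len(train_idx), len(test_idx), len(X))
```

The train and test indices together cover fewer observations than the dataset holds, which is the difference from KFold described above.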
Ending Notes
These are just a few of the CV methods that can be applied to models. There are many
more cross validation classes, with most models having their own class. Check out
sklearn's cross validation documentation for more CV options.
Another common metric is AUC, area under the receiver operating characteristic (ROC)
curve. The receiver operating characteristic curve plots the true positive (TP) rate versus
the false positive (FP) rate at different classification thresholds. The thresholds are
different probability cutoffs that separate the two classes in binary classification. It uses
probability to tell us how well a model separates the classes.
Imbalanced Data
Suppose we have an imbalanced data set where the majority of our data is of one value.
We can obtain high accuracy for the model by predicting the majority class.
Example
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
n = 10000
ratio = .95
n_0 = int((1-ratio) * n)
n_1 = int(ratio * n)
y = np.array([0] * n_0 + [1] * n_1)
# below are the probabilities obtained from a hypothetical model that always predicts the majority class
# probability of predicting class 1 is going to be 100%
y_proba = np.array([1]*n)
y_pred = y_proba > .5
Although we obtain a very high accuracy, the model provided no information about the
data so it's not useful. We accurately predict class 1 100% of the time while accurately
predicting class 0 0% of the time. At the expense of accuracy, it might be better to have a
model that can somewhat separate the two classes.
Example
# below are the probabilities obtained from a hypothetical model that doesn't always predict the mode
y_proba_2 = np.array(
np.random.uniform(0, .7, n_0).tolist() +
np.random.uniform(.3, 1, n_1).tolist()
)
y_pred_2 = y_proba_2 > .5
For the second set of predictions, we do not have as high of an accuracy score as the first
but the accuracy for each class is more balanced. Using accuracy as an evaluation metric
we would rate the first model higher than the second even though it doesn't tell us
anything about the data.
In cases like this, using another evaluation metric like AUC would be preferred.
Example
Model 1:
plot_roc_curve(y, y_proba)
print(f'model 1 AUC score: {roc_auc_score(y, y_proba)}')
Result
model 1 AUC score: 0.5
Example
Model 2:
plot_roc_curve(y, y_proba_2)
print(f'model 2 AUC score: {roc_auc_score(y, y_proba_2)}')
Result
model 2 AUC score: 0.8270551578947367
An AUC score of around .5 would mean that the model is unable to make a distinction
between the two classes and the curve would look like a line with a slope of 1. An AUC
score closer to 1 means that the model has the ability to separate the two classes and
the curve would come closer to the top left corner of the graph.
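The comparison of the two models can be reproduced numerically without the plotting helper, which is not defined on this page. In this sketch roc_points is a hypothetical helper that returns the curve coordinates, and the seeded random generator is an assumption added so the result is repeatable.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)  # seeded generator (an assumption) for repeatable scores

n = 10000
ratio = .95
n_0 = int((1 - ratio) * n)
n_1 = int(ratio * n)
y = np.array([0] * n_0 + [1] * n_1)

# Model 1 gives every observation the same score, so it cannot rank the classes
y_proba = np.ones(n)
# Model 2 gives overlapping but informative scores
y_proba_2 = np.concatenate([rng.uniform(0, .7, n_0), rng.uniform(.3, 1, n_1)])

def roc_points(true_y, y_prob):
    """Hypothetical helper: return the false and true positive rates of the ROC curve."""
    fpr, tpr, thresholds = roc_curve(true_y, y_prob)
    return fpr, tpr

fpr_2, tpr_2 = roc_points(y, y_proba_2)
auc_1 = roc_auc_score(y, y_proba)
auc_2 = roc_auc_score(y, y_proba_2)

print(f'model 1 AUC score: {auc_1}')
print(f'model 2 AUC score: {auc_2}')
```

The fpr and tpr arrays are the x and y coordinates of the ROC curve, so they can be passed directly to a line plot to draw it.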
Probabilities
Because AUC is a metric that utilizes probabilities of the class predictions, we can be
more confident in a model that has a higher AUC score than one with a lower score even
if they have similar accuracies.
In the data below, we have two sets of probabilities from hypothetical models. The first
has probabilities that are not as "confident" when predicting the two classes (the
probabilities are close to .5). The second has probabilities that are more "confident" when
predicting the two classes (the probabilities are close to the extremes of 0 or 1).
Example
import numpy as np
n = 10000
y = np.array([0] * n + [1] * n)
#
y_prob_1 = np.array(
np.random.uniform(.25, .5, n//2).tolist() +
np.random.uniform(.3, .7, n).tolist() +
np.random.uniform(.5, .75, n//2).tolist()
)
y_prob_2 = np.array(
np.random.uniform(0, .4, n//2).tolist() +
np.random.uniform(.3, .7, n).tolist() +
np.random.uniform(.6, 1, n//2).tolist()
)
Example
Plot model 1:
plot_roc_curve(y, y_prob_1)
Example
Plot model 2:
plot_roc_curve(y, y_prob_2)
Even though the accuracies for the two models are similar, the model with the higher
AUC score will be more reliable because it takes into account the predicted probability. It
is more likely to give you higher accuracy when predicting future data.
Machine Learning - K-nearest neighbors (KNN)
KNN
KNN is a simple, supervised machine learning (ML) algorithm that can be used for
classification or regression tasks - and is also frequently used in missing value
imputation. It is based on the idea that the observations closest to a given data point are
the most "similar" observations in a data set, and we can therefore classify unforeseen
points based on the values of the closest existing points. By choosing K, the user can
select the number of nearby observations to use in the algorithm.
Here, we will show you how to implement the KNN algorithm for classification, and show
how different values of K affect the results.
Example
Start by visualizing some data points:
plt.scatter(x, y, c=classes)
plt.show()
Now we fit the KNN algorithm with K=1:
knn.fit(data, classes)
Example
new_x = 8
new_y = 21
new_point = [(new_x, new_y)]
prediction = knn.predict(new_point)
Now we do the same thing, but with a higher K value which changes the prediction:
Example
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(data, classes)
prediction = knn.predict(new_point)
Example Explained
Import the modules you need.
You can learn about the Matplotlib module in our Matplotlib Tutorial.
Create arrays that resemble variables in a dataset. We have two input features (x and y)
and then a target class (class). The input features that are pre-labeled with our target
class will be used to predict the class of new data. Note that while we only use two input
features here, this method will work with any number of variables:
Using the input features and target class, we fit a KNN model on the data using 1
nearest neighbor:
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(data, classes)
Then, we can use the same KNN object to predict the class of new, unforeseen data
points. First we create new x and y features, and then call knn.predict() on the new
data point to get a class of 0 or 1:
new_x = 8
new_y = 21
new_point = [(new_x, new_y)]
prediction = knn.predict(new_point)
print(prediction)
Result:
[0]
When we plot all the data along with the new point and class, we can see it has been
labeled with the 0 class, matching the prediction printed above. The text annotation is
just to highlight the location of the new point:
Result:
However, when we change the number of neighbors to 5, the number of points used to
classify our new point changes. As a result, so does the classification of the new point:
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(data, classes)
prediction = knn.predict(new_point)
print(prediction)
Result:
[1]
When we plot the class of the new point along with the older points, we note that the
color has changed based on the associated class label:
Result:
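To make the effect of K concrete without any external library, the sketch below re-implements the nearest-neighbor vote in plain Python. The data points, the helper name knn_predict, and the use of Euclidean distance are assumptions made for illustration. They are not the data set used in the example above, and scikit-learn's KNeighborsClassifier may resolve ties differently.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (x, y) points with class labels 0 or 1.
points = [(8, 22), (10, 21), (10, 23), (11, 22), (5, 20),
          (3, 16), (4, 17), (12, 25), (14, 24)]
labels = [0, 1, 1, 1, 0, 0, 0, 1, 1]

def knn_predict(points, labels, query, k):
    """Return the majority class among the k points closest to query."""
    ranked = sorted(zip(points, labels), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

new_point = (8, 21)
print(knn_predict(points, labels, new_point, 1))  # the single nearest point, (8, 22), has class 0
print(knn_predict(points, labels, new_point, 5))  # three of the five nearest points have class 1
```

With K=1 only the single closest point votes; with K=5 the wider neighborhood outvotes it, which is the same effect the tutorial shows.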
Python MySQL
MySQL Database
To be able to experiment with the code examples in this tutorial, you should have MySQL
installed on your computer.
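Navigate your command line to the location of PIP and install the "MySQL Connector"
driver, which is published on PyPI as mysql-connector-python.
To test the installation, create a Python file with the following content:
import mysql.connector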
If the above code was executed with no errors, "MySQL Connector" is installed and ready
to be used.
Create Connection
Start by creating a connection to the database.
demo_mysql_connection.py:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword"
)
print(mydb)
Now you can start querying the database using SQL statements.
Creating a Database
To create a database in MySQL, use the "CREATE DATABASE" statement:
Example
Create a database named "mydatabase":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword"
)
mycursor = mydb.cursor()
mycursor.execute("CREATE DATABASE mydatabase")
If the above code was executed with no errors, you have successfully created a database.
Example
Return a list of your system's databases:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword"
)
mycursor = mydb.cursor()
mycursor.execute("SHOW DATABASES")
for x in mycursor:
  print(x)
Or you can try to access the database when making the connection:
Example
Try connecting to the database "mydatabase":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
Creating a Table
To create a table in MySQL, use the "CREATE TABLE" statement.
Make sure you define the name of the database when you create the connection.
Example
Create a table named "customers":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
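mycursor = mydb.cursor()
# Column types are an assumption; later examples refer to the name and address columns.
mycursor.execute("CREATE TABLE customers (name VARCHAR(255), address VARCHAR(255))")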
If the above code was executed with no errors, you have now successfully created a
table.
Example
Check if the table exists by listing all tables in your database:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SHOW TABLES")
for x in mycursor:
  print(x)
Primary Key
When creating a table, you should also create a column with a unique key for each
record.
We use the statement "INT AUTO_INCREMENT PRIMARY KEY", which will insert a unique
number for each record, starting at 1 and increasing by one for each record.
Example
Create primary key when creating the table:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
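mycursor = mydb.cursor()
# Column types other than the id column are an assumption.
mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")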
Example
Create primary key on an existing table:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("ALTER TABLE customers ADD COLUMN id INT AUTO_INCREMENT PRIMARY KEY")
Example
Insert a record in the "customers" table:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
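mycursor = mydb.cursor()
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 37")  # sample values
mycursor.execute(sql, val)
mydb.commit()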
The second parameter of the executemany() method is a list of tuples, containing the
data you want to insert:
Example
Fill the "customers" table with data:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
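mycursor = mydb.cursor()
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
# Sample rows for illustration.
val = [
  ("Peter", "Lowstreet 27"),
  ("Amy", "Apple st 652"),
  ("Hannah", "Mountain 21")
]
mycursor.executemany(sql, val)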
mydb.commit()
Get Inserted ID
You can get the id of the row you just inserted by asking the cursor object.
Note: If you insert more than one row, the id of the last inserted row is returned.
Example
Insert one row, and return the ID:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
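mycursor = mydb.cursor()
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("Michael", "Valley 345")  # sample values
mycursor.execute(sql, val)
mydb.commit()
print("1 record inserted, ID:", mycursor.lastrowid)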
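The insert examples above need a running MySQL server. As a rough, server-free sketch of the same cursor pattern, the block below uses Python's built-in sqlite3 module as a stand-in. This is a swapped-in database, not mysql.connector: sqlite3 uses ? as its placeholder where mysql.connector uses %s, and the table layout and sample rows are assumptions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, address TEXT)"
)

# Insert several rows at once: the second argument is a list of tuples.
rows = [("Peter", "Lowstreet 27"), ("Amy", "Apple st 652"), ("Hannah", "Mountain 21")]
cur.executemany("INSERT INTO customers (name, address) VALUES (?, ?)", rows)

# Insert one more row and ask the cursor for the id it was given.
cur.execute("INSERT INTO customers (name, address) VALUES (?, ?)", ("Michael", "Valley 345"))
conn.commit()

inserted_id = cur.lastrowid
row_count = cur.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print("1 record inserted, ID:", inserted_id)
print("rows in table:", row_count)
```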
Example
Select all records from the "customers" table, and display the result:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM customers")
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
Note: We use the fetchall() method, which fetches all rows from the last executed
statement.
Selecting Columns
To select only some of the columns in a table, use the "SELECT" statement followed by
the column name(s):
Example
Select only the name and address columns:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT name, address FROM customers")
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
The fetchone() method will return the first row of the result:
Example
Fetch only one row:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM customers")
myresult = mycursor.fetchone()
print(myresult)
Example
Select record(s) where the address is "Park Lane 38":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers WHERE address = 'Park Lane 38'"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
Wildcard Characters
You can also select the records that start with, include, or end with a given letter or phrase.
Use % in a LIKE pattern to represent wildcard characters:
Example
Select records where the address contains the word "way":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers WHERE address LIKE '%way%'"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
When query values are provided by the user, you should escape the values.
This prevents SQL injection, a common web hacking technique used to destroy or
misuse your database.
Example
Escape query values by using the placeholder %s method:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers WHERE address = %s"
adr = ("Park Lane 38", )  # sample value
mycursor.execute(sql, adr)
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
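To see why placeholders matter, the sketch below runs a parameterized query against an in-memory SQLite database. It uses the standard-library sqlite3 module as a stand-in for mysql.connector, so the placeholder is ? rather than %s. The helper name find_by_address and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (name TEXT, address TEXT)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Vicky", "Yellow Garden 2"), ("Ben", "Park Lane 38")],
)

def find_by_address(cursor, address):
    # The value is passed separately from the SQL text, so it is never parsed as SQL.
    cursor.execute("SELECT * FROM customers WHERE address = ?", (address,))
    return cursor.fetchall()

safe_rows = find_by_address(cur, "Yellow Garden 2")
# A typical injection attempt is treated as an ordinary string and matches nothing.
attack_rows = find_by_address(cur, "' OR '1'='1")
print(safe_rows)
print(attack_rows)
```

Had the value been pasted into the SQL string directly, the second call would have turned the WHERE clause into a condition that is always true.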
The ORDER BY keyword sorts the result ascending by default. To sort the result in
descending order, use the DESC keyword.
Example
Sort the result alphabetically by name:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers ORDER BY name"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
ORDER BY DESC
Use the DESC keyword to sort the result in a descending order.
Example
Sort the result reverse alphabetically by name:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers ORDER BY name DESC"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
Python MySQL Delete From
Delete Record
You can delete records from an existing table by using the "DELETE FROM" statement:
Example
Delete any record where the address is "Mountain 21":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "DELETE FROM customers WHERE address = 'Mountain 21'"
mycursor.execute(sql)
mydb.commit()
Notice the WHERE clause in the DELETE syntax: The WHERE clause specifies which
record(s) should be deleted. If you omit the WHERE clause, all records will be
deleted!
It is good practice to escape the values of any query, including delete statements.
This prevents SQL injection, a common web hacking technique used to destroy or
misuse your database.
The mysql.connector module uses the placeholder %s to escape values in the delete
statement:
Example
Escape values by using the placeholder %s method:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "DELETE FROM customers WHERE address = %s"
adr = ("Mountain 21", )  # sample value
mycursor.execute(sql, adr)
mydb.commit()
Delete a Table
You can delete an existing table by using the "DROP TABLE" statement:
Example
Delete the table "customers":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "DROP TABLE customers"
mycursor.execute(sql)
Example
Delete the table "customers" if it exists:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "DROP TABLE IF EXISTS customers"
mycursor.execute(sql)
Python MySQL Update Table
Update Table
You can update existing records in a table by using the "UPDATE" statement:
Example
Overwrite the address column from "Valley 345" to "Canyon 123":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "UPDATE customers SET address = 'Canyon 123' WHERE address = 'Valley 345'"
mycursor.execute(sql)
mydb.commit()
Important!: Notice the statement: mydb.commit(). It is required to make the changes, otherwise no changes
are made to the table.
Notice the WHERE clause in the UPDATE syntax: The WHERE clause specifies which record or records
should be updated. If you omit the WHERE clause, all records will be updated!
The mysql.connector module uses the placeholder %s to escape values in the update statement:
Example
Escape values by using the placeholder %s method:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "UPDATE customers SET address = %s WHERE address = %s"
val = ("Canyon 123", "Valley 345")
mycursor.execute(sql, val)
mydb.commit()
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
# Assumed query: limit the result to the first 5 records.
mycursor.execute("SELECT * FROM customers LIMIT 5")
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
Example
Start from position 3, and return 5 records:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM customers LIMIT 5 OFFSET 2")
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
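The relationship between the starting position and OFFSET is easy to get wrong: starting at position 3 means skipping 2 rows. The sketch below checks this with the standard-library sqlite3 module, used here as a stand-in because SQLite accepts the same LIMIT ... OFFSET syntax. The table contents are sample data, not the tutorial's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
names = ["John", "Peter", "Amy", "Hannah", "Michael", "Sandy", "Betty", "Richard"]
cur.executemany("INSERT INTO customers (name) VALUES (?)", [(n,) for n in names])

# Skip the first 2 rows, then return the next 5 (rows 3 to 7).
cur.execute("SELECT id, name FROM customers ORDER BY id LIMIT 5 OFFSET 2")
page = cur.fetchall()
for row in page:
    print(row)
```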
users
{ id: 1, name: 'John', fav: 154},
{ id: 2, name: 'Peter', fav: 154},
{ id: 3, name: 'Amy', fav: 155},
{ id: 4, name: 'Hannah', fav:},
{ id: 5, name: 'Michael', fav:}
products
{ id: 154, name: 'Chocolate Heaven' },
{ id: 155, name: 'Tasty Lemons' },
{ id: 156, name: 'Vanilla Dreams' }
These two tables can be combined by using users' fav field and products' id field.
Example
Join users and products to see the name of each user's favorite product:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT \
users.name AS user, \
products.name AS favorite \
FROM users \
INNER JOIN products ON users.fav = products.id"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
  print(x)
Note: You can use JOIN instead of INNER JOIN. They will both give you the same result.
LEFT JOIN
In the example above, Hannah and Michael were excluded from the result because
INNER JOIN only shows the records where there is a match.
If you want to show all users, even if they do not have a favorite product, use the LEFT
JOIN statement:
Example
Select all users and their favorite product:
sql = "SELECT \
users.name AS user, \
products.name AS favorite \
FROM users \
LEFT JOIN products ON users.fav = products.id"
RIGHT JOIN
If you want to return all products, and the users who have them as their favorite, even if
no user has them as their favorite, use the RIGHT JOIN statement:
Example
Select all products, and the user(s) who have them as their favorite:
sql = "SELECT \
users.name AS user, \
products.name AS favorite \
FROM users \
RIGHT JOIN products ON users.fav = products.id"
Note: Hannah and Michael, who have no favorite product, are not included in the result.
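The difference between INNER JOIN and LEFT JOIN can be checked without a MySQL server. The sketch below loads the users and products rows shown earlier into an in-memory SQLite database (standard-library sqlite3, a stand-in for mysql.connector) and runs both joins. The ORDER BY clause is an addition that makes the row order predictable. RIGHT JOIN is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT, fav INTEGER)")
cur.execute("CREATE TABLE products (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "John", 154), (2, "Peter", 154), (3, "Amy", 155),
    (4, "Hannah", None), (5, "Michael", None),
])
cur.executemany("INSERT INTO products VALUES (?, ?)", [
    (154, "Chocolate Heaven"), (155, "Tasty Lemons"), (156, "Vanilla Dreams"),
])

query = (
    "SELECT users.name AS user, products.name AS favorite "
    "FROM users {join} JOIN products ON users.fav = products.id "
    "ORDER BY users.id"
)
inner_rows = cur.execute(query.format(join="INNER")).fetchall()
left_rows = cur.execute(query.format(join="LEFT")).fetchall()
print(inner_rows)  # only users with a matching product
print(left_rows)   # all users; favorite is None where there is no match
```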
Python MongoDB
MongoDB
MongoDB stores data in JSON-like documents, which makes the database very flexible
and scalable.
To be able to experiment with the code examples in this tutorial, you will need access to
a MongoDB database.
PyMongo
Python needs a MongoDB driver to access the MongoDB database.
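Navigate your command line to the location of PIP and install the PyMongo driver,
which is published on PyPI as pymongo.
To test the installation, create a Python file with the following content:
demo_mongodb_test.py:
import pymongo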
If the above code was executed with no errors, "pymongo" is installed and ready to be
used.
Creating a Database
To create a database in MongoDB, start by creating a MongoClient object, then specify a
connection URL with the correct IP address and the name of the database you want to
create.
MongoDB will create the database if it does not exist, and make a connection to it.
Example
Create a database called "mydatabase":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
You can check if a database exists by listing all databases in your system:
Example
Return a list of your system's databases:
print(myclient.list_database_names())
Example
Check if "mydatabase" exists:
dblist = myclient.list_database_names()
if "mydatabase" in dblist:
  print("The database exists.")
Example
Create a collection called "customers":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
MongoDB waits until you have inserted a document before it actually creates the
collection.
Example
Return a list of all collections in your database:
print(mydb.list_collection_names())
Example
Check if the "customers" collection exists:
collist = mydb.list_collection_names()
if "customers" in collist:
  print("The collection exists.")
The first parameter of the insert_one() method is a dictionary containing the name(s) and
value(s) of each field in the document you want to insert.
Example
Insert a record in the "customers" collection:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mydict = { "name": "John", "address": "Highway 37" }
x = mycol.insert_one(mydict)
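The insert_one() method returns an InsertOneResult object, whose inserted_id property
holds the id of the inserted document:
x = mycol.insert_one(mydict)
print(x.inserted_id)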
If you do not specify an _id field, then MongoDB will add one for you and assign a unique
id for each document.
In the example above no _id field was specified, so MongoDB assigned a unique _id for
the record (document).
The first parameter of the insert_many() method is a list containing dictionaries with the
data you want to insert:
Example
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mylist = [
{ "name": "Amy", "address": "Apple st 652"},
{ "name": "Hannah", "address": "Mountain 21"},
{ "name": "Michael", "address": "Valley 345"},
{ "name": "Sandy", "address": "Ocean blvd 2"},
{ "name": "Betty", "address": "Green Grass 1"},
{ "name": "Richard", "address": "Sky st 331"},
{ "name": "Susan", "address": "One way 98"},
{ "name": "Vicky", "address": "Yellow Garden 2"},
{ "name": "Ben", "address": "Park Lane 38"},
{ "name": "William", "address": "Central st 954"},
{ "name": "Chuck", "address": "Main Road 989"},
{ "name": "Viola", "address": "Sideway 1633"}
]
x = mycol.insert_many(mylist)
Remember that the _id values have to be unique. Two documents cannot have the same _id.
Example
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mylist = [
{ "_id": 1, "name": "John", "address": "Highway 37"},
{ "_id": 2, "name": "Peter", "address": "Lowstreet 27"},
{ "_id": 3, "name": "Amy", "address": "Apple st 652"},
{ "_id": 4, "name": "Hannah", "address": "Mountain 21"},
{ "_id": 5, "name": "Michael", "address": "Valley 345"},
{ "_id": 6, "name": "Sandy", "address": "Ocean blvd 2"},
{ "_id": 7, "name": "Betty", "address": "Green Grass 1"},
{ "_id": 8, "name": "Richard", "address": "Sky st 331"},
{ "_id": 9, "name": "Susan", "address": "One way 98"},
{ "_id": 10, "name": "Vicky", "address": "Yellow Garden 2"},
{ "_id": 11, "name": "Ben", "address": "Park Lane 38"},
{ "_id": 12, "name": "William", "address": "Central st 954"},
{ "_id": 13, "name": "Chuck", "address": "Main Road 989"},
{ "_id": 14, "name": "Viola", "address": "Sideway 1633"}
]
x = mycol.insert_many(mylist)
In MongoDB we use the find() and find_one() methods to find data in a collection,
just as the SELECT statement is used to find data in a table in a MySQL database.
Find One
To select data from a collection in MongoDB, we can use the find_one() method.
Example
Find the first document in the customers collection:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
x = mycol.find_one()
print(x)
Find All
To select data from a collection in MongoDB, we can also use the find() method.
The first parameter of the find() method is a query object. In this example we use an
empty query object, which selects all documents in the collection.
No parameters in the find() method gives you the same result as SELECT * in MySQL.
Example
Return all documents in the "customers" collection, and print each document:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
for x in mycol.find():
  print(x)
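The second parameter of the find() method is an object describing which fields to include in the result.
This parameter is optional, and if omitted, all fields will be included in the result.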
Example
Return only the names and addresses, not the _ids:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
for x in mycol.find({},{ "_id": 0, "name": 1, "address": 1 }):
  print(x)
You are not allowed to specify both 0 and 1 values in the same object (except if one of
the fields is the _id field). If you specify a field with the value 0, all other fields get the
value 1, and vice versa:
Example
This example will exclude "address" from the result:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
for x in mycol.find({},{ "address": 0 }):
  print(x)
Example
You get an error if you specify both 0 and 1 values in the same object (except if one of
the fields is the _id field):
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
for x in mycol.find({},{ "name": 1, "address": 0 }):
  print(x)
The first argument of the find() method is a query object, and is used to limit the search.
Example
Find document(s) with the address "Park Lane 38":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": "Park Lane 38" }
mydoc = mycol.find(myquery)
for x in mydoc:
  print(x)
Advanced Query
To make advanced queries you can use modifiers as values in the query object.
E.g. to find the documents where the "address" field starts with the letter "S" or higher
(alphabetically), use the greater than modifier: {"$gt": "S"}:
Example
Find documents where the address starts with the letter "S" or higher:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": { "$gt": "S" } }
mydoc = mycol.find(myquery)
for x in mydoc:
  print(x)
To find only the documents where the "address" field starts with the letter "S", use the
regular expression {"$regex": "^S"}:
Example
Find documents where the address starts with the letter "S":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": { "$regex": "^S" } }
mydoc = mycol.find(myquery)
for x in mydoc:
  print(x)
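The difference between the $gt and $regex queries can be illustrated without a MongoDB server. The sketch below emulates both filters in plain Python over a few dictionaries shaped like the customers documents. It does not use pymongo, and it assumes plain ASCII strings so that Python's string comparison orders them the same way.

```python
import re

# A few documents shaped like those in the "customers" collection.
docs = [
    {"name": "Michael", "address": "Valley 345"},
    {"name": "Richard", "address": "Sky st 331"},
    {"name": "Ben", "address": "Park Lane 38"},
    {"name": "Viola", "address": "Sideway 1633"},
]

# Rough equivalent of {"address": {"$gt": "S"}}: alphabetical comparison.
greater_than_s = [d["name"] for d in docs if d["address"] > "S"]

# Rough equivalent of {"address": {"$regex": "^S"}}: pattern match at the start.
starts_with_s = [d["name"] for d in docs if re.search("^S", d["address"])]

print(greater_than_s)
print(starts_with_s)
```

The comparison also matches "Valley 345", because "V" sorts after "S", while the regular expression keeps only addresses that actually begin with "S".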
The sort() method takes one parameter for "fieldname" and one parameter for "direction"
(ascending is the default direction).
Example
Sort the result alphabetically by name:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mydoc = mycol.find().sort("name")
for x in mydoc:
  print(x)
Sort Descending
Use the value -1 as the second parameter to sort descending.
sort("name", 1) #ascending
sort("name", -1) #descending
Example
Sort the result reverse alphabetically by name:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mydoc = mycol.find().sort("name", -1)
for x in mydoc:
  print(x)
Delete Document
To delete one document, we use the delete_one() method.
The first parameter of the delete_one() method is a query object defining which document
to delete.
Note: If the query finds more than one document, only the first occurrence is deleted.
Example
Delete the document with the address "Mountain 21":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": "Mountain 21" }
mycol.delete_one(myquery)
To delete more than one document, use the delete_many() method.
The first parameter of the delete_many() method is a query object defining which
documents to delete.
Example
Delete all documents where the address starts with the letter S:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": { "$regex": "^S" } }
x = mycol.delete_many(myquery)
Example
Delete all documents in the "customers" collection:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
x = mycol.delete_many({})
Delete Collection
You can delete a table, or collection as it is called in MongoDB, by using
the drop() method.
Example
Delete the "customers" collection:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mycol.drop()
The drop() method returns true if the collection was dropped successfully, and false if the
collection does not exist.
Python MongoDB Update
Update Collection
You can update a record, or document as it is called in MongoDB, by using the update_one() method.
The first parameter of the update_one() method is a query object defining which document to update.
Note: If the query finds more than one record, only the first occurrence is updated.
The second parameter is an object defining the new values of the document.
Example
Change the address from "Valley 345" to "Canyon 123":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": "Valley 345" }
newvalues = { "$set": { "address": "Canyon 123" } }
mycol.update_one(myquery, newvalues)
Update Many
To update all documents that meet the criteria of the query, use the update_many() method.
Example
Update all documents where the address starts with the letter "S":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
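mycol = mydb["customers"]
myquery = { "address": { "$regex": "^S" } }
newvalues = { "$set": { "name": "Minnie" } }  # example value, not from the original text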
x = mycol.update_many(myquery, newvalues)
The limit() method takes one parameter, a number defining how many documents to
return.
Customers
{'_id': 1, 'name': 'John', 'address': 'Highway 37'}
{'_id': 2, 'name': 'Peter', 'address': 'Lowstreet 27'}
{'_id': 3, 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': 4, 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': 5, 'name': 'Michael', 'address': 'Valley 345'}
{'_id': 6, 'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'_id': 7, 'name': 'Betty', 'address': 'Green Grass 1'}
{'_id': 8, 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': 9, 'name': 'Susan', 'address': 'One way 98'}
{'_id': 10, 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': 11, 'name': 'Ben', 'address': 'Park Lane 38'}
{'_id': 12, 'name': 'William', 'address': 'Central st 954'}
{'_id': 13, 'name': 'Chuck', 'address': 'Main Road 989'}
{'_id': 14, 'name': 'Viola', 'address': 'Sideway 1633'}
Example
Limit the result to only return 5 documents:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myresult = mycol.find().limit(5)
Python Reference
Python Reference
Built-in Functions
String Methods
List Methods
Dictionary Methods
Tuple Methods
Set Methods
File Methods
Keywords
Exceptions
Glossary
Module Reference
Random Module
Requests Module
Math Module
CMath Module
Python has a set of built-in methods that you can use on strings.
Note: All string methods return new values. They do not change the original string.
Method Description
capitalize() Converts the first character to upper case
Python has a set of built-in methods that you can use on lists/arrays.
Method Description
extend() Add the elements of a list (or any iterable), to the end of
the current list
index() Returns the index of the first element with the specified
value
Note: Python does not have built-in support for Arrays, but Python Lists can be used
instead.
Python has a set of built-in methods that you can use on dictionaries.
Method Description
setdefault() Returns the value of the specified key. If the key does
not exist: insert the key, with the specified value
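A short sketch of the two cases described for setdefault(), using a made-up dictionary (the variable names and values are illustrative only):

```python
car = {"brand": "Ford", "model": "Mustang"}

# Key exists: the stored value is returned and the dictionary is unchanged.
model = car.setdefault("model", "Bronco")

# Key is missing: it is inserted with the given value, which is also returned.
color = car.setdefault("color", "white")

print(model, color)
print(car)
```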
Python Tuple Methods
Python has two built-in methods that you can use on tuples.
Method Description
index() Searches the tuple for a specified value and returns the
position of where it was found
Python has a set of built-in methods that you can use on sets.
Method Description
intersection_update() Removes the items in this set that are not present
in other, specified set(s)
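As a small illustration with made-up sets, intersection_update() changes the set it is called on rather than returning a new one:

```python
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}

# Keep only the items of x that also appear in y; x is modified in place.
result = x.intersection_update(y)

print(x)       # x now holds only the shared item
print(result)  # the method itself returns None
```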
Method Description
Learn more about the file object in our Python File Handling Tutorial.
Python Keywords
Python has a set of keywords that are reserved words that cannot be used as variable
names, function names, or any other identifiers:
Keyword Description
as To create an alias
or A logical operator
Built-in Exceptions
The table below shows built-in exceptions that are usually raised in Python:
Exception Description
Python Glossary
Feature Description
Python has a built-in module that you can use to make random numbers.
Method Description
seed() Initialize the random number generator
choices() Returns a list with a random selection from the given sequence
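The two methods listed above work well together: seeding the generator makes the "random" selection repeatable. A minimal sketch, with an illustrative list and seed value:

```python
import random

colors = ["red", "green", "blue"]

random.seed(10)                       # fix the generator's starting state
first = random.choices(colors, k=4)   # pick 4 items, repeats allowed

random.seed(10)                       # same seed, same sequence
second = random.choices(colors, k=4)

print(first)
print(first == second)
```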
Python Requests Module
Example
Make a request to a web page, and print the response text:
import requests
x = requests.get('https://w3schools.com/python/demopage.htm')
print(x.text)
The HTTP request returns a Response Object with all the response data (content,
encoding, status, etc).
Syntax
requests.methodname(params)
Methods
Method Description
Python statistics Module
Statistics Methods
Method Description
Math Methods
Method Description
math.expm1() Returns e^x - 1 (e raised to the power x, minus 1)
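One reason math.expm1() exists alongside math.exp() is accuracy for very small x. The sketch below compares the two; the value of x is chosen only for illustration.

```python
import math

x = 1e-10
accurate = math.expm1(x)   # computes e**x - 1 directly
naive = math.exp(x) - 1    # loses digits when subtracting 1 from a value very close to 1

print(accurate)
print(naive)
```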
Math Constants
Constant Description
Python cmath Module
The methods in this module accept int, float, and complex numbers. They even accept
Python objects that have a __complex__() or __float__() method.
The methods in this module almost always return a complex number. If the return value
can be expressed as a real number, the return value has an imaginary part of 0.
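A brief sketch of that behavior using cmath.sqrt(), with inputs chosen for illustration:

```python
import cmath

root_of_four = cmath.sqrt(4)         # expressible as a real number, so the imaginary part is 0
root_of_minus_four = cmath.sqrt(-4)  # not expressible as a real number

print(root_of_four)
print(root_of_minus_four)
```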
cMath Methods
Method Description
cMath Constants
Constant Description
How to Remove Duplicates From a
Python List
Example
Remove any duplicates from a List:
Example Explained
First we have a List that contains duplicates:
Create a dictionary, using the List items as keys. This will automatically remove any
duplicates because dictionaries cannot have duplicate keys.
Create a Dictionary
mylist = ["a", "b", "a", "c", "c"]
mylist = list( dict.fromkeys(mylist) )
print(mylist)
Now we have a List without any duplicates, and it has the same order as the original List.
Print the List to demonstrate the result
Create a Function
If you like to have a function where you can send your lists, and get them back without
duplicates, you can create a function and insert the code from the example above.
Example
def my_function(x):
  return list(dict.fromkeys(x))

mylist = my_function(["a", "b", "a", "c", "c"])
print(mylist)
Example Explained
Create a function that takes a List as an argument.
Create a Function
def my_function(x):
  return list(dict.fromkeys(x))

mylist = my_function(["a", "b", "a", "c", "c"])
print(mylist)
Inside the function, create a dictionary using the List items as keys.
Convert the dictionary into a list.
Return the list.
Call the function with a list as the argument, and print the result.
How to Reverse a String in Python
The fastest (and easiest?) way is to use a slice that steps backwards, -1.
Example
Reverse the string "Hello World":
Example Explained
We have a string, "Hello World", which we want to reverse:
Create a slice that starts at the end of the string, and moves backwards.
In this particular example, the slice statement [::-1] means start at the end of the string
and end at position 0, move with the step -1, negative one, which means one step
backwards.
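The slice described above can be written as follows (the variable name txt is an illustrative choice):

```python
txt = "Hello World"[::-1]  # a step of -1 walks the string from the last character to the first
print(txt)
```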
Create a Function
If you like to have a function where you can send your strings, and return them
backwards, you can create a function and insert the code from the example above.
Example
def my_function(x):
  return x[::-1]

mytxt = my_function("Hello World")
print(mytxt)
Example Explained
Create a function that takes a String as an argument.
Create a Function
def my_function(x):
  return x[::-1]

mytxt = my_function("Hello World")
print(mytxt)
Slice the string starting at the end of the string and move backwards.
Return the reversed string, call the function with a string as the argument, and print the result.
Example
x = 5
y = 10
print(x + y)
Example
x = input("Type a number: ")
y = input("Type another number: ")
total = float(x) + float(y)
print("The sum is:", total)
Python Examples
For more examples, see:
https://www.w3schools.com/python/python_examples.asp