
EUGENIA BAHIT

DATA SCIENCE
WITH PYTHON

STUDY MATERIAL

Information and registration:

Course: http://escuela.eugeniabahit.com | Certifications: http://python.laeci.org
Introduction to Data Science with Python - Escuela de Informática Eugenia Bahit

SUMMARY
VARIABLE MANIPULATION METHODS
STRING MANIPULATION
FORMATTING METHODS
CAPITALIZE THE FIRST LETTER
CONVERT A STRING TO LOWERCASE
CONVERT A STRING TO UPPERCASE
CONVERT UPPERCASE TO LOWERCASE AND VICE VERSA
CONVERT A STRING TO TITLE FORMAT
CENTER A TEXT
ALIGN TEXT TO THE LEFT
ALIGN TEXT TO THE RIGHT
FILL IN A TEXT BY PREFIXING IT WITH ZEROS
SEARCH METHODS
COUNT NUMBER OF OCCURRENCES OF A SUBSTRING
SEARCH FOR A SUBSTRING WITHIN A STRING
VALIDATION METHODS
TO KNOW IF A STRING BEGINS WITH A GIVEN SUBSTRING
TO KNOW IF A STRING ENDS WITH A GIVEN SUBSTRING
TO KNOW IF A STRING IS ALPHANUMERIC
TO KNOW IF A STRING IS ALPHABETIC
TO KNOW IF A STRING IS NUMERIC
TO KNOW IF A STRING CONTAINS ONLY LOWERCASE LETTERS
TO KNOW IF A STRING CONTAINS ONLY UPPERCASE LETTERS
TO KNOW IF A STRING CONTAINS ONLY BLANKS
TO KNOW IF A STRING HAS A TITLE FORMAT
SUBSTITUTION METHODS
FORMATTING A STRING, DYNAMICALLY SUBSTITUTING TEXT
REPLACE TEXT IN A STRING
REMOVE CHARACTERS TO THE LEFT AND RIGHT OF A STRING
REMOVE CHARACTERS TO THE LEFT OF A STRING
REMOVE CHARACTERS TO THE RIGHT OF A STRING
JOINING AND SPLITTING METHODS
ITERATIVELY JOIN A STRING

(C) 2011 - 2018 Eugenia Bahit. Creative Commons Attribution 4.0

SPLITTING A STRING INTO THREE PARTS, USING A SEPARATOR
SPLITTING A STRING INTO SEVERAL PARTS, USING A SEPARATOR
SPLIT A STRING INTO LINES
MANIPULATION OF LISTS AND TUPLES
AGGREGATION METHODS
ADD AN ITEM TO THE END OF THE LIST
ADD SEVERAL ITEMS TO THE END OF THE LIST
ADD AN ELEMENT IN A GIVEN POSITION
ELIMINATION METHODS
DELETE THE LAST ITEM IN THE LIST
DELETE AN ELEMENT BY ITS INDEX
DELETE AN ITEM BY ITS VALUE
ORDER METHODS
SORT A LIST IN REVERSE (REVERSE ORDER)
SORT A LIST IN ASCENDING ORDER
SORT A LIST IN DESCENDING ORDER
SEARCH METHODS
COUNT THE NUMBER OF OCCURRENCES OF AN ELEMENT
GET INDEX NUMBER
ANNEX ON LISTS AND TUPLES
TYPE CONVERSION
CONCATENATION OF COLLECTIONS
MAXIMUM AND MINIMUM VALUE
COUNT ITEMS
DICTIONARY MANIPULATION
ELIMINATION METHODS
EMPTY A DICTIONARY
AGGREGATION AND CREATION METHODS
COPY A DICTIONARY
CREATE A NEW DICTIONARY FROM THE KEYS OF AN EXISTING DICTIONARY
SEQUENCE
CONCATENATE DICTIONARIES
SET A DEFAULT KEY AND VALUE
RETURN METHODS
GET THE VALUE OF A KEY
TO KNOW IF A KEY EXISTS IN THE DICTIONARY
OBTAIN THE KEYS AND VALUES OF A DICTIONARY

OBTAIN THE KEYS TO A DICTIONARY
GET THE VALUES OF A DICTIONARY
OBTAIN THE NUMBER OF ITEMS IN A DICTIONARY
FILE HANDLING AND MANIPULATION
WAYS TO OPEN A FILE
SOME METHODS OF THE FILE OBJECT
CSV FILE HANDLING
SOME EXAMPLES OF CSV FILES
WORKING WITH CSV FILES FROM PYTHON
READING CSV FILES
WRITING CSV FILES
PROBABILITY AND STATISTICS WITH PYTHON
PROBABILITY OF MUTUALLY EXCLUSIVE SIMPLE AND COMPOUND EVENTS IN PYTHON
SAMPLE SPACE
SIMPLE AND COMPOUND EVENTS
PROBABILITY ASSIGNMENT
SIMPLE MUTUALLY EXCLUSIVE EVENTS
EVENTS COMPOSED OF MUTUALLY EXCLUSIVE SIMPLE EVENTS
FUNCTIONS
CONDITIONAL PROBABILITY IN PYTHON
FUNCTIONS
DEPENDENT EVENTS
SET THEORY IN PYTHON
INDEPENDENT EVENTS
BAYES THEOREM IN PYTHON
BAYES' THEOREM AND PROBABILITY OF CAUSES
DATA: CASE STUDY
ANALYSIS
PROCEDURE
FUNCTIONS
COMPLEMENTARY BIBLIOGRAPHY
ANNEX I: COMPLEX CALCULATIONS
POPULATION AND SAMPLING STATISTICS: CALCULATION OF VARIANCE AND STANDARD DEVIATION
SCALAR PRODUCT OF TWO VECTORS
RELATIVE, ABSOLUTE AND CUMULATIVE FREQUENCY CALCULATIONS
ANNEX II: CREATION OF A MENU OF OPTIONS

VARIABLE MANIPULATION METHODS

In Python, every variable is considered an object. Different types of actions, called methods, can be performed on each object. Methods are functions that belong to an object, so they are accessed using the syntax:

variable.function()

In some cases, these methods (functions of an object) will accept parameters like any other function.

variable.function(parameter)
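
As a minimal sketch of both forms (the variable names here are only illustrative), the same string object can be asked to run one method without parameters and another with a parameter:

```python
# A string is an object, so its methods are reached through the dot syntax.
greeting = "hello world"

# Method without parameters: returns a new, upper-cased string.
shouted = greeting.upper()

# Method with a parameter: counts how many times "o" appears.
o_count = greeting.count("o")

print(shouted)   # HELLO WORLD
print(o_count)   # 2
```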

STRING MANIPULATION
The main methods that can be applied to a text string, organized by category, are described below.

FORMATTING METHODS

CAPITALIZE THE FIRST LETTER

Method: capitalize()
Returns: a copy of the string with the first letter capitalized
>>> string = "welcome to my application"
>>> result = string.capitalize()
>>> result
Welcome to my application

CONVERT A STRING TO LOWERCASE

Method: lower()
Returns: a copy of the string in lowercase letters
>>> string = "Hello World"
>>> string.lower()
hello world

CONVERT A STRING TO UPPERCASE

Method: upper()
Returns: a copy of the string in uppercase letters
>>> string = "Hello World"
>>> string.upper()
HELLO WORLD

CONVERT UPPERCASE TO LOWERCASE AND VICE VERSA

Method: swapcase()
Returns: a copy of the string converted from uppercase to lowercase and vice versa.
>>> string = "Hello World"
>>> string.swapcase()
hELLO wORLD

CONVERT A STRING TO TITLE FORMAT

Method: title()
Returns: a copy of the converted string
>>> string = "hello world"
>>> string.title()
Hello World

CENTER A TEXT
Method: center(length[, "fill character"])
Returns: a copy of the centered string
>>> string = "welcome to my application".capitalize()
>>> string.center(50, "=")
============Welcome to my application=============

>>> string.center(50, " ")
            Welcome to my application

ALIGN TEXT TO THE LEFT

Method: ljust(length[, "fill character"])
Returns: a copy of the left-aligned string
>>> string = "welcome to my application".capitalize()
>>> string.ljust(50, "=")
Welcome to my application=========================

ALIGN TEXT TO THE RIGHT

Method: rjust(length[, "fill character"])
Returns: a copy of the right-aligned string
>>> string = "welcome to my application".capitalize()
>>> string.rjust(50, "=")
=========================Welcome to my application

>>> string.rjust(50, " ")
                         Welcome to my application

FILL IN A TEXT BY PREFIXING IT WITH ZEROS

Method: zfill(length)
Returns: a copy of the string padded with leading zeros until the specified final length is reached
>>> invoice_number = 1575
>>> str(invoice_number).zfill(12)
000000001575
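
The formatting methods above can be combined to lay out fixed-width text. The following is only a sketch: the invoice data, the width of 30 characters and the variable names are assumptions made for illustration.

```python
# Hypothetical invoice data, used only for illustration.
title = "invoice summary".title()      # "Invoice Summary"
number = str(1575).zfill(8)            # "00001575"
width = 30

# Title framed by "=" on both sides, 30 characters in total.
header = title.center(width, "=")

# Label padded on the right, value padded on the left: 10 + 20 = 30 characters.
row = "Number".ljust(10, ".") + number.rjust(20, ".")

print(header)
print(row)
```

Because every method returns a new string, the calls can be chained or nested without altering the original values.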

SEARCH METHODS

COUNT NUMBER OF OCCURRENCES OF A SUBSTRING

Method: count("substring"[, start_position, end_position])
Returns: an integer representing the number of occurrences of substring within string
>>> string = "welcome to my application".capitalize()
>>> string.count("a")
2

SEARCH FOR A SUBSTRING WITHIN A STRING

Method: find("substring"[, start_position, end_position])
Returns: an integer representing the position where the substring starts within
the string. If not found, returns -1
>>> string = "welcome to my application".capitalize()
>>> string.find("my")
11
>>> string.find("my", 0, 10)
-1
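
Since find() reports only the first match, its optional start position can be used to keep searching. The helper below, find_all, is a hypothetical name introduced for this sketch; it is not part of Python's string methods.

```python
def find_all(text, substring):
    """Return the start index of every non-overlapping occurrence of substring."""
    if not substring:
        return []                  # an empty substring would never advance the search
    positions = []
    start = text.find(substring)
    while start != -1:             # find() signals "not found" with -1
        positions.append(start)
        # Resume the search right after the match just found.
        start = text.find(substring, start + len(substring))
    return positions

string = "Welcome to my application"
print(find_all(string, "a"))       # [14, 20]
print(find_all(string, "z"))       # []
```

The number of positions returned matches what count() reports for the same substring.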

VALIDATION METHODS

TO KNOW IF A STRING BEGINS WITH A GIVEN SUBSTRING

Method: startswith("substring"[, start_position, end_position])
Returns: True or False
>>> string = "welcome to my application".capitalize()
>>> string.startswith("Welcome")
True
>>> string.startswith("application")
False
>>> string.startswith("application", 14)
True

TO KNOW IF A STRING ENDS WITH A GIVEN SUBSTRING

Method: endswith("substring"[, start_position, end_position])
Returns: True or False
>>> string = "welcome to my application".capitalize()
>>> string.endswith("application")
True
>>> string.endswith("Welcome")
False
>>> string.endswith("Welcome", 0, 7)
True

TO KNOW IF A STRING IS ALPHANUMERIC

Method: isalnum()
Returns: True or False
>>> string = "pepegrillo 75"
>>> string.isalnum()
False
>>> string = "pepegrillo"
>>> string.isalnum()
True
>>> string = "pepegrillo75"
>>> string.isalnum()
True

TO KNOW IF A STRING IS ALPHABETIC

Method: isalpha()
Returns: True or False
>>> string = "pepegrillo 75"
>>> string.isalpha()
False
>>> string = "pepegrillo"
>>> string.isalpha()
True
>>> string = "pepegrillo75"
>>> string.isalpha()
False

TO KNOW IF A STRING IS NUMERIC

Method: isdigit()
Returns: True or False
>>> string = "pepegrillo 75"
>>> string.isdigit()
False
>>> string = "7584"
>>> string.isdigit()
True
>>> string = "75 84"
>>> string.isdigit()
False
>>> string = "75.84"
>>> string.isdigit()
False

TO KNOW IF A STRING CONTAINS ONLY LOWERCASE LETTERS

Method: islower()
Returns: True or False
>>> string = "pepe grillo"
>>> string.islower()
True
>>> string = "Pepe Grillo"
>>> string.islower()
False
>>> string = "Pepegrillo"
>>> string.islower()
False
>>> string = "pepegrillo75"
>>> string.islower()
True

TO KNOW IF A STRING CONTAINS ONLY UPPERCASE LETTERS

Method: isupper()
Returns: True or False
>>> string = "PEPE GRILLO"
>>> string.isupper()
True
>>> string = "Pepe Grillo"
>>> string.isupper()
False
>>> string = "Pepegrillo"
>>> string.isupper()
False
>>> string = "PEPEGRILLO"
>>> string.isupper()
True

TO KNOW IF A STRING CONTAINS ONLY BLANKS

Method: isspace()
Returns: True or False
>>> string = "pepe grillo"
>>> string.isspace()
False
>>> string = " "
>>> string.isspace()
True

TO KNOW IF A STRING HAS A TITLE FORMAT

Method: istitle()
Returns: True or False
>>> string = "Pepe Grillo"
>>> string.istitle()
True
>>> string = "pepe grillo"
>>> string.istitle()
False
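
The validation methods are often chained to classify input before using it. The function below is a sketch under assumptions: the name describe and the category labels are invented for illustration, and the order of the checks is one reasonable choice among several.

```python
def describe(text):
    """Summarise a string using the validation methods covered in this section."""
    if not text or text.isspace():
        return "blank"
    if text.isdigit():
        return "digits only"
    if text.isalpha():
        return "letters only"
    if text.isalnum():
        return "letters and digits"
    return "mixed"

for sample in ["7584", "pepegrillo", "pepegrillo75", "pepegrillo 75", " "]:
    print(repr(sample), "->", describe(sample))
```

The empty string is tested first because every validation method returns False for it.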

SUBSTITUTION METHODS

FORMATTING A STRING, DYNAMICALLY SUBSTITUTING TEXT

Method: format(*args, **kwargs)
Returns: the formatted string
>>> string = "welcome to my application {0}"
>>> string.format("in Python")
welcome to my application in Python

>>> string = "Gross amount: ${0} + VAT: ${1} = Net amount: {2}"
>>> string.format(100, 21, 121)
Gross amount: $100 + VAT: $21 = Net amount: 121

>>> string = "Gross amount: ${gross} + VAT: ${vat} = Net amount: {net}"
>>> string.format(gross=100, vat=21, net=121)
Gross amount: $100 + VAT: $21 = Net amount: 121

>>> string.format(gross=100, vat=100 * 21 // 100, net=100 * 21 // 100 + 100)
Gross amount: $100 + VAT: $21 = Net amount: 121
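
Named placeholders are convenient when the values are computed elsewhere. The function invoice_line below is a hypothetical helper written for this sketch; it assumes whole-number amounts and a VAT rate given as a percentage.

```python
def invoice_line(gross, vat_rate):
    """Build a summary line; vat_rate is a percentage such as 21."""
    vat = gross * vat_rate // 100      # integer division keeps whole amounts
    template = "Gross amount: ${gross} + VAT: ${vat} = Net amount: {net}"
    # Each keyword argument fills the placeholder with the same name.
    return template.format(gross=gross, vat=vat, net=gross + vat)

print(invoice_line(100, 21))
# Gross amount: $100 + VAT: $21 = Net amount: 121
```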

REPLACE TEXT IN A STRING

Method: replace("substring to search for", "substring to replace with")
Returns: a copy of the string with every occurrence of the searched substring replaced
>>> search = "first name last name"
>>> replace_by = "John Smith"
>>> "Dear Mr. first name last name:".replace(search, replace_by)
Dear Mr. John Smith:

REMOVE CHARACTERS TO THE LEFT AND RIGHT OF A STRING

Method: strip(["character"])
Returns: a copy of the string without the given characters at either end (whitespace by default)
>>> string = " www.eugeniabahit.com "
>>> string.strip()
www.eugeniabahit.com
>>> string.strip(' ')
www.eugeniabahit.com

REMOVE CHARACTERS TO THE LEFT OF A STRING

Method: lstrip(["character"])
Returns: a copy of the string without the given characters at its left end (whitespace by default)
>>> string = "www.eugeniabahit.com"
>>> string.lstrip("w.")
eugeniabahit.com

>>> string = " www.eugeniabahit.com"
>>> string.lstrip()
www.eugeniabahit.com

REMOVE CHARACTERS TO THE RIGHT OF A STRING

Method: rstrip(["character"])
Returns: a copy of the string without the given characters at its right end (whitespace by default)
>>> string = "www.eugeniabahit.com "
>>> string.rstrip()
www.eugeniabahit.com
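
One detail worth illustrating: the argument of strip(), lstrip() and rstrip() is treated as a set of characters, not as a prefix or suffix. The sketch below shows the difference; drop_prefix is a hypothetical helper, and the domain "www.web.com" is made up for the example.

```python
def drop_prefix(text, prefix):
    """Remove prefix from the start of text only when it matches exactly."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

# lstrip("w.") keeps removing any leading "w" or "." characters...
print("www.web.com".lstrip("w."))            # eb.com
# ...while an exact prefix check removes only "www."
print(drop_prefix("www.web.com", "www."))    # web.com
```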

JOINING AND SPLITTING METHODS

ITERATIVELY JOIN A STRING

Method: join(iterable)
Returns: a string made of the elements of the iterable, with the original string inserted between each pair of elements.
>>> invoice_number_format = ("No. 0000-0", "-0000 (ID: ", ")")
>>> number = "275"
>>> invoice_number = number.join(invoice_number_format)
>>> invoice_number
No. 0000-0275-0000 (ID: 275)

SPLITTING A STRING INTO THREE PARTS, USING A SEPARATOR

Method: partition("separator")
Returns: a tuple of three elements where the first is the contents of the string before the separator, the second is
the separator itself and the third is the contents of the string after the separator.
>>> parts = "http://www.eugeniabahit.com".partition("www.")
>>> parts
('http://', 'www.', 'eugeniabahit.com')

>>> protocol, separator, domain = parts
>>> print("Protocol: {0}\nDomain: {1}".format(protocol, domain))
Protocol: http://
Domain: eugeniabahit.com

SPLITTING A STRING INTO SEVERAL PARTS, USING A SEPARATOR

Method: split("separator")
Returns: a list of all elements found by dividing the string by a separator
>>> keywords = "python, guide, course, tutorial".split(", ")
>>> keywords
['python', 'guide', 'course', 'tutorial']

SPLIT A STRING INTO LINES

Method: splitlines()
Returns: a list in which each element is one line of the string.
>>> text = """Line 1
Line 2
Line 3
Line 4"""
>>> text.splitlines()
['Line 1', 'Line 2', 'Line 3', 'Line 4']

>>> text = "Line 1\nLine 2\nLine 3"
>>> text.splitlines()
['Line 1', 'Line 2', 'Line 3']
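
The joining and splitting methods complement each other. As a rough sketch, split() and join() can reshape a list of keywords, and partition() can separate a key from its value; parse_setting and the "key = value" format are assumptions introduced only for this example.

```python
def parse_setting(line):
    """Split 'key = value' into a (key, value) pair using partition()."""
    key, separator, value = line.partition("=")
    if not separator:              # partition() returns an empty separator when none is found
        return key.strip(), None
    return key.strip(), value.strip()

keywords = "python, guide, course, tutorial".split(", ")
rejoined = " | ".join(keywords)    # the separator string goes between the elements

print(rejoined)                            # python | guide | course | tutorial
print(parse_setting("language = Python"))  # ('language', 'Python')
print(parse_setting("no separator here"))  # ('no separator here', None)
```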

MANIPULATION OF LISTS AND TUPLES
In this chapter, we will see the methods that the list object has. Some of them are also available for
tuples.

AGGREGATION METHODS

ADD AN ITEM TO THE END OF THE LIST

Method: append("new element")
>>> male_names = ["Alvaro", "David", "Edgardo", "Jacinto", "Jose", "Ricky"]
>>> male_names.append("Jose")
>>> male_names
['Alvaro', 'David', 'Edgardo', 'Jacinto', 'Jose', 'Ricky', 'Jose']

ADD SEVERAL ITEMS TO THE END OF THE LIST

Method: extend(other_list)
>>> male_names.extend(["Jose", "Gerardo"])
>>> male_names
['Alvaro', 'David', 'Edgardo', 'Jacinto', 'Jose', 'Ricky', 'Jose', 'Jose', 'Gerardo']

ADD AN ELEMENT IN A GIVEN POSITION

Method: insert(position, "new element")
>>> male_names.insert(0, "Ricky")
>>> male_names
['Ricky', 'Alvaro', 'David', 'Edgardo', 'Jacinto', 'Jose', 'Ricky', 'Jose', 'Jose', 'Gerardo']

ELIMINATION METHODS

DELETE THE LAST ITEM IN THE LIST

Method: pop()
Returns: the deleted element
>>> male_names.pop()
Gerardo
>>> male_names
['Ricky', 'Alvaro', 'David', 'Edgardo', 'Jacinto', 'Jose', 'Ricky', 'Jose', 'Jose']

DELETE AN ELEMENT BY ITS INDEX

Method: pop(index)
Returns: the deleted element
>>> male_names.pop(3)
Edgardo

>>> male_names
['Ricky', 'Alvaro', 'David', 'Jacinto', 'Jose', 'Ricky', 'Jose', 'Jose']

DELETE AN ITEM BY ITS VALUE

Method: remove("value")
>>> male_names.remove("Jose")
>>> male_names
['Ricky', 'Alvaro', 'David', 'Jacinto', 'Ricky', 'Jose', 'Jose']
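
As the result shows, remove() deletes only the first matching element. It also raises ValueError when the value is not in the list. The sketch below deals with both points; remove_all is a hypothetical helper and the list names is separate from the running example.

```python
def remove_all(items, value):
    """Remove every occurrence of value from the list items, in place."""
    while value in items:          # checking first avoids the ValueError from remove()
        items.remove(value)
    return items

names = ["Ricky", "Alvaro", "Jose", "David", "Jose"]
remove_all(names, "Jose")
print(names)                       # ['Ricky', 'Alvaro', 'David']

remove_all(names, "Gerardo")       # value not present: nothing happens, no error
print(names)                       # ['Ricky', 'Alvaro', 'David']
```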

ORDER METHODS

SORT A LIST IN REVERSE (REVERSE ORDER)

Method: reverse()
>>> male_names.reverse()
>>> male_names
['Jose', 'Jose', 'Ricky', 'Jacinto', 'David', 'Alvaro', 'Ricky']

SORT A LIST IN ASCENDING ORDER

Method: sort()
>>> male_names.sort()
>>> male_names
['Alvaro', 'David', 'Jacinto', 'Jose', 'Jose', 'Ricky', 'Ricky']

SORT A LIST IN DESCENDING ORDER

Method: sort(reverse=True)
>>> male_names.sort(reverse=True)
>>> male_names
['Ricky', 'Ricky', 'Jose', 'Jose', 'Jacinto', 'David', 'Alvaro']
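
The order methods modify the list itself and return None, which is easy to overlook. The sketch below contrasts sort() with the built-in function sorted(); sorted() is not one of the list methods described in this chapter and is shown here only as a common alternative.

```python
names = ["Ricky", "Alvaro", "Jose", "David"]

result = names.sort()              # sorts the list in place...
print(result)                      # None
print(names)                       # ['Alvaro', 'David', 'Jose', 'Ricky']

# sorted() leaves the original list untouched and returns a new one.
original = ["Ricky", "Alvaro", "Jose", "David"]
descending = sorted(original, reverse=True)
print(descending)                  # ['Ricky', 'Jose', 'David', 'Alvaro']
print(original)                    # ['Ricky', 'Alvaro', 'Jose', 'David']
```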

SEARCH METHODS

COUNT THE NUMBER OF OCCURRENCES OF AN ELEMENT

Method: count(element)
>>> male_names = ["Alvaro", "Miguel", "Edgardo", "David", "Miguel"]
>>> male_names.count("Miguel")
2

>>> male_names = ("Alvaro", "Miguel", "Edgardo", "David", "Miguel")
>>> male_names.count("Miguel")
2

GET INDEX NUMBER

Method: index(element[, start_index, end_index])
>>> male_names.index("Miguel")
1

>>> male_names.index("Miguel", 2, 5)
4
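
Unlike the string method find(), index() raises ValueError when the element is missing. Combining it with count() is one way to collect every position safely. The helper all_indexes is a hypothetical name used for this sketch, which works for both lists and tuples.

```python
def all_indexes(collection, element):
    """Return every position of element in a list or tuple."""
    positions = []
    start = 0
    # count() tells how many matches exist, so index() is never called in vain.
    for _ in range(collection.count(element)):
        position = collection.index(element, start)
        positions.append(position)
        start = position + 1       # continue searching after the last match
    return positions

names = ("Alvaro", "Miguel", "Edgardo", "David", "Miguel")
print(all_indexes(names, "Miguel"))    # [1, 4]
print(all_indexes(names, "Jose"))      # []
```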

ANNEX ON LISTS AND TUPLES

TYPE CONVERSION
Python's built-in functions include two that convert lists into tuples and vice versa: list() converts a tuple to a list, and tuple() converts a list to a tuple.

One of the most frequent uses is converting a tuple that needs to be modified into a list. This is often the case with results obtained from a database query.

>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)

>>> list(my_tuple)
[1, 2, 3, 4]

>>> my_list = [1, 2, 3, 4]
>>> my_list
[1, 2, 3, 4]

>>> tuple(my_list)
(1, 2, 3, 4)
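
The database case mentioned above might look like the following sketch. The row contents are invented for illustration; no real query is executed.

```python
# Hypothetical row, shaped like the tuples a database query might return.
row = (1, "Alvaro", "alvaro@example.org")

# Tuples cannot be modified, so the row is converted to a list first.
editable = list(row)
editable[1] = "Alvaro G."

# Converting back gives an immutable record again.
updated_row = tuple(editable)

print(row)           # (1, 'Alvaro', 'alvaro@example.org')
print(updated_row)   # (1, 'Alvaro G.', 'alvaro@example.org')
```

The original tuple is left untouched because list() builds a new object.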

CONCATENATION OF COLLECTIONS
You can concatenate (or join) two or more lists or two or more tuples, by means of the addition sign +.

You cannot join a list to a tuple. The collections to be joined must be of the same type.

>>> list1 = [1, 2, 3, 4]
>>> list2 = [3, 4, 5, 6, 7, 8]
>>> list3 = list1 + list2
>>> list3
[1, 2, 3, 4, 3, 4, 5, 6, 7, 8]
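
Trying to add a list to a tuple raises TypeError. A minimal sketch of the failure and of one way around it, using the type conversion functions described earlier (the variable names are illustrative):

```python
list1 = [1, 2, 3, 4]
tuple1 = (5, 6)

try:
    combined = list1 + tuple1          # list + tuple is not allowed
except TypeError:
    # Convert one side so that both collections share the same type.
    combined = list1 + list(tuple1)

print(combined)                        # [1, 2, 3, 4, 5, 6]
```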

>>> tuple1 = (1, 2, 3, 4, 5)
>>> tuple2 = (4, 6, 8, 10)
>>> tuple3 = (3, 5, 7, 9)
>>> tuple4 = tuple1 + tuple2 + tuple3
>>> tuple4
(1, 2, 3, 4, 5, 4, 6, 8, 10, 3, 5, 7, 9)
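
Since + only joins collections of the same type, mixing a list and a tuple fails. The following is a minimal sketch of that failure and of how a type conversion gets around it; the variable names are illustrative only.

```python
numbers_list = [1, 2, 3]
numbers_tuple = (4, 5, 6)

try:
    numbers_list + numbers_tuple        # list + tuple is not allowed
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Convert one side first so that both operands share the same type
as_list = numbers_list + list(numbers_tuple)
as_tuple = tuple(numbers_list) + numbers_tuple
```

The result takes the type of the operands: as_list is a list and as_tuple is a tuple.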

MAXIMUM AND MINIMUM VALUE

The max() and min() functions return the largest and smallest value of a list or tuple:

>>> max(tuple4)
10
>>> max(tuple1)
5
>>> min(tuple1)
1
>>> max(list3)
8
>>> min(list1)
1

COUNT ITEMS
The len() function is used to count elements in a list or tuple, as well as characters in a text string:

>>> len(list3)
10
>>> len(list1)
4

DICTIONARY MANIPULATION
ELIMINATION METHODS

EMPTY A DICTIONARY
Method: clear()
>>> dictionary = {"color": "violet", "size": "XS", "price": 174.25}
>>> dictionary
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> dictionary.clear()
>>> dictionary
{}

AGGREGATION AND CREATION METHODS

COPY A DICTIONARY
Method: copy()
>>> dictionary = {"color": "violet", "size": "XS", "price": 174.25}
>>> t_shirt = dictionary.copy()
>>> dictionary
{'color': 'violet', 'size': 'XS', 'price': 174.25}
>>> t_shirt
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> dictionary.clear()
>>> dictionary
{}
>>> t_shirt
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> tank_top = t_shirt
>>> t_shirt
{'color': 'violet', 'size': 'XS', 'price': 174.25}
>>> tank_top
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> t_shirt.clear()
>>> t_shirt
{}
>>> tank_top
{}


CREATE A NEW DICTIONARY FROM THE KEYS OF A SEQUENCE
Method: dict.fromkeys(sequence[, default value])
>>> sequence = ["color", "size", "brand"]
>>> dictionary1 = dict.fromkeys(sequence)
>>> dictionary1
{'color': None, 'size': None, 'brand': None}

>>> dictionary2 = dict.fromkeys(sequence, 'default value')
>>> dictionary2
{'color': 'default value', 'size': 'default value', 'brand': 'default value'}

CONCATENATE DICTIONARIES
Method: update(dictionary)
>>> dictionary1 = {"color": "green", "price": 45}
>>> dictionary2 = {"size": "M", "brand": "Lacoste"}
>>> dictionary1.update(dictionary2)
>>> dictionary1
{'color': 'green', 'price': 45, 'size': 'M', 'brand': 'Lacoste'}

SET A DEFAULT KEY AND VALUE

Method: setdefault("key"[, None|default_value])

If the key does not exist, it creates it with the default value. Always returns the value for the key passed as
parameter.

>>> t_shirt = {"color": "pink", "brand": "Zara"}
>>> key = t_shirt.setdefault("size", "U")
>>> key
'U'
>>> t_shirt
{'color': 'pink', 'brand': 'Zara', 'size': 'U'}

>>> t_shirt2 = t_shirt.copy()
>>> t_shirt2
{'color': 'pink', 'brand': 'Zara', 'size': 'U'}

>>> key = t_shirt2.setdefault("print")
>>> key
>>> t_shirt2
{'color': 'pink', 'brand': 'Zara', 'size': 'U', 'print': None}

>>> key = t_shirt2.setdefault("brand", "Lacoste")
>>> key
'Zara'
>>> t_shirt2
{'color': 'pink', 'brand': 'Zara', 'size': 'U', 'print': None}
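
Because setdefault() always returns the value stored for the key, it is often used to build a dictionary of lists in a single step. The following is a minimal sketch with invented data (the garments list is hypothetical):

```python
# Hypothetical data: (brand, garment) pairs to be grouped by brand
garments = [("Zara", "t-shirt"), ("Lacoste", "polo"), ("Zara", "jeans")]

by_brand = {}
for brand, garment in garments:
    # The first time a brand appears, setdefault() stores an empty list for it;
    # in every case it returns the list associated with the key
    by_brand.setdefault(brand, []).append(garment)
```

After the loop, by_brand holds one list of garments per brand, and no key had to be checked for existence beforehand.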

RETURN METHODS

GET THE VALUE OF A KEY

Method: get(key[, default value if key does not exist])
>>> t_shirt.get("color")
'pink'

>>> t_shirt.get("stock")
>>> t_shirt.get("stock", "out of stock")
'out of stock'

TO KNOW IF A KEY EXISTS IN THE DICTIONARY

Operator: 'key' in dictionary
>>> exists = 'price' in t_shirt
>>> exists
False

>>> exists = 'color' in t_shirt
>>> exists
True

OBTAIN THE KEYS AND VALUES OF A DICTIONARY

Method: items()
>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> for key, value in dictionary.items():
...     key, value
...
('color', 'pink')
('brand', 'Zara')
('size', 'U')

OBTAIN THE KEYS TO A DICTIONARY

Method: keys()
>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> for key in dictionary.keys():
...     key
...
'color'
'brand'
'size'

Get keys in a list


>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> keys = list(dictionary.keys())
>>> keys
['color', 'brand', 'size']

GET THE VALUES OF A DICTIONARY

Method: values()
>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> for value in dictionary.values():
...     value
...
'pink'
'Zara'
'U'

Get values in a list

>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> values = list(dictionary.values())
>>> values
['pink', 'Zara', 'U']

OBTAIN THE NUMBER OF ITEMS IN A DICTIONARY

To count the elements of a dictionary, as with lists and tuples, the built-in function len() is used.
>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> len(dictionary)
3

FILE HANDLING AND MANIPULATION
Python allows you to work with the file and directory system on two different levels.

The first is through the os module, which facilitates work with the entire file and directory system at the level of the operating system itself.

The second allows working with individual files, reading and writing them from the application or script itself and treating each file as an object.

WAYS TO OPEN A FILE

The way a file is opened depends on the final objective, which answers the question "what is this file being opened for?". The answers can be several: to read, to write, or to read and write.

Each time a file is opened, a pointer is created in memory. This pointer positions a cursor (or access point) on a specific byte of the file contents. The cursor moves within the file as the file is read or written to.

When a file is opened in read mode, the cursor is positioned at byte 0 of the file (i.e. at the beginning of the file). Once the whole file has been read, the cursor is at the final byte of the file (equivalent to the total number of bytes in the file). The same happens when it is opened in write mode: the cursor starts at byte 0 and moves forward as content is written.
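
The movement of the cursor can be observed with the tell() and seek() methods of the file object. The following is a minimal sketch under some assumptions: the file name is hypothetical, a temporary directory is used so that nothing else is touched, and the content is plain ASCII so that each character takes one byte.

```python
import os
import tempfile

# Hypothetical file used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "cursor_demo.txt")

with open(path, "w") as file:
    start_w = file.tell()          # cursor position right after opening
    file.write("0123456789")
    end_w = file.tell()            # cursor position after writing 10 characters

with open(path, "r") as file:
    start_r = file.tell()          # read mode also starts at byte 0
    file.read()                    # reading everything moves the cursor to the end
    end_r = file.tell()
    file.seek(0)                   # move the cursor back to byte 0
    first_three = file.read(3)     # reads "012" and leaves the cursor at byte 3
```

In text mode, the value returned by tell() should in general be treated as an opaque position; it matches the byte count here only because the content is ASCII.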


When you want to write at the end of a non-empty file, the append mode is used. In this mode, the file is opened with the cursor at the end of the file.

The + symbol as a mode suffix adds the opposite capability to the opening mode. For example, the r (read) mode with the suffix + (r+) opens the file for both reading and writing, with the cursor at byte 0.

The following table shows the different ways of opening a file:

Indicator  Opening mode                                                          Pointer location
r          Read only                                                             Beginning of the file
rb         Read only, binary mode                                                Beginning of the file
r+         Read and write                                                        Beginning of the file
rb+        Read and write, binary mode                                           Beginning of the file
w          Write only; overwrites or creates the file                            Beginning of the file
wb         Write only, binary mode; overwrites or creates the file               Beginning of the file
w+         Write and read; overwrites or creates the file                        Beginning of the file
wb+        Write and read, binary mode; overwrites or creates the file           Beginning of the file
a          Append; creates the file if it does not exist                         End of the file if it exists, otherwise the beginning
ab         Append, binary mode; creates the file if it does not exist            End of the file if it exists, otherwise the beginning
a+         Append and read; creates the file if it does not exist                End of the file if it exists, otherwise the beginning
ab+        Append and read, binary mode; creates the file if it does not exist   End of the file if it exists, otherwise the beginning
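
The difference between the w, a and r+ modes is easiest to see on a small file. The following is a minimal sketch; the file name is hypothetical and a temporary directory is assumed so that no existing file is affected.

```python
import os
import tempfile

# Hypothetical file used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as file:      # creates the file (or overwrites it)
    file.write("first line\n")

with open(path, "a") as file:      # cursor starts at the end of the file
    file.write("second line\n")

with open(path, "r+") as file:     # read and write, cursor at byte 0
    file.write("FIRST")            # overwrites the first five characters in place
    file.seek(0)
    after_update = file.read()

with open(path, "w") as file:      # reopening in "w" mode empties the file
    pass

size_after_truncate = os.path.getsize(path)
```

Note that r+ does not empty the file: it overwrites from the cursor position, whereas w discards the previous content as soon as the file is opened.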

SOME METHODS OF THE FILE OBJECT

The file object has, among others, the following methods:

Method                Description
read([bytes])         Reads the entire contents of the file. If a length is passed, reads at most that many characters (bytes in binary mode).
readlines()           Reads all lines of the file and returns them as a list.
write(string)         Writes string to the file.
writelines(sequence)  Writes each element of sequence (any iterable of strings) to the file. Line separators are not added automatically.
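
The following minimal sketch exercises the four methods on a small file. The file name and its contents are hypothetical, and a temporary directory is assumed.

```python
import os
import tempfile

# Hypothetical file used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "methods_demo.txt")

lines = ["alpha", "beta", "gamma"]

with open(path, "w") as file:
    # writelines() does not add separators, so "\n" is appended explicitly
    file.writelines(line + "\n" for line in lines)

with open(path, "r") as file:
    first_five = file.read(5)      # at most 5 characters
    rest = file.read()             # everything from the cursor onwards

with open(path, "r") as file:
    all_lines = file.readlines()   # list of lines, each one keeping its "\n"
```

Each element returned by readlines() keeps its trailing newline; calling strip() on it removes the newline when it is not wanted.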

ACCESSING FILES THROUGH THE WITH STRUCTURE

With the with structure and the open() function, you can open a file in any mode and work with it without having to close it or destroy the pointer explicitly, as the with structure takes care of this.

Read a file:

with open("file.txt", "r") as file:
    content = file.read()

Write to a file:

content = """
This will be the content of the new file.
The file will have several lines.
"""

with open("file.txt", "w") as file:
    file.write(content)

CSV FILE HANDLING

The CSV format derives its name from "comma-separated values", as defined in RFC 4180. CSV files are plain text files intended for storing large amounts of data. It is one of the simplest formats for data analysis. In fact, many non-free (or free but more complex) file formats are often converted to CSV so that the data can be analyzed with various languages.

A CSV file usually consists of a header that defines the column names, followed by rows holding the data for each column, separated by commas. However, many other symbols can be used as cell separators. Among them, the tab and the semicolon are just as frequent as the comma.
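
The separator only changes how each line is split, not the data itself. The following is a minimal sketch that parses the same two rows written with three different separators; it assumes in-memory text (io.StringIO) instead of real files, and the sample data is invented.

```python
import csv
import io

# Three hypothetical in-memory "files" holding the same rows,
# each one using a different cell separator
samples = {
    ",": "name,points\nAna,124\nRosa,300\n",
    ";": "name;points\nAna;124\nRosa;300\n",
    "\t": "name\tpoints\nAna\t124\nRosa\t300\n",
}

parsed = {}
for separator, text in samples.items():
    # io.StringIO lets csv.reader treat a string like an open file
    parsed[separator] = list(csv.reader(io.StringIO(text), delimiter=separator))
```

All three entries of parsed end up with the same list of rows, which shows that only the delimiter argument needs to change.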

SOME EXAMPLES OF CSV FILES

Weather data (separated by ;)

ID;DATA;VV;DV;T;HR;PPT;RS;P
0;2016-03-01 00:00:00;;;9.9;73;;;
1;2016-03-01 00:30:00;;;9.0;67;;;
2;2016-03-01 01:00:00;;;8.3;64;;;
3;2016-03-01 01:30:00;;;8.0;61;;;
4;2016-03-01 02:00:00;;;7.4;62;;;
5;2016-03-01 02:30:00;;;8.3;47;;;
6;2016-03-01 03:00:00;;;7.7;50;;;
7;2016-03-01 03:30:00;;;9.0;39;;;

Scores obtained by players in a tournament (separated by ,)

name,number,year
Maria,858,1930
Jose,665,1930
Rosa,591,1930
Juan Carlos,522,1930
Antonio,509,1930
Maria Esther,495,1930
Maria Luisa,470,1930
Joan,453,1930
John,436,1930

Companies registered with the General Inspectorate of Justice of Argentina (separated by , and data in quotation marks)

"correlative_number","company_type","company_type_description","company_reason_of","company_name","deregistration_code","deregistration_detail"
"10","10","GENERAL PARTNERSHIP","A A VALLE Y COMPAÑIA","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"11","10","GENERAL PARTNERSHIP","A LUCERO Y H CARATOLI","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"12","10","GENERAL PARTNERSHIP","A PUIG E HIJOS","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"13","10","GENERAL PARTNERSHIP","A C I C A","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"14","10","GENERAL PARTNERSHIP","AÑON BEATRIZ S Y CIA","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"15","10","GENERAL PARTNERSHIP","ABA DIESEL","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"16","10","GENERAL PARTNERSHIP","ABADAL JOSE Y JORGE JOSE ABADAL","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"17","10","GENERAL PARTNERSHIP","ABADAL JOSE E HIJO","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"18","10","GENERAL PARTNERSHIP","ABATE Y MACIAS","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"

It is also possible to find data stored in text files (TXT) with formats very similar to what you would expect to find in a CSV. In such cases it is sometimes possible to write a formatting script that converts these files so that they can be handled as CSV.

Meteorological observations in TXT

DATE      TMAX   TMIN  NAME
--------------------------------------------------------------------
07122017  28.0   19.0  AEROPARQUE AERO
07122017  26.8   12.4  AZUL AERO
07122017  29.6    7.8  BAHIA BLANCA AERO
07122017  22.7    6.7  BARILOCHE AERO
07122017   3.0   -8.5  BASE BELGRANO II
07122017   2.4   -0.2  BASE CARLINI (EX JUBANY)
07122017   3.9   -0.6  BASE ESPERANZA
07122017   0.7   -3.6  BASE MARAMBIO
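
A formatting script for a file like this one can be very short. The following is a minimal sketch, assuming that the first three columns never contain spaces and that everything after them is the station name; the excerpt is embedded as a string and the result is written to an in-memory buffer rather than to a real file.

```python
import csv
import io

# Hypothetical excerpt in the layout shown above
raw = """DATE      TMAX   TMIN  NAME
--------------------------------------------------------------------
07122017  28.0   19.0  AEROPARQUE AERO
07122017   2.4   -0.2  BASE CARLINI (EX JUBANY)
"""

rows = []
for line in raw.splitlines():
    if not line.strip() or line.startswith("-"):
        continue                      # skip blank lines and the dashed ruler
    # Split on whitespace at most 3 times so the station name stays whole
    rows.append(line.split(None, 3))

buffer = io.StringIO()                # stands in for a real output file
csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
```

Limiting the number of splits is the design choice that matters here: a plain split() would break names such as BASE CARLINI (EX JUBANY) into several cells.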

WORKING WITH CSV FILES FROM PYTHON

Python provides its own module called csv, which facilitates the parsing of data from CSV files, both for reading and writing.

This module is used in combination with the with structure and the open() function to read or generate the file, while the csv module takes care of the parsing.

READING CSV FILES

Contents of file.csv

0;2016-03-01 00:00:00;;;9.9;73;;;
1;2016-03-01 00:30:00;;;9.0;67;;;
2;2016-03-01 01:00:00;;;8.3;64;;;
3;2016-03-01 01:30:00;;;8.0;61;;;
4;2016-03-01 02:00:00;;;7.4;62;;;
5;2016-03-01 02:30:00;;;8.3;47;;;
6;2016-03-01 03:00:00;;;7.7;50;;;
7;2016-03-01 03:30:00;;;9.0;39;;;
8;2016-03-01 04:00:00;;;8.7;39;;;

from csv import reader

with open("file.csv", "r") as file:
    document = reader(file, delimiter=';', quotechar='"')
    for row in document:
        print(' '.join(row))

Output (empty cells appear as consecutive spaces):

0 2016-03-01 00:00:00   9.9 73
1 2016-03-01 00:30:00   9.0 67
2 2016-03-01 01:00:00   8.3 64
3 2016-03-01 01:30:00   8.0 61
4 2016-03-01 02:00:00   7.4 62
5 2016-03-01 02:30:00   8.3 47
6 2016-03-01 03:00:00   7.7 50
7 2016-03-01 03:30:00   9.0 39
8 2016-03-01 04:00:00   8.7 39


When the CSV file has a header, it is necessary to skip the header row before iterating over the data:

from csv import reader

with open("file.csv", "r") as file:
    document = reader(file, delimiter=';', quotechar='"')
    headers = next(document)
    for row in document:
        print(' '.join(row))

Here next(document) consumes the first row (the header), so the loop only processes the data rows.


Another way to read CSV files with headers is to use the DictReader object instead of reader, and thus access only the values of the desired columns by name:

from csv import DictReader

with open("file.csv", "r") as file:
    document = DictReader(file, delimiter=';', quotechar='"')
    for row in document:
        print(row['DATA'])

Output:

2016-03-01 00:00:00
2016-03-01 00:30:00
2016-03-01 01:00:00
2016-03-01 01:30:00
2016-03-01 02:00:00
2016-03-01 02:30:00
2016-03-01 03:00:00
2016-03-01 03:30:00
2016-03-01 04:00:00
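
Access by column name is convenient when a single column has to be analyzed. The following is a minimal sketch that averages one column, assuming that the T column of the weather example holds a temperature reading (the source does not say so explicitly) and using io.StringIO in place of the open file.

```python
import io
from csv import DictReader

# A few rows copied from the weather example; io.StringIO stands in for
# the open file so that the sketch runs without touching the disk
data = io.StringIO(
    "ID;DATA;VV;DV;T;HR;PPT;RS;P\n"
    "0;2016-03-01 00:00:00;;;9.9;73;;;\n"
    "1;2016-03-01 00:30:00;;;9.0;67;;;\n"
    "2;2016-03-01 01:00:00;;;8.3;64;;;\n"
)

temperatures = []
for row in DictReader(data, delimiter=';', quotechar='"'):
    if row['T']:                          # skip empty cells
        temperatures.append(float(row['T']))

average = sum(temperatures) / len(temperatures)
```

The csv module returns every cell as a string, so the conversion with float() is needed before any arithmetic.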

WRITING CSV FILES

Writing a CSV without header:

from csv import writer

with open("data.csv", "w", newline="") as file:
    document = writer(file, delimiter=';', quotechar='"')
    document.writerows(matrix)

In the above example, matrix could be a list of lists with an equal number of elements. For example:

matrix = [
    ['Juan', 373, 1970],
    ['Ana', 124, 1983],
    ['Pedro', 901, 1650],
    ['Rosa', 300, 2000],
    ['Juana', 75, 1975],
]

This would generate a file named data.csv with the following content:

eugenia@bella:~$ cat data.csv
Juan;373;1970
Ana;124;1983
Pedro;901;1650
Rosa;300;2000
Juana;75;1975


Writing a CSV with header:

In this case, the matrix to be written needs to be a list of dictionaries whose keys match the indicated headers.

from csv import DictWriter

matrix = [
    dict(player='Juan', points=373, year=1970),
    dict(player='Ana', points=124, year=1983),
    dict(player='Pedro', points=901, year=1650),
    dict(player='Rosa', points=300, year=2000),
    dict(player='Juana', points=75, year=1975),
]

headers = ['player', 'points', 'year']

with open("data.csv", "w", newline="") as file:
    document = DictWriter(file, delimiter=';', quotechar='"', fieldnames=headers)
    document.writeheader()
    document.writerows(matrix)

Simple statistical functions

Simple statistical functions such as the following can be applied to lists and tuples, whether or not they were obtained from a CSV:

Count elements            len(collection)
Add up elements           sum(collection)
Get the largest value     max(collection)
Get the smallest value    min(collection)
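
These four functions are enough to derive other simple measures. The following is a minimal sketch; the points list is hypothetical and stands for a numeric column already read from a CSV file.

```python
# Hypothetical scores, e.g. the "points" column read from a CSV file
points = [373, 124, 901, 300, 75]

count = len(points)              # number of elements
total = sum(points)              # sum of all elements
highest = max(points)
lowest = min(points)

mean = total / float(count)      # arithmetic mean built from sum() and len()
value_range = highest - lowest   # distance between the extremes
```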

PROBABILITY AND STATISTICS WITH PYTHON
PROBABILITY OF MUTUALLY EXCLUSIVE SIMPLE AND COMPOUND EVENTS IN PYTHON
SAMPLE SPACE
A sample space is the set of all possible outcomes of an experiment, such as those that could result from rolling a die:

$E = \{1, 2, 3, 4, 5, 6\}$

sample_space = [1, 2, 3, 4, 5, 6]

Each element in a sample space is referred to as a sample point. The number of sample points is denoted by $n$, such that for the sample space $E = \{1, 2, 3, 4, 5, 6\}$, $n = 6$.

n = len(sample_space)

SIMPLE AND COMPOUND EVENTS

An event is a set of outcomes within a sample space. For example:

• the rolling of a die is an event

• obtaining the number 5 in that roll is a simple event $A = \{5\}$, and it is exclusive: if 5 comes out, no other number can come out simultaneously.

• obtaining an odd number is the compound event $B = \{1, 3, 5\}$, which depends in turn on the exclusive simple events $B_1 = \{1\}$, $B_2 = \{3\}$ and $B_3 = \{5\}$


PROBABILITY ASSIGNMENT
Probability assignment provides mathematical models to calculate the chances of specific events occurring or not occurring.

The probability of an event is denoted by P(event).

The events can be:

• simple or compound
• mutually exclusive or independent

SIMPLE MUTUALLY EXCLUSIVE EVENTS

If we consider a sample space $A$, each of its sample points will be denoted by $A_k$, and the probability of each of them, designated as $P(A_k)$, will be determined by:

$P(A_k) = \frac{1}{n}$

probability = 1.0 / n

In Python 2, at least one operand of the division must be a real number (float) for the result to be a real number. In Python 3, the / operator always returns a real number.

The probability of each sample point, as mutually exclusive events, is the same for each event:

$P(6) = P(5) = P(4) = P(3) = P(2) = P(1)$
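
When exact fractions are preferred over floating point numbers, the standard fractions module can be used. The following is a minimal sketch of the same calculation; using Fraction is a choice made for this illustration and is not part of the original procedure.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]
n = len(sample_space)

# Probability of each sample point, kept as an exact fraction
point_probability = Fraction(1, n)

# Every sample point gets the same probability ...
probabilities = {point: point_probability for point in sample_space}

# ... and together they add up to 1
total = sum(probabilities.values())
```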


EVENTS COMPOSED OF MUTUALLY EXCLUSIVE SIMPLE EVENTS

When the simple events that make up the compound event $A$ are mutually exclusive, the probability of the compound event is given by the sum of the probabilities of each simple event $P(A_k)$, such that:

$P(A) = P(A_1) + P(A_2) + \dots + P(A_k)$

For example, to estimate the probability that a single throw of a die will produce an even number, we obtain the event $A = \{2, 4, 6\}$, whose probability is given by the sum of the probabilities of each of the simple events $P(2)$, $P(4)$ and $P(6)$ of the sample space $E = \{1, 2, 3, 4, 5, 6\}$, such that:

$P(A) = P(2) + P(4) + P(6)$
$P(A) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6}$
$P(A) = \frac{1}{2}$

In the first result, $\frac{3}{6}$ (in the second step, before finding the greatest common divisor [GCD] and reducing the fraction to $\frac{1}{2}$), the numerator, 3, is equivalent to the number of simple events within the compound event "even numbers" and is denoted by $h$. The denominator, 6, is $n$, the total of all events in the sample space. Thus, the probability of an event $A$ composed of mutually exclusive events is given by the quotient of $h$ and $n$, such that:

$P(A) = \frac{h}{n}$

even_numbers = [i for i in sample_space if i % 2 == 0]
h = len(even_numbers)
probability = float(h) / n
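
The quotient of h and n can be wrapped in a small helper that accepts any condition. The following is a minimal sketch; the function name compound_probability and the use of Fraction are assumptions made for this illustration.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]

def compound_probability(space, condition):
    """Return h / n, where h counts the sample points that meet `condition`."""
    favourable = [point for point in space if condition(point)]
    return Fraction(len(favourable), len(space))

p_even = compound_probability(sample_space, lambda i: i % 2 == 0)
p_greater_than_4 = compound_probability(sample_space, lambda i: i > 4)
p_any = compound_probability(sample_space, lambda i: True)
```

Fraction reduces the result automatically, so the event "even numbers" yields 1/2 rather than 3/6.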


A compound event can be denoted by the union of its simple events (symbol $\cup$, read as "or"), such that:

$P(A_1 \cup A_2 \cup \dots \cup A_k) = P(A_1) + P(A_2) + \dots + P(A_k)$

For example, for the case of the event "even numbers", it is obtained that:

$P(2 \cup 4 \cup 6) = P(2) + P(4) + P(6)$
$P(2 \cup 4 \cup 6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6}$
$P(2 \cup 4 \cup 6) = \frac{1}{2}$

Such that $P(2 \cup 4 \cup 6)$ is the probability of a single event, and $P(2)$, $P(4)$ and $P(6)$ are the probabilities of the three events that compose it. In a new context, $(2 \cup 4 \cup 6)$ can be treated as an event $A$.

FUNCTIONS
# Probability of mutually exclusive simple events
pssme = lambda e: 1.0 / len(e)

# Probability of mutually exclusive compound events
def pscme(e, sc):
    n = len(e)
    return len(sc) / float(n)
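
A short usage sketch of both functions with the die example follows. The definitions are repeated so that the snippet runs on its own; the variable names outside the functions are illustrative.

```python
# Definitions repeated from above so that the snippet runs on its own
pssme = lambda e: 1.0 / len(e)

def pscme(e, sc):
    n = len(e)
    return len(sc) / float(n)

sample_space = [1, 2, 3, 4, 5, 6]
even_numbers = [i for i in sample_space if i % 2 == 0]

p_single = pssme(sample_space)               # one sample point: 1/6
p_even = pscme(sample_space, even_numbers)   # compound event: 3/6
```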

CONDITIONAL PROBABILITY IN PYTHON

B= {2,4,6}
c. Probability of B: P(B) =$=6=2

d. Probability of intersection:

P(ARB) = P(A)P(B)
P(A n B) = 1 I
P(A n B) = I

e = sample_space = [1, 2, 3, 4, 5, 6] n = len(e) # total of the sample

# probability of A
a = [i for i in e if i % 2 is not 0] pa = len(a) / float(n)

- 43 -

(C) 2011 - 2018 Eugenia Bahit. Creative Commons Attribution

4.0
Introduction to Data Science with Python - Escuela de Informática Eugenia Bahit

# probability of B
b = [i for i in e if i % 2 is 0] b = [i for i in e if i % 2 is 0] b = [i for i in e if i % 2 is 0
pb = len(b) / float(n)

# probability of the intersection of events pi = pa * pb

FUNCTIONS
# Conditional probability: dependent events def pscd(e, a, b):
i = list(set(a).intersection(b))
pi = pscme(e, i)
pa = pscme(e, a)
return pi / pa

# Conditional probability: independent events def psci(e, a, b):

pa = pscme(e, a)
pb = pscme(e, b)
return pa * pb

DEPENDENT EVENTS
Refers to the probability of two events occurring simultaneously where the second event depends on

the occurrence of the first.

The probability of B occurring if A occurs is denoted by P(BA) and is read as "the probability of B

given A", such that:

P(BI^ _1
■ P(A)

Where PA n B)
is the probability of the intersection of the events of AandB

- defined as: P{A n B) = P(A)P(B-A,

-such that the intersection is a new event

composed of simple events. In the following example, it would equal 11,3} (because 1 and 3 are in

both A and B ).

Example: what is theprobability of rolling a die with an odd number less than 4?

- 44 -

(C) 2011 - 2018 Eugenia Bahit. Creative Commons Attribution

4.0
Introduction to Data Science with Python - Escuela de Informática Eugenia Bahit

The throwing of the die is an event in itself. We wish to find the probability of B = {1.2.3} (number
less than 4) given that A=(1,3,5 (odd number) occurred in the sample space E = {1.2. 3. 4,5,6} .

sample_space = [1, 2, 3, 4, 5, 6].

a = [i for i in monthly_space if i % 2 is not 0].
b = [i for i in sample_space if i < 4] b = [i for i in sample_space if i < 4] b = [i for i in sample_space if i < 4]

To calculate the probability of an intersection, first the intersection is obtained:

An ={1,3}

intersec = [i for i in a if i in b]

And then, the probability of the new composite event is calculated:

112 1
P(AnB)=P(1)+P(3)=+=é=, b0O3

or, in other words:

poand_1-2_1
n6 3

It is also necessary to obtain the probability of A , taking into account that

is also a compound event:

p_I_2_1
Finally, it is obtained that:

P(B|A) = P427
P(B|.A) = 1/2
P(B-A) =5=0.6

e = sample_space = [1, 2, 3, 4, 5, 6].

a = [i for i in e if i % 2 is not 0] # odd numbers

b = [i for i in e if i < 4] b = [i for i in e if i < 4] b = [i for i in e if i < 4] # numbers less than 4

- 45 -

(C) 2011 - 2018 Eugenia Bahit. Creative Commons Attribution

4.0
Introduction to Data Science with Python - Escuela de Informática Eugenia Bahit

intersec = [i for i in a if i in b] # intersection of A and B

n = len(e) # total sample

ha = len(a) # total number of single events in A
hintersec = len(intersec) # total number of single events at the intersection

# probability of intersection
probability_intersec = float(hintersec) / n

# probability of 'a
probability_a = float(ha) / n

# conditional probability
probability_b_given_a = probability_intersec / probability_a

SET THEORY IN PYTHON

When obtaining the intersection of two compound events, a manual method has been used by

saying: return 'i' for each 'i' in list 'a' if it is in list 'b'.

However, since each compound event is a set and Python provides a data type called set, it is

possible to obtain the intersection by manipulating compound events as Python sets. With set you

can convert any iterable to a set and perform set operations such as union and intersection when

necessary. intersec = list(set(a).intersection(b))

Here the set obtained is converted into a list in order to be consistent with the rest of the code and to

ensure that the resulting element supports the usual operations and processing of a list. When in

doubt as to whether to use lists or sets, the principle of simplicity should be applied and the simplest

solution should be implemented.

INDEPENDENT EVENTS

Unlike the previous case, here the probability of occurrence of B is not affected by the occurrence of

A . For example, the probability of rolling a die

- 46 -

(C) 2011 - 2018 Eugenia Bahit. Creative Commons Attribution

4.0
Introduction to Data Science with Python - Escuela de Informática Eugenia Bahit

and obtain an even number (event B) is not affected by the fact that an odd number was obtained in
a previous throw (event A). The probability of B is independent of A and is given by the product of
the probability of both events:
P(AnB) = P(A)P(B)

Here the intersection is the joint occurrence of both events.

The probability of each independent event is calculated and the two results are multiplied, as follows:

a. Sample space (for both events):

$E = \{1, 2, 3, 4, 5, 6\}$

b. Probability of A:

$A = \{1, 3, 5\}$
$P(A) = \frac{h}{n} = \frac{3}{6} = \frac{1}{2}$
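The calculation can be sketched in Python as follows. This is an illustrative sketch rather than code from the course: it assumes that event B is "an even number", as in the example above, and uses `fractions.Fraction` from the standard library to keep exact values.

```python
from fractions import Fraction

e = {1, 2, 3, 4, 5, 6}   # sample space of one throw
a = {1, 3, 5}            # event A: odd number (first throw)
b = {2, 4, 6}            # event B: even number (second throw), assumed from the example

p_a = Fraction(len(a), len(e))
p_b = Fraction(len(b), len(e))

# For independent events: P(A ∩ B) = P(A) * P(B)
p_a_and_b = p_a * p_b

# Cross-check by enumerating all ordered pairs of two throws
pairs = [(x, y) for x in e for y in e]
favourable = [(x, y) for (x, y) in pairs if x in a and y in b]
p_enumerated = Fraction(len(favourable), len(pairs))
```

The enumeration counts 9 favourable pairs out of 36, which matches the product of the two individual probabilities.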

BAYES' THEOREM IN PYTHON

BAYES' THEOREM AND PROBABILITY OF CAUSES

Given a series of events $A_k$ whose union is the sample space E, and any event B, Bayes' Theorem gives the probability that each event $A_k$ of E is the cause of B. For this reason, it is also known as the probability of causes.

DATA: CASE STUDY

Given a city of 50,000 inhabitants, with the following distribution:


Girls    Boys    Women    Men
11000    9000    16000    14000

And a report of 9,000 cases of influenza, distributed as follows:

Girls    Boys    Women    Men
2000     1500    3000     2500

The aim is to obtain the probability that the cause of contracting influenza is the fact of belonging to

a certain demographic sector (for example, the demographic sector made up of boys or girls).

ANALYSIS
From what has been stated above, it follows that:

• The city (absolute total inhabitants) is the sample space E.

• The numbers of girls, boys, women and men are the events $A_k$ of the sample space E.

• The value of n is taken as the sum over the sample space, $\sum A_k$, such that $n = 50000$.

• The value of h for each event $A_k$ is the corresponding value in the population distribution table.

• Having the flu is event B.

• The distribution table of influenza cases corresponds to the intersections of event B with each event $A_k$, i.e. each $A_k \cap B$.
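The data and the quantities listed above can be laid out in Python as a minimal sketch. The dictionary names `population` and `flu_cases` are illustrative choices made here, not names used elsewhere in this material.

```python
# Events A_k: demographic sectors and their number of inhabitants (h for each A_k)
population = {"girls": 11000, "boys": 9000, "women": 16000, "men": 14000}

# Intersections of each A_k with B: influenza cases per sector
flu_cases = {"girls": 2000, "boys": 1500, "women": 3000, "men": 2500}

# n: total of the sample space E
n = sum(population.values())

# Total number of inhabitants with influenza (event B)
total_cases = sum(flu_cases.values())
```

Keying both tables by sector makes it harder to pair a population with the wrong case count than when two parallel lists are used.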

Depending on the probability calculation applied, the following can be obtained:


• The probability of being a girl, boy, woman or man in the city, given by $P(A_k)$. It is considered an a priori probability.

• The probability of having influenza given that one is a girl, boy, woman or man, which is obtained with $P(B \mid A_k)$ and is considered a conditional probability.

• The probability that any inhabitant, regardless of the sector to which he or she belongs, will have the flu, which is obtained with $P(B) = \sum_{k=1}^{n} P(A_k)\,P(B \mid A_k)$ and is considered a total probability.

• The probability that someone with influenza is a girl, boy, woman or man, which is obtained with Bayes' Theorem. This probability is considered an a posteriori probability, allowing us to answer questions such as: What is the probability that a new case of influenza will be a boy?

An efficient and orderly way to obtain an a posteriori probability with Bayes' Theorem is to first obtain the three preceding probabilities: a priori, conditional and total.

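NOTICE:
In the following, map(float, <list>) will be used in the source code to convert the elements of a list into real numbers, as long as doing so does not overload the code. In Python 3, map() returns an iterator that can be consumed only once; wrap it in list() when the values are needed more than once.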

PROCEDURE
1. A priori probability calculation

Returns: probability that an inhabitant belongs to a specific demographic sector.


Formula: $P(A_k) = \frac{h_k}{n}$

Data required:

$h_k$ = data from the population distribution table

$n$ = always the total amount of the sample space (50 000)

Results:

$P(A_1) = \frac{11000}{50000} = 0.22$ (probability of being a girl)
$P(A_2) = \frac{9000}{50000} = 0.18$ (probability of being a boy)
$P(A_3) = \frac{16000}{50000} = 0.32$ (probability of being a woman)
$P(A_4) = \frac{14000}{50000} = 0.28$ (probability of being a man)

Python code:

# list() lets the values be reused after sum() in Python 3
inhabitants = list(map(float, [11000, 9000, 16000, 14000]))

n = sum(inhabitants)
pa = [h / n for h in inhabitants]

2. Conditional probability

Returns: probability of having the flu given membership in a specific demographic sector.

Certainty: $A_k$ (demographic sector)

Objective: B (the probability of having the flu)


Formula: $P(B \mid A_k) = \frac{P(A_k \cap B)}{P(A_k)}$

Data required:

$P(A_k \cap B) = \frac{h}{n}$

h = intersections (data from the table of distribution of influenza cases)

Results:

$P(B \mid A_1) = \frac{2000 / 50000}{0.22} \approx 0.18$ (probability of having the flu as a girl)
$P(B \mid A_2) = \frac{1500 / 50000}{0.18} \approx 0.16$ (probability of having the flu as a boy)
$P(B \mid A_3) = \frac{3000 / 50000}{0.32} \approx 0.19$ (probability of having the flu as a woman)
$P(B \mid A_4) = \frac{2500 / 50000}{0.28} \approx 0.18$ (probability of having the flu as a man)

Python code:

# list() keeps the values available for reuse in Python 3
affected = list(map(float, [2000, 1500, 3000, 2500]))
pi = [k / n for k in affected]  # P(Ak ∩ B) for each sector

pba = [pi[i] / pa[i] for i in range(len(pi))]

3. Total probability

Returns: probability that any of the inhabitants, regardless of the demographic sector to which they

belong, may have influenza.


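Formula: $P(B) = \sum_{k=1}^{n} P(A_k)\,P(B \mid A_k)$

Data required:

a priori probability
conditional probability

Results:

$P(B) = P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + P(A_3)\,P(B \mid A_3) + P(A_4)\,P(B \mid A_4)$

$P(B) = 0.22 \cdot 0.18 + 0.18 \cdot 0.16 + 0.32 \cdot 0.19 + 0.28 \cdot 0.18$

$P(B) \approx 0.04 + 0.03 + 0.06 + 0.05$

$P(B) \approx 0.18$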

Python code:

products = [pa[i] * pba[i] for i in range(len(pa))]
pb = sum(products)
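As a cross-check, offered as a sketch rather than as part of the original procedure: since each product $P(A_k)\,P(B \mid A_k)$ equals $P(A_k \cap B)$, the total probability should match the total number of influenza cases divided by n. The block repeats the previous steps so that it runs on its own; `pb_direct` is a name introduced here.

```python
inhabitants = [11000.0, 9000.0, 16000.0, 14000.0]
affected = [2000.0, 1500.0, 3000.0, 2500.0]

n = sum(inhabitants)
pa = [h / n for h in inhabitants]                    # a priori
pi = [k / n for k in affected]                       # intersections
pba = [pi[i] / pa[i] for i in range(len(pi))]        # conditional
products = [pa[i] * pba[i] for i in range(len(pa))]
pb = sum(products)                                   # total probability

# Direct count: influenza cases over total inhabitants
pb_direct = sum(affected) / n
```

Both values agree up to floating-point error, which is a quick way to detect a mistake in any of the intermediate lists.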

Remarks:

(a) Note that the output of the code above may differ by 0.01 from the manual solution. This is due to the rounding performed in the manual solution. The difference can be eliminated by using three decimal places (instead of two) for the conditional probability values in the manual solution.


(b) The probability of NOT having the flu is given by $1 - P(B)$, such that $1 - 0.18 = 0.82$, but it will not be necessary for this example with Bayes' Theorem.

4. A posteriori probability

Returns: probability of belonging to a specific demographic sector given that one has the flu.

Certainty: B (having the flu)

Objective: $A_k$ (the probability of belonging to a specific demographic sector).

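Formula: $P(A_k \mid B) = \frac{P(A_k)\,P(B \mid A_k)}{\sum_{i=1}^{n} P(A_i)\,P(B \mid A_i)}$

Data required:

$P(A_k)\,P(B \mid A_k)$ = the product obtained in each of the terms of total probability

$\sum_{i=1}^{n} P(A_i)\,P(B \mid A_i)$ = the total probability

Results:

$P(A_1 \mid B) = \frac{0.04}{0.18} \approx 0.22$ (probability of being a girl, given the flu)
$P(A_2 \mid B) = \frac{0.03}{0.18} \approx 0.16$ (probability of being a boy, given the flu)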


$P(A_3 \mid B) = \frac{0.06}{0.18} \approx 0.33$ (probability of being a woman, given the flu)
$P(A_4 \mid B) = \frac{0.05}{0.18} \approx 0.27$ (probability of being a man, given the flu)

Python code:

pab = [p / pb for p in products]

FUNCTIONS
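# Bayes' Theorem
# e: counts for each event Ak; b: counts for each intersection of Ak with B
def bayes(e, b):
    n = float(sum(e))
    pa = [h / n for h in e]                           # a priori
    pi = [k / n for k in b]                           # intersections
    pba = [pi[i] / pa[i] for i in range(len(pi))]     # conditional
    prods = [pa[i] * pba[i] for i in range(len(pa))]
    pb = sum(prods)                                   # total
    pab = [p / pb for p in prods]                     # a posteriori
    return pab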
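A usage sketch with the case-study data follows. The function is restated so that the block runs on its own; the variable names `population`, `cases`, `posterior` and `rounded` are illustrative.

```python
def bayes(e, b):
    """Return the a posteriori probabilities P(Ak|B).

    e: counts for each event Ak of the sample space
    b: counts for each intersection of Ak with B
    """
    n = float(sum(e))
    pa = [h / n for h in e]
    pi = [k / n for k in b]
    pba = [pi[i] / pa[i] for i in range(len(pi))]
    prods = [pa[i] * pba[i] for i in range(len(pa))]
    pb = sum(prods)
    return [p / pb for p in prods]

population = [11000, 9000, 16000, 14000]   # girls, boys, women, men
cases = [2000, 1500, 3000, 2500]           # influenza cases per sector

posterior = bayes(population, cases)
rounded = [round(p, 2) for p in posterior]
```

The a posteriori probabilities sum to 1 because the four sectors cover the whole population. Rounded to two decimals, they can differ by 0.01 from hand-calculated values that round or truncate intermediate steps.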

COMPLEMENTARY BIBLIOGRAPHY
[0] Probability and Statistics, Murray Spiegel. McGraw-Hill, Mexico 1988. ISBN: 968-451-102-7

ANNEX I: COMPLEX CALCULATIONS
POPULATION AND SAMPLE STATISTICS: CALCULATION OF VARIANCE AND STANDARD DEVIATION
from math import sqrt

samples = [12, 23, 24, 24, 22, 10, 17] # sample list

n = len(samples)
mean = sum(samples) / float(n)

Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Population variance: $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$

Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Sample standard deviation: $s = \sqrt{s^2}$

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

differences = [xi - mean for xi in samples]  # deviation of each value from the mean

powers = [x ** 2 for x in differences]       # squared deviations
summation = sum(powers)

sample_variance = summation / (n - 1)
population_variance = summation / n

sample_deviation = sqrt(sample_variance)
population_deviation = sqrt(population_variance)
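As a cross-check, the same values can be compared with Python's standard `statistics` module (available since Python 3.4). This is a sketch added for illustration; the `ref_` names are chosen here.

```python
from math import sqrt
import statistics

samples = [12, 23, 24, 24, 22, 10, 17]

n = len(samples)
mean = sum(samples) / float(n)
summation = sum((xi - mean) ** 2 for xi in samples)

sample_variance = summation / (n - 1)
population_variance = summation / n
sample_deviation = sqrt(sample_variance)
population_deviation = sqrt(population_variance)

# Reference values from the standard library
ref_sample_variance = statistics.variance(samples)
ref_population_variance = statistics.pvariance(samples)
ref_sample_deviation = statistics.stdev(samples)
ref_population_deviation = statistics.pstdev(samples)
```

In `statistics`, the functions prefixed with `p` divide by n (population), while `variance` and `stdev` divide by n - 1 (sample).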


SCALAR PRODUCT OF TWO VECTORS

vector1 = [3, 0]
vector2 = [4, 3]
# scalar product: sum of the products of matching components
pe = sum([x * y for x, y in zip(vector1, vector2)])
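A small sketch wrapping the same idea in a function follows. The name `scalar_product` and the length check are additions made here for illustration.

```python
def scalar_product(u, v):
    """Return the scalar (dot) product of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))

vector1 = [3, 0]
vector2 = [4, 3]
pe = scalar_product(vector1, vector2)   # 3*4 + 0*3
```

The check is there because `zip` silently stops at the shorter input, which would hide a mismatch between the two vectors.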

RELATIVE, ABSOLUTE AND CUMULATIVE FREQUENCY CALCULATIONS
# ABSOLUTE FREQUENCY
# Number of times a value appears in a sample

samples = [1, 2, 3, 4, 4, 3, 2, 6, 7, 3, 3, 3, 1, 8, 5, 9]
absolutes = []
frequencies = []

for n in samples:
    if n not in absolutes:
        absolutes.append(n)
        fi = samples.count(n)
        frequencies.append(fi)

N = sum(frequencies) # == len(samples)

# RELATIVE FREQUENCY
# Quotient between the absolute frequency and N
relative = [float(fi) / N for fi in frequencies]
sumarelative = round(sum(relative)) # == 1

# CUMULATIVE FREQUENCY
# Sum of all frequencies less than or equal to the absolute frequency
frequencies.sort()
cumulative = [sum(frequencies[:i+1]) for i, fi in enumerate(frequencies)]

# CUMULATIVE RELATIVE FREQUENCY
# Ratio between the cumulative frequency and the total amount of data
cumulative_relative = [float(f) / N for f in cumulative]
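The same frequencies can also be obtained with `collections.Counter` and `itertools.accumulate` from the standard library. This is an alternative sketch, not the method used above: here the distinct values are kept in ascending order, so the cumulative column follows the usual layout of a frequency table.

```python
from collections import Counter
from itertools import accumulate

samples = [1, 2, 3, 4, 4, 3, 2, 6, 7, 3, 3, 3, 1, 8, 5, 9]

counts = Counter(samples)                   # value -> absolute frequency
values = sorted(counts)                     # distinct values, ascending
frequencies = [counts[v] for v in values]   # absolute frequencies

N = sum(frequencies)                        # == len(samples)
relative = [fi / float(N) for fi in frequencies]
cumulative = list(accumulate(frequencies))
cumulative_relative = [f / float(N) for f in cumulative]
```

`Counter` makes a single pass over the data, whereas calling `samples.count(n)` for every distinct value scans the list once per value.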

ANNEX II: CREATION OF A MENU OF OPTIONS
In scripting, it can be useful to give the user a menu of options and have the script act according to

the option chosen by the user. Here is a trick to solve this in a simple and ingenious way.

1) First, the entire script needs to be organized into functions.

2) Secondly, it is necessary that all functions have their corresponding documentation, defining

what exactly the function does:

def read_file():
    """Read CSV file"""
    return "read"

def write_file():
    """Write CSV file"""
    return "write"

def _sum_numbers(numbers):
    """Add the numbers in a list"""
    return "private"

3) Next, a list is defined with the name of all the functions that will be accessible by the user from

the menu:

functions = ['read_file', 'write_file']

The trick is to automate both the generation of the menu and the function call.

To automate menu generation, the trick is to use:

▪ The list in step 3


▪ The locals() function.

▪ The __doc__ attribute

number = 1 # will then be used to access the function

menu = "Choose an option\n"

for function in functions:
    # one option per line: number and the function's docstring
    menu += "\t{}. {}\n".format(number, locals()[function].__doc__)
    number = number + 1 # increments the number in each iteration

echo(menu)
option = int(get("Your option: "))
# echo and get: hacks learned in the introductory course

Finally, to dynamically access the function chosen by the user, the trick is to use the option chosen

by the user, as an index to access the function name from the list, and again resort to locals to invoke

the function:

function = functions[option - 1] # you get the name of the function
locals()[function]() # the function is invoked through locals()
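A self-contained variation of the same trick is sketched below. It is not the method described above: instead of names looked up through locals(), the list holds the function objects themselves, which also works when the menu code runs inside a function (where locals() does not contain functions defined at module level). The helper names `build_menu` and `run_option` are hypothetical.

```python
def read_file():
    """Read CSV file"""
    return "read"

def write_file():
    """Write CSV file"""
    return "write"

def build_menu(options):
    """Build the menu text from each function's docstring."""
    lines = ["Choose an option"]
    for number, function in enumerate(options, start=1):
        lines.append("\t{}. {}".format(number, function.__doc__))
    return "\n".join(lines)

def run_option(option, options):
    """Invoke the function selected by its 1-based menu number."""
    if not 1 <= option <= len(options):
        raise ValueError("invalid option: {}".format(option))
    return options[option - 1]()

# Function objects (not names) exposed in the menu
options = [read_file, write_file]

menu = build_menu(options)
result = run_option(2, options)
```

Validating the option before indexing avoids an IndexError, and also prevents an input of 0 from silently selecting the last entry through negative indexing.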

CERTIFICATE

Show how much you have learned!

If you reached the end of the course you can obtain a triple certification:

- Certificate of attendance (issued by Eugenia Bahit School)

- Certificate of Achievement (issued by CLA Linux)
- Approval certification (issued by LAECI)

Check with your teacher or visit the certification website at http://python.eugeniabahit.org.

If you need to prepare for your exam, you can register in the
Data Science with Python course at
Escuela de Informática Eugenia Bahit
www.eugeniabahit.com
