Play with Python: An Intro to Data Science
Ignacio Larrú, Instituto de Empresa
Who am I?
• Passionate about Technology
From iPhone apps to algorithmic programming, I
love innovative technology
• Former Entrepreneur:
Founded several companies, from online memorabilia
e-commerce to structural civil engineering
calculations.
• Investment Banker:
I advise Spanish companies on M&A and IPO
processes
• Venture Capital & Bootcamp:
CFO, investment director at K Fund + Academic
Director of IE Data Science Bootcamp
Big Data and Data Science
Big Data technologies
Data Science
Why is Data Science so difficult?
Overview of the Data Science process
Framing the Problem -> Solving the Problem -> Action!
Validation!
Problem recognition
• Business comes first: think about what moves the needle
• Focus specifically on the decisions that will be made as a result of the analysis
– Helps everyone understand the reason for the analysis
– Makes identifying key stakeholders easier
– No decision... no analytics?
• Plan your objective for your problem:
– Investigation
– Exploration
– A/B Testing
– Survey
– Prediction
– Past performance (reporting)
• The scope of the problem can start out expansive, but by the end of
problem framing you should have a clear statement of the problem
Exploratory Data Analysis
• Use descriptive statistics (median, mode, variance,
frequency tables, correlations, etc.) to
understand the important characteristics of a dataset
• Identify trends and outliers
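The descriptive statistics listed above can be sketched with the standard library alone. The sample data and the outlier rule below are assumptions made purely for illustration; they are not part of the original slides.

```python
import statistics
from collections import Counter

# Hypothetical sample: daily sales amounts, with one suspicious value
sales = [12, 15, 15, 14, 13, 15, 16, 14, 95]

median_sales = statistics.median(sales)      # middle value of the sorted data
mode_sales = statistics.mode(sales)          # most frequent value
variance_sales = statistics.variance(sales)  # sample variance
frequency_table = Counter(sales)             # value -> number of occurrences

# Simple outlier rule (an assumption): anything above three times the median
outliers = [value for value in sales if value > 3 * median_sales]

print(median_sales, mode_sales, frequency_table.most_common(2), outliers)
```

Note how the median and mode stay close to the typical values while the single extreme value is flagged separately.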
Overview of Data processing algorithms (i)
1. Classification -> for each individual in a population, predict which of
a set of classes this individual belongs to.
• Among all the customers of ACME, which are likely to
respond to a given offer?
2. Regression -> Estimate or predict, for each individual, the
numerical value of some variable for that individual
• How much will a customer use the service?
Overview of Data processing algorithms (ii)
3. Similarity matching -> Identify similar individuals based on
data known about them
• Other customers also bought…
4. Clustering -> Group individuals in a population together by
their similarity, but not driven by any specific purpose
• Do our customers form natural groups or segments?
5. Co-occurrence -> Find associations between entities
based on transactions involving them
• What items are commonly purchased together?
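As a rough illustration of the co-occurrence idea, the sketch below counts how often pairs of items appear in the same transaction. The baskets are hypothetical toy data, and counting pairs this way is only one simple approach, not necessarily the method intended in the course.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

pair_counts = Counter()
for basket in baskets:
    # sorted() gives each pair a stable order, so (a, b) and (b, a) count as one
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, times_seen = pair_counts.most_common(1)[0]
print(most_common_pair, times_seen)
```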
Can data visualizations hurt your analysis?
Lying with graphs
Source: Hbr.com
Python Data Science Stack
Hello World!
print("Hello World")
Python is interpreted
Programming Python
Comments in Python
# for a single-line comment
'''...''' for a multi-line comment (strictly, a string literal that is not assigned to anything)
Variables in Python
• Variables don't need a pre-declared type; they get their type from
the value they point to
• Four main types:
– String (str): holds text-based values, e.g. name = "Ignacio"
– Integer (int): whole numbers, e.g. name = 10
– Float (float): floating-point numbers, e.g. name = 10.4
– Boolean (bool): True or False
• More variable types (lists, tuples, dictionaries) will be reviewed
during the course
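A minimal sketch of the four basic types, using the built-in type() function to show that each variable takes its type from the value it points to. The variable names are chosen for illustration only.

```python
name = "Ignacio"   # str: text
units = 10         # int: whole number
price = 10.4       # float: floating-point number
is_active = True   # bool: True or False

# type() reports the type of the object a variable currently points to
type_names = [type(value).__name__ for value in (name, units, price, is_active)]
print(type_names)  # ['str', 'int', 'float', 'bool']
```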
Variables in Python
• A variable has
– a name (identifier)
– a type
– a scope
– and... a value
• A valid identifier is a non-empty sequence of characters where:
– The start character is the underscore "_" or an upper- or
lowercase letter
– The characters following the start character can be anything
permitted as a start character, plus the digits
• Identifiers are case-sensitive!
• Python keywords are not allowed as identifier names!
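The identifier rules above can be checked programmatically. The helper name is_valid_name is hypothetical; it combines the built-in str.isidentifier() with keyword.iskeyword() from the standard library. Note that Python 3 also accepts many non-ASCII letters, so the slide's rule is a simplification.

```python
import keyword


def is_valid_name(candidate):
    """Return True if candidate can be used as a variable name (hypothetical helper)."""
    return candidate.isidentifier() and not keyword.iskeyword(candidate)


checks = {
    "_total": is_valid_name("_total"),   # underscore start is allowed
    "sales2": is_valid_name("sales2"),   # digits allowed after the first character
    "2sales": is_valid_name("2sales"),   # cannot start with a digit
    "class": is_valid_name("class"),     # keywords are rejected
}

# Identifiers are case-sensitive: these are two different variables
sales = 1
Sales = 2
print(checks, sales, Sales)
```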
Python vs. other languages
Python:
• Variable type determined at runtime
• Variable bound to one object, and the object has only one type
• Variable can change type by rebinding it to an object of a
different type
Statically typed languages:
• Bound to a type at compile time
• Bound to an object at runtime
• Need to declare the variable before using it
Python is a dynamically typed language
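A short sketch of what dynamic typing means in practice: the same name can be rebound to objects of different types, and the type is only known at runtime. The variable names are illustrative.

```python
value = 10
first_type = type(value).__name__    # the name currently points to an int

value = "ten"                        # rebinding: same name, new object
second_type = type(value).__name__   # now it points to a str

print(first_type, second_type)  # int str
```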
Python is a strongly typed language
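Strong typing can be illustrated as follows: Python does not silently convert between unrelated types, so mixing a string and an integer in an addition raises a TypeError, and the conversion has to be explicit. This is a minimal sketch with illustrative names.

```python
error_message = None
try:
    result = "10" + 5          # str + int is not allowed
except TypeError as error:
    result = None
    error_message = str(error)

explicit = int("10") + 5       # explicit conversion works
print(result, explicit)        # None 15
```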
With every great power…
• Guidelines
– Use descriptive names (sales_amount rather than x)
– Be consistent (user_name or userName?)
– Follow the traditions of the language: in Python, variable names
usually start with a lowercase letter and avoid a leading underscore
– Keep the length in check (no user_total_sales_month_report)
Mathematical operators
Source: http://www.emcu.it/
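Python's arithmetic operators can be tried out as in the sketch below. The operand values are arbitrary and chosen only to make the difference between true division and floor division visible.

```python
a, b = 7, 2

addition = a + b         # 9
subtraction = a - b      # 5
multiplication = a * b   # 14
true_division = a / b    # 3.5 (always a float)
floor_division = a // b  # 3 (rounds down)
remainder = a % b        # 1
power = a ** b           # 49

print(true_division, floor_division, remainder, power)
```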
Converting values
• float(x) - returns a floating-point value by
converting x
• int(x) - returns an integer value by converting x
• str(x) - returns a string value by converting x
• bool(x) - returns a Boolean value by converting x
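A few illustrative calls to the conversion functions listed above. The input values are arbitrary; note in particular that int() truncates toward zero and that any non-empty string converts to True.

```python
as_float = float("3.5")        # 3.5
as_int = int(3.9)              # 3 (truncates, does not round)
from_text = int("42")          # 42
as_text = str(10.4)            # "10.4"

zero_is_false = bool(0)        # False
empty_is_false = bool("")      # False
text_is_true = bool("False")   # True: any non-empty string is truthy

print(as_float, as_int, from_text, as_text, text_is_true)
```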
If – else - elif
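The original slide content is not available here, so the following is only a sketch of a typical if / elif / else chain. The function name and the temperature thresholds are hypothetical.

```python
def classify_temperature(celsius):
    """Return a rough label for a temperature (thresholds are arbitrary)."""
    if celsius < 10:
        return "cold"
    elif celsius < 25:    # only checked when the first condition is False
        return "mild"
    else:                 # runs when no condition above matched
        return "hot"


print(classify_temperature(5), classify_temperature(18), classify_temperature(30))
```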
Logical operators
While Loop
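As an illustration (not the original slide code), a while loop repeats its body for as long as its condition stays true. The variable names are illustrative.

```python
countdown = 3
visited = []

while countdown > 0:           # checked before every iteration
    visited.append(countdown)
    countdown -= 1             # without this the loop would never end

print(visited)  # [3, 2, 1]
```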
While and if
For Loops
range() Function
range([start], stop[, step])
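The signature above can be explored with a few calls. The stop value is always excluded, start defaults to 0 and step defaults to 1. The example values are arbitrary.

```python
first_five = list(range(5))         # [0, 1, 2, 3, 4]
from_two = list(range(2, 6))        # [2, 3, 4, 5]
evens = list(range(0, 10, 2))       # [0, 2, 4, 6, 8]
backwards = list(range(3, 0, -1))   # [3, 2, 1]

# range() is most often used to drive a for loop
total = 0
for number in range(1, 5):
    total += number                 # 1 + 2 + 3 + 4

print(first_five, evens, backwards, total)
```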
break, continue and pass … with else
• break -> ends the loop immediately
• continue -> skips the rest of the current iteration and moves on to the next one
• pass -> null statement used as a placeholder
• else at the end of loops:
– for -> runs when the loop ended normally (no break)
– while -> runs when the loop condition becomes false (no break)
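The sketch below combines these statements in one place. The function names and the search task are hypothetical; the point is that the else branch of the loop runs only when the loop finishes without hitting break.

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for number in numbers:
        if number == 0:
            continue          # skip zeros and go to the next item
        if number < 0:
            found = number
            break             # stop the loop early
    else:
        found = None          # runs only if the loop was not broken
    return found


def not_implemented_yet():
    pass                      # placeholder: does nothing


print(first_negative([3, 0, -2, 5]), first_negative([1, 2]))
```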
Python simple data structures
Sequences
Strings are sequences
Using len() and in
• The len() function returns the length of a sequence
• The in operator checks whether an element is a member of a
sequence
• If the element is a member, the condition is True; otherwise it is
False
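A brief sketch of len() and in applied to a string and a list. The sample values are arbitrary.

```python
word = "python"
letters = ["a", "b", "c"]

word_length = len(word)         # 6 characters
list_length = len(letters)      # 3 items

has_th = "th" in word           # True: substring check for strings
has_z = "z" in letters          # False: no such element in the list
missing_z = "z" not in letters  # True: the negated form

print(word_length, list_length, has_th, has_z, missing_z)
```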
Programming exercise
Slicing Sequences
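As an illustration of slicing (the original slide content is not available here), a slice is written sequence[start:stop:step], where stop is excluded and any of the three parts can be omitted. The sample values are arbitrary.

```python
text = "Data Science"
numbers = [0, 1, 2, 3, 4, 5]

first_word = text[:4]          # "Data": from the start up to index 4 (excluded)
last_word = text[-7:]          # "Science": the last seven characters
middle = numbers[1:4]          # [1, 2, 3]
every_other = numbers[::2]     # [0, 2, 4]
reversed_copy = numbers[::-1]  # [5, 4, 3, 2, 1, 0]

print(first_word, last_word, middle, every_other, reversed_copy)
```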
Programming exercise
Lists – Mutable sequences
Lists – Adding new items
• append(value) adds an item at the end of the list
• insert(index, value) inserts an item at a given index
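A minimal sketch of both methods. The list contents are arbitrary; both calls modify the list in place.

```python
fruits = ["apple", "cherry"]

fruits.append("date")        # adds at the end
fruits.insert(1, "banana")   # inserts before the item currently at index 1

print(fruits)  # ['apple', 'banana', 'cherry', 'date']
```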
Lists – Remove
• remove(value) removes the first occurrence of value
• del statement removes the item at a given index (or a slice)
Lists – Remove items
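The sketch below shows the two approaches side by side on an arbitrary list: remove() works by value, while del works by position.

```python
colors = ["red", "green", "blue", "green"]

colors.remove("green")   # removes only the first "green"
after_remove = list(colors)

del colors[0]            # removes the item at index 0
after_del = list(colors)

print(after_remove, after_del)
```

Calling remove() with a value that is not in the list raises a ValueError, so check with in first when unsure.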
sort() vs. sorted()
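The difference can be sketched as follows: sorted() is a built-in function that returns a new list and leaves the original untouched, while list.sort() is a method that reorders the list in place and returns None. The sample values are arbitrary.

```python
scores = [3, 1, 2]

ordered_copy = sorted(scores)   # new list; scores is unchanged
snapshot = list(scores)         # still [3, 1, 2]

return_value = scores.sort()    # sorts in place and returns None

print(ordered_copy, snapshot, return_value, scores)
```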
Tuples
• Immutable sequences that can
contain elements of different types,
including mutable ones
• If the contents do not need to
change, use tuples instead of lists
• Faster than lists
Tuples are immutable… but their elements may not be
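A small sketch of this point, using an arbitrary tuple that holds a string and a list: the tuple's slots cannot be reassigned, but a mutable object stored inside it can still be changed.

```python
record = ("Ignacio", [1, 2])

try:
    record[0] = "Other"          # reassigning a slot is not allowed
    reassignment_failed = False
except TypeError:
    reassignment_failed = True

record[1].append(3)              # the list inside the tuple is still mutable

print(reassignment_failed, record)
```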
Sets – Unordered collections without duplicates
Sets – Operations
Sets – Math Operations
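As an illustration (the original slide content is not available here), the usual mathematical set operations map onto Python operators as sketched below. The set contents are arbitrary.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b                  # elements in either set
intersection = a & b           # elements in both sets
difference = a - b             # elements in a but not in b
symmetric_difference = a ^ b   # elements in exactly one of the sets

# Building a set from a list drops the duplicates
unique_values = set([1, 1, 2, 2, 3])

print(union, intersection, difference, symmetric_difference, unique_values)
```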
Dictionaries
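A minimal sketch of a dictionary, which maps keys to values. The keys and prices are hypothetical sample data.

```python
prices = {"apple": 1.2, "banana": 0.5}

prices["cherry"] = 3.0           # add a new key
prices["banana"] = 0.6           # overwrite an existing key
apple_price = prices["apple"]    # look up by key

has_cherry = "cherry" in prices  # membership tests check the keys
item_names = list(prices)        # keys, in insertion order (Python 3.7+)

print(apple_price, has_cherry, item_names)
```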
Dictionaries - operations
• update(d) to merge dictionaries in place (or {**x, **y} to build a new one)
• copy() to create a shallow copy
• get("key") returns None if the key doesn't exist
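The three operations above can be sketched as follows. The dictionaries are arbitrary sample data; note that when keys collide, the value from the right-hand dictionary wins, and that a shallow copy still shares nested objects with the original.

```python
x = {"a": 1, "b": 2}
y = {"b": 20, "c": 30}

merged = {**x, **y}              # new dict; x and y are unchanged
x_before_update = x.copy()       # shallow copy taken before modifying x
x.update(y)                      # merges y into x in place

missing = x.get("z")             # None instead of a KeyError
fallback = x.get("z", 0)         # a default can be supplied

# Shallow copy: the nested list is shared, not duplicated
nested = {"tags": ["x"]}
shallow = nested.copy()
shallow["tags"].append("y")

print(merged, x_before_update, x, missing, fallback, nested)
```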
Crossfit coding
Session Wrap-up