Chapter 1,2,3
Data Science
Artificial Intelligence
Machine Learning
Data Mining
Deep Learning
Data Science & Machine Learning A-Z: Hands on Python Instructor: Navid Shirzadi
Chapter 1
Data
Data are characteristics or information, usually numerical, that are collected through observation (source: Wikipedia).
Each column is a variable (also called an attribute or feature). Each row is an example (also called a sample).

Date/Time      Building Code   Power Consumption (kW)   Heat Consumption (kW)   Power Price ($/kW)   Heat Price ($/kW)
1/1/21 0:00    6601            450                      550                     10                   4
1/2/21 1:00    6602            480                      590                     12                   5
1/3/21 2:00    6603            600                      540                     11                   7
1/4/21 3:00    6604            670                      596                     12                   3
1/5/21 4:00    6605            -26                      523                     10                   4
1/6/21 5:00    6606            390                      488                     9                    6
1/7/21 6:00    6607            430                      610                     14                   6
Required Packages
Packages: NumPy, Pandas, Matplotlib, Scikit-learn
Installation options: pip installation, or the Anaconda distribution (conda installation)
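After installing (for example with `pip install numpy pandas matplotlib scikit-learn`, or the same package names with `conda install`), it can help to confirm that the four packages are visible to Python. The sketch below is one possible check, not part of the course material; the helper name `check_packages` is hypothetical.

```python
from importlib.util import find_spec

# Import names of the four packages listed above.
# Note: scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```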
Chapter 2
NumPy
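Matrix product (@):
[[1, 3], [4, 5]] @ [[3, 4], [5, 7]] = [[18, 25], [37, 51]]
Element-wise product (*):
[[1, 3], [4, 5]] * [[3, 4], [5, 7]] = [[3, 12], [20, 35]]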
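The two products above can be reproduced directly in NumPy. This is a minimal sketch using the same two 2×2 arrays; the variable names are chosen here for illustration.

```python
import numpy as np

A = np.array([[1, 3],
              [4, 5]])
B = np.array([[3, 4],
              [5, 7]])

matrix_product = A @ B       # rows of A combined with columns of B
elementwise_product = A * B  # each entry of A times the matching entry of B

print(matrix_product)        # [[18 25]
                             #  [37 51]]
print(elementwise_product)   # [[ 3 12]
                             #  [20 35]]
```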
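Broadcasting
A scalar is stretched to the shape of the array:
[[3, 4], [5, 7]] + 3 is computed as [[3, 4], [5, 7]] + [[3, 3], [3, 3]]
A one-dimensional array is repeated for every row:
[[1, 2, 3], [3, 4, 5]] + [4 5 6] is computed as [[1, 2, 3], [3, 4, 5]] + [[4, 5, 6], [4, 5, 6]] = [[5, 7, 9], [7, 9, 11]]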
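The additions above look like instances of NumPy broadcasting, where the smaller operand is stretched to match the larger one. The sketch below assumes that the 2×2 block of 3s stands for a scalar 3; variable names are illustrative.

```python
import numpy as np

# Scalar broadcasting: 3 behaves like a 2x2 array filled with 3.
M = np.array([[3, 4],
              [5, 7]])
scalar_sum = M + 3

# Row broadcasting: the 1-D array [4 5 6] is reused for each row of X.
X = np.array([[1, 2, 3],
              [3, 4, 5]])
row = np.array([4, 5, 6])
row_sum = X + row

print(scalar_sum)  # [[ 6  7]
                   #  [ 8 10]]
print(row_sum)     # [[ 5  7  9]
                   #  [ 7  9 11]]
```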
Normal Distribution
Uniform Distribution
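One way to draw samples from these two distributions in NumPy is through a random generator object. This is a sketch only: the seed, sample size, and distribution parameters (mean 0 and standard deviation 1 for the normal, the interval from 0 to 1 for the uniform) are assumptions made for the example.

```python
import numpy as np

# Seed chosen arbitrarily so the example is reproducible.
rng = np.random.default_rng(seed=0)

# Normal distribution: values cluster around the mean (loc).
normal_samples = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Uniform distribution: every value in [low, high) is equally likely.
uniform_samples = rng.uniform(low=0.0, high=1.0, size=10_000)

print(normal_samples.mean(), normal_samples.std())
print(uniform_samples.min(), uniform_samples.max())
```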
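Median
Odd number of sorted values ($x_1, x_2, x_3, x_4, x_5$): the median is the middle value, $x_3$.
Even number of sorted values ($x_1, x_2, x_3, x_4$): the median is the mean of the two middle values, $(x_2 + x_3)/2$.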
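The odd and even cases can be checked with `np.median`, which sorts the values internally. The numbers below are hypothetical and chosen only to make the two cases easy to verify by hand.

```python
import numpy as np

odd = np.array([7, 1, 5, 3, 9])  # sorted: 1 3 5 7 9, middle value is 5
even = np.array([7, 1, 5, 3])    # sorted: 1 3 5 7, middle values are 3 and 5

median_odd = np.median(odd)
median_even = np.median(even)

print(median_odd)   # 5.0
print(median_even)  # 4.0, the mean of 3 and 5
```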
Matplotlib
Year TI
2010 0.72
2011 0.61
2012 0.65
2013 0.68
2014 0.75
2015 0.90
2016 1.02
2017 0.93
2018 0.85
2019 0.99
2020 1.02
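The table above can be drawn as a line chart with Matplotlib. The sketch below is one possible version: the color, marker, title, and legend position are choices made for this example, and the column name TI is used as given because the slides do not expand it.

```python
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
ti = [0.72, 0.61, 0.65, 0.68, 0.75, 0.90, 1.02, 0.93, 0.85, 0.99, 1.02]

fig, ax = plt.subplots()
# A named color and a circle marker; both are arbitrary choices.
ax.plot(years, ti, color="tab:blue", marker="o", label="TI")
ax.set_xlabel("Year")
ax.set_ylabel("TI")
ax.set_title("TI by year")
ax.legend(loc="upper left")

# In an interactive session, call plt.show() to display the figure.
```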
Reference: https://matplotlib.org/3.1.0/gallery/color/named_colors.html
Reference: https://matplotlib.org/3.1.1/api/markers_api.html
Reference: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html
Chapter 3
Preprocessing
Statistics
Data: 4, 8, 12, 21, 33, 58, 92, 98
Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{4 + 8 + 12 + 21 + 33 + 58 + 92 + 98}{8} = 40.75$$
Variance:
$$V = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{(4 - 40.75)^2 + (8 - 40.75)^2 + (12 - 40.75)^2 + (21 - 40.75)^2 + (33 - 40.75)^2 + (58 - 40.75)^2 + (92 - 40.75)^2 + (98 - 40.75)^2}{8} = 1237.7$$
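The mean and variance above can be verified with NumPy. Note that the formula divides by n, which matches NumPy's default for `var` (population variance, `ddof=0`). The manual computation is included only to mirror the formula term by term.

```python
import numpy as np

data = np.array([4, 8, 12, 21, 33, 58, 92, 98])

mean = data.mean()
variance = data.var()  # default ddof=0 divides by n, as in the formula

# The same variance written out as the sum of squared deviations over n.
manual_variance = ((data - mean) ** 2).sum() / data.size

print(mean)      # 40.75
print(variance)  # 1237.6875, which rounds to 1237.7
```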
Statistics
Data: 4, 5, 6, 8, 10, 11, 13, 14
Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{4 + 5 + 6 + 8 + 10 + 11 + 13 + 14}{8} = 8.875$$
[Figure: three scatter plots of Y against X, illustrating $\mathrm{Cov}(X, Y) < 0$, $\mathrm{Cov}(X, Y) > 0$, and $\mathrm{Cov}(X, Y) = 0$]
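The three covariance cases can be illustrated numerically. The data below are hypothetical, chosen so that Y falls with X, rises with X, or has no linear relation to X. The sketch uses `np.cov`, which returns a 2×2 covariance matrix and divides by n − 1 by default; the helper name `covariance` is an assumption of this example.

```python
import numpy as np


def covariance(x, y):
    """Return the sample covariance of x and y (off-diagonal of np.cov)."""
    return np.cov(x, y)[0, 1]


x = np.array([1, 2, 3, 4, 5])
y_falling = np.array([10, 8, 6, 4, 2])  # Y decreases as X increases
y_rising = np.array([2, 4, 6, 8, 10])   # Y increases as X increases
y_flat = np.array([4, 1, 0, 1, 4])      # symmetric, no linear trend

cov_negative = covariance(x, y_falling)
cov_positive = covariance(x, y_rising)
cov_zero = covariance(x, y_flat)

print(cov_negative, cov_positive, cov_zero)
```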
Missing Values
NaN ≈ Not a Number
Null ≈ No Value
A null marks a cell that is empty, while NaN marks a cell whose content does not make sense as a number. pandas treats both as missing values.

Time   E_Plug   E_Heat   Price   Temperature   No. Occup
1      24       28       10      -15           12
2      17       32       12      -17           12
3      16       34       11      -19           12
4      16       33       12      -18           12
5      16       30       10      -14           12
6      16       31       10      -16           12
7      19       28       14      -14           12
8      22       29       12      -15           9
9      25       26       12      -12           8
10     26       24       14      -8            8
11     27       20       14      -4            8
12     30       19       16      0             5
13     30       19       16      0             4
14     NaN      13       17      2             4
15     27       14       17      3             4
16     27       16       17      2             6
17     28       -4       18      0             8
18     33       26       20      -6            9
19     42       32       2       -8            10
20     48       33       21      -12           12
21     47       32       21      -16           12
22     44       30       22      -18           12
23     36       35       21      -19           12
24     37       36       18      -22           12
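A missing value such as the NaN in E_Plug at Time 14 can be located and handled with pandas. The sketch below uses rows 12 to 16 of the table; dropping the row and filling with the column mean are two common options shown as assumptions, since the slides do not say which strategy the course uses.

```python
import numpy as np
import pandas as pd

# Rows 12-16 of the table, restricted to three columns for brevity.
df = pd.DataFrame({
    "Time": [12, 13, 14, 15, 16],
    "E_Plug": [30, 30, np.nan, 27, 27],
    "E_Heat": [19, 19, 13, 14, 16],
})

# Count missing values per column (isnull flags both None and NaN).
missing_per_column = df.isnull().sum()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: replace the missing E_Plug value with the column mean,
# which pandas computes over the non-missing entries only.
filled = df.fillna({"E_Plug": df["E_Plug"].mean()})

print(missing_per_column)
print(filled)
```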
Outlier
Concept: when a data point is unusually smaller or larger than the other data, it could be (but is not always) an outlier.
Data: 4, 2, 5, 3, 7, 5, 6, 9, 32, 2
The value 32 is not necessarily an outlier, but it has a high potential to be one.
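One common way to flag candidates such as 32 is the interquartile range (IQR) rule, which marks values more than 1.5 × IQR outside the quartiles. The slides do not name a detection method, so the rule and the 1.5 factor are assumptions of this sketch.

```python
import numpy as np

data = np.array([4, 2, 5, 3, 7, 5, 6, 9, 32, 2])

# Quartiles with NumPy's default linear interpolation.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Values outside these fences are flagged as potential outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(lower, upper)  # -2.0 12.0
print(outliers)      # [32]
```

A flagged value is only a candidate; whether it is a true outlier still depends on what the data represent.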
Outlier: 32
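Concatenating
axis = 1 joins along columns (Col 3 is placed beside Col 1 and Col 2). axis = 0 joins along rows (the Code11 row is appended below Code10).

Col 3    Col 1   Col 2
Code1    4       25
Code2    2       36
Code3    5       55
Code4    3       69
Code5    7       99
Code6    5       65
Code7    6       51
Code8    9       21
Code9    32      58
Code10   2       22
Code11   56      12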
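Both directions of concatenation can be sketched with `pd.concat`. The example uses only the first three rows of the table plus the Code11 row; the frame names are hypothetical.

```python
import pandas as pd

codes = pd.DataFrame({"Col 3": ["Code1", "Code2", "Code3"]})
values = pd.DataFrame({"Col 1": [4, 2, 5], "Col 2": [25, 36, 55]})

# axis=1: place the frames side by side, which adds columns.
side_by_side = pd.concat([codes, values], axis=1)

# axis=0: stack the frames on top of each other, which adds rows.
# ignore_index=True renumbers the rows 0..n-1 after stacking.
new_row = pd.DataFrame({"Col 3": ["Code11"], "Col 1": [56], "Col 2": [12]})
stacked = pd.concat([side_by_side, new_row], axis=0, ignore_index=True)

print(side_by_side.shape)  # (3, 3)
print(stacked.shape)       # (4, 3)
```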
Dummy Coding
To make categorical variables understandable to the machine, we need to give them numeric values, and that is why we use dummy coding.

Categorical variable   Dummy variables
P/OffP                 Peak   OffPeak
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
Peak                   1      0
Peak                   1      0
Peak                   1      0
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
Peak                   1      0
Peak                   1      0
Peak                   1      0
Peak                   1      0
Peak                   1      0
OffPeak                0      1
OffPeak                0      1
OffPeak                0      1
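In pandas, dummy variables like the Peak and OffPeak columns can be produced with `pd.get_dummies`. This is a minimal sketch on a shortened, hypothetical version of the P/OffP column; `dtype=int` is passed so the result holds 0 and 1 rather than booleans.

```python
import pandas as pd

df = pd.DataFrame({"P/OffP": ["OffPeak", "OffPeak", "Peak", "Peak", "OffPeak"]})

# One 0/1 column per category; reorder the columns to match the table.
dummies = pd.get_dummies(df["P/OffP"], dtype=int)[["Peak", "OffPeak"]]

print(dummies)
```

Each row has exactly one 1, in the column of the category it belongs to.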
Normalization
We are dealing with different ranges. Although the range of Col 1 is far smaller than the range of Col 2, Col 1 could be as important as Col 2 (and maybe more). The machine does not understand this; it only understands numbers. To solve this, we need to rescale the features to one common range (for example, between 0 and 1). This is called normalizing the data.

Col 1   Col 2
4       25
2       36
5       55
3       69
7       99
5       65
6       51
9       21
32      58
2       22

Ranges: 2 ≤ Col 1 ≤ 9 (leaving aside the suspected outlier 32) and 21 ≤ Col 2 ≤ 99.
Normalization: Sklearn (preprocessing subpackage)
Min-max scaling:
$$\frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
L1 norm (Manhattan distance):
$$\frac{x_{ij}}{|x_1| + |x_2| + |x_3| + \dots + |x_n|}$$
L2 norm (Euclidean distance):
$$\frac{x_{ij}}{\sqrt{x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2}}$$
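The three formulas above correspond to tools in `sklearn.preprocessing`. The sketch below applies them to four (Col 1, Col 2) pairs taken from the normalization table; which rows to use is an arbitrary choice for the example. Note one difference in direction: `MinMaxScaler` rescales each column, whereas `normalize` by default rescales each row.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize

X = np.array([[4.0, 25.0],
              [2.0, 36.0],
              [5.0, 55.0],
              [9.0, 21.0]])

# Min-max scaling: each column is mapped to the range [0, 1].
min_max = MinMaxScaler().fit_transform(X)

# L1 normalization: each row is divided by the sum of its absolute values.
l1 = normalize(X, norm="l1")

# L2 normalization: each row is divided by its Euclidean length.
l2 = normalize(X, norm="l2")

print(min_max)
print(l1)
print(l2)
```

After these steps, every column of `min_max` spans 0 to 1, every row of `l1` has absolute values summing to 1, and every row of `l2` has length 1.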