Data Analytics and Visualization With Python 1728356869
Pandas
https://www.w3schools.com/python/pandas/default.asp
Python Programming
Top Programming Languages
https://spectrum.ieee.org/the-top-programming-languages-2023
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.
Source: https://www.python.org/doc/essays/blurb/
Python Ecosystem for Data Science
Source: https://medium.com/pyfinance/why-python-is-best-choice-for-financial-data-modeling-in-2019-c0d0d1858c45
Python Ecosystem for Data Science
Source: https://duchesnay.github.io/pystatsml/introduction/python_ecosystem.html
The Quant Finance PyData Stack
Source: http://nbviewer.jupyter.org/format/slides/github/quantopian/pyfolio/blob/master/pyfolio/examples/overview_slides.ipynb#/5
NumPy
Base N-dimensional array package
Python matplotlib
Source: https://matplotlib.org/
Python Pandas
http://pandas.pydata.org/
W3Schools Python
https://www.w3schools.com/python/
W3Schools Python Numpy
https://www.w3schools.com/python/numpy/default.asp
W3Schools Python Pandas: Pandas Tutorial
https://www.w3schools.com/python/pandas/default.asp
W3Schools Python: Try Python
https://www.w3schools.com/python/trypython.asp?filename=demo_default
LearnPython.org
https://www.learnpython.org/
Google’s Python Class
https://developers.google.com/edu/python
Google Colab
https://colab.research.google.com/notebooks/welcome.ipynb
Connect Google Colab in Google Drive
Connect Colaboratory to Google Drive
Google Colab: Run Jupyter Notebook (Python3, GPU)
Google Colab Python Hello World
print('Hello World')
Python in Google Colab (Python101)
https://colab.research.google.com/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT
https://tinyurl.com/aintpupython101
Source: https://www.python.org/community/logos/
Python Programming
Data Analytics and Visualization with Python
• NumPy
• Numerical Python N-dimensional array
• Pandas
• Data Analytics
• Matplotlib
• Basic Data Visualization
• Seaborn
• Advanced Visualization
Pandas: Data Analytics and Visualization
https://www.w3schools.com/python/pandas/default.asp
Wes McKinney (2022), "Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter", 3rd Edition, O'Reilly Media.
https://github.com/wesm/pydata-book
NumPy is the fundamental package for scientific computing with Python.
Source: http://www.numpy.org/
NumPy
• NumPy provides a multidimensional array object to store homogeneous or heterogeneous data; it also provides optimized functions/methods to operate on this array object.
Source: Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly
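As a minimal sketch of the statement above, the example below builds one homogeneous array and one heterogeneous array using a NumPy structured dtype. The field names (`ticker`, `price`) and the values are illustrative assumptions, not taken from the cited book.

```python
import numpy as np

# Homogeneous array: every element shares a single dtype.
prices = np.array([50.0, 60.0, 70.0])
print(prices.dtype)

# Heterogeneous records via a structured dtype.
# The field names "ticker" and "price" are hypothetical.
record_type = np.dtype([("ticker", "U4"), ("price", "f8")])
quotes = np.array([("AAA", 50.0), ("BBB", 60.0)], dtype=record_type)

# Optimized element-wise operation applied to one field of the records.
doubled = quotes["price"] * 2
print(doubled)
```

In everyday use, most arrays are homogeneous; structured dtypes are one way to hold mixed fields, and pandas is often the more convenient choice for that case.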
NumPy ndarray
One-dimensional Array (1-D Array)
index:  0    1    ...       n-1
value:  1    2    3    4    5
Two-dimensional Array (2-D Array)
        0    1    ...       n-1
0       1    2    3    4    5
1       6    7    8    9    10
...     11   12   13   14   15
m-1     16   17   18   19   20
NumPy
v = list(range(1, 6))
v
2 * v    # list repetition: the list is concatenated with itself
import numpy as np
v = np.arange(1, 6)
v
2 * v    # element-wise multiplication on the ndarray
Source: Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly
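The contrast between the two `2 * v` expressions above can be sketched as follows. The variable names are illustrative; the point is that multiplying a list repeats it, while multiplying an ndarray scales each element.

```python
import numpy as np

values = list(range(1, 6))              # plain Python list with 1..5
repeated = 2 * values                   # list repetition, not arithmetic
doubled_list = [2 * x for x in values]  # element-wise needs an explicit loop

array = np.arange(1, 6)                 # ndarray holding the same values
doubled_array = 2 * array               # vectorized element-wise multiplication

print(repeated)
print(doubled_list)
print(doubled_array)
```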
Python Data Structures
fruits = ["apple", "banana", "cherry"] #lists []
colors = ("red", "green", "blue") #tuples ()
animals = {'cat', 'dog'} #sets {}
person = {"name" : "Tom", "age" : 20} #dictionaries {}
https://tinyurl.com/aintpupython101
Lists []
x = [60, 70, 80, 90]
print(len(x))   # 4
print(x[0])     # 60
print(x[1])     # 70
print(x[-1])    # 90
NumPy Create Array
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a * b    # element-wise product
c
Source: Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly
NumPy
Source: http://cs231n.github.io/python-numpy-tutorial/
import numpy as np
a = np.arange(15).reshape(3, 5)
a.shape
a.ndim
a.dtype.name
Source: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
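A short sketch of what these attributes report for the 3-by-5 array above. The additional `size` lookup and the `a[1, 2]` indexing are illustrative additions that follow the row/column convention of the array diagrams.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)  # values 0..14 arranged as 3 rows, 5 columns

print(a.shape)   # (rows, columns)
print(a.ndim)    # number of axes
print(a.size)    # total number of elements

# Indexing is [row, column], both starting at 0.
element = a[1, 2]
print(element)

# Reshaping keeps the same elements in a different layout.
b = a.reshape(5, 3)
print(b.shape)
```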
Matrix
Source: https://simple.wikipedia.org/wiki/Matrix_(mathematics)
NumPy ndarray: Multidimensional Array Object
import numpy as np
a = np.array([1,2,3,4,5])
One-dimensional Array (1-D Array)
index:  0    1    ...       n-1
value:  1    2    3    4    5
a = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20]])
Two-dimensional Array (2-D Array)
        0    1    ...       n-1
0       1    2    3    4    5
1       6    7    8    9    10
...     11   12   13   14   15
m-1     16   17   18   19   20
import numpy as np
a = np.array([[0, 1, 2, 3],
              [10, 11, 12, 13],
              [20, 21, 22, 23]])
a
 0  1  2  3
10 11 12 13
20 21 22 23
The same array written on one line:
a = np.array([[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]])
NumPy Basics: Arrays and Vectorized Computation
Source: https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html
NumPy Array
Source: https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html
Tensor
• 3
• a rank 0 tensor; this is a scalar with shape []
• [1., 2., 3.]
• a rank 1 tensor; this is a vector with shape [3]
• [[1., 2., 3.], [4., 5., 6.]]
• a rank 2 tensor; a matrix with shape [2, 3]
• [[[1., 2., 3.]], [[7., 8., 9.]]]
• a rank 3 tensor with shape [2, 1, 3]
https://www.tensorflow.org/
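The ranks and shapes listed above can be checked with NumPy, used here as a stand-in for TensorFlow tensors (an assumption made for illustration; `ndim` plays the role of rank). The dictionary keys are hypothetical labels.

```python
import numpy as np

# The four example values from the list above.
examples = {
    "scalar": np.array(3.0),
    "vector": np.array([1., 2., 3.]),
    "matrix": np.array([[1., 2., 3.], [4., 5., 6.]]),
    "rank3": np.array([[[1., 2., 3.]], [[7., 8., 9.]]]),
}

# Print the rank (number of axes) and shape of each example.
for name, tensor in examples.items():
    print(name, tensor.ndim, tensor.shape)
```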
Scalar: 80
Matrix:
50 60 70
55 65 75
Source: http://pandas.pydata.org/pandas-docs/stable/
pandas DataFrame
• DataFrame provides everything that R’s data.frame provides and much more.
• pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.
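A minimal sketch of how the NumPy foundation shows up in practice: a DataFrame can hand its values to NumPy as an ndarray, and NumPy functions can be applied to a pandas column. The column names and values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9]})

# Export the underlying values as a NumPy ndarray.
values = df.to_numpy()
print(type(values), values.shape)

# NumPy functions accept pandas objects and return pandas objects.
roots = np.sqrt(df["a"])
print(roots)
```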
pandas
Comparison with SAS
pandas      SAS
DataFrame   data set
column      variable
row         observation
groupby     BY-group
NaN         .
Source: http://pandas.pydata.org/pandas-docs/stable/comparison_with_sas.html
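Two rows of the comparison, `groupby` and `NaN`, can be sketched on a tiny DataFrame. The column names and values are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Each row is an observation; each column is a variable (SAS terms).
df = pd.DataFrame({
    "group": ["x", "x", "y"],
    "score": [50.0, np.nan, 70.0],   # NaN marks the missing value
})

# Count missing values in one column.
missing = int(df["score"].isna().sum())
print(missing)

# groupby plays the role of BY-group processing; mean() skips NaN by default.
means = df.groupby("group")["score"].mean()
print(means)
```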
Python Pandas Cheat Sheet
Source: https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
Creating pd.DataFrame
import pandas as pd
df = pd.DataFrame({"a": [4, 5, 6],
                   "b": [7, 8, 9],
                   "c": [10, 11, 12]},
                  index=[1, 2, 3])
   a  b   c
1  4  7  10
2  5  8  11
3  6  9  12
Source: https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
Pandas DataFrame
type(df)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
print('pandas imported')
s = pd.Series([1,3,5,np.nan,6,8])
s
dates = pd.date_range('20181001', periods=6)
dates
Source: http://pandas.pydata.org/pandas-docs/stable/10min.html
df = pd.DataFrame(np.random.randn(6,4),
                  index=dates, columns=list('ABCD'))
df
df = pd.DataFrame(np.random.randn(3,5),
                  index=['student1','student2','student3'],
                  columns=list('ABCDE'))
df
df2 = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20181001'),
                     'C' : pd.Series(2.5,index=list(range(4)),dtype='float32'),
                     'D' : np.array([3] * 4,dtype='int32'),
                     'E' : pd.Categorical(["test","train","test","train"]),
                     'F' : 'foo' })
df2
df2.dtypes
Python Accounting Application with Pandas
import pandas as pd
https://tinyurl.com/aintpupython101
Python Data Analysis and Visualization
Altair
Python Pandas
http://pandas.pydata.org/
Python matplotlib
Source: https://matplotlib.org/
Python seaborn
Source: https://seaborn.pydata.org/
Python plotly
Source: https://plotly.com/python/
Python bokeh
Source: https://bokeh.org/
Python Altair
Source: https://altair-viz.github.io/
Iris flower data set
Species: setosa, versicolor, virginica
Source: https://en.wikipedia.org/wiki/Iris_flower_data_set
Source: http://suruchifialoke.com/2016-10-13-machine-learning-tutorial-iris-classification/
Iris Classification
Source: http://suruchifialoke.com/2016-10-13-machine-learning-tutorial-iris-classification/
iris.data
https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
Iris Data Visualization
https://tinyurl.com/aintpupython101
import seaborn as sns
sns.set(style="ticks", color_codes=True)
iris = sns.load_dataset("iris")
g = sns.pairplot(iris, hue="species")
import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
df = pd.read_csv(url, names=names)
print(df.head(10))
df.tail(10)
df.describe()
df.info()
print(df.shape)
df.groupby('class').size()
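The loading and grouping steps above can be tried without a network connection. This sketch assumes the first three rows of iris.data pasted into a string, read through `io.StringIO` in place of the URL; the name `df_sample` is hypothetical.

```python
import io
import pandas as pd

# First three rows of iris.data as listed earlier; the file has no header row.
sample = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']

# names= supplies the column labels because the data has no header line.
df_sample = pd.read_csv(io.StringIO(sample), names=names)
print(df_sample.shape)

# Number of rows per class, as in df.groupby('class').size().
counts = df_sample.groupby('class').size()
print(counts)
```

With the full file, the same `groupby` call reports one count per species rather than a single class.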
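import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import scatter_matrix
plt.rcParams["figure.figsize"] = (10,8)
df.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()
df.hist()
plt.show()
scatter_matrix(df)
plt.show()
sns.pairplot(df, hue="class", height=2)  # 'size' was renamed to 'height' in seaborn 0.9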
Papers with Code
State-of-the-Art (SOTA)
https://paperswithcode.com/sota
Summary
• NumPy
• Numerical Python N-dimensional array
• Pandas
• Data Analytics
• Matplotlib
• Basic Data Visualization
• Seaborn
• Advanced Visualization
References
• Wes McKinney (2022), "Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter", 3rd Edition, O'Reilly Media.
• Aurélien Géron (2023), Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 3rd Edition, O’Reilly Media.
• Steven D'Ascoli (2022), Artificial Intelligence and Deep Learning with Python: Every Line of Code Explained For Readers New to AI and New to Python, Independently published.
• Stuart Russell and Peter Norvig (2020), Artificial Intelligence: A Modern Approach, 4th Edition, Pearson.
• Varun Grover, Roger HL Chiang, Ting-Peng Liang, and Dongsong Zhang (2018), "Creating Strategic Business Value from Big Data Analytics: A Research Framework", Journal of Management Information Systems, 35, no. 2, pp. 388-423.
• Junliang Wang, Chuqiao Xu, Jie Zhang, and Ray Zhong (2022), "Big data analytics for intelligent manufacturing systems: A review", Journal of Manufacturing Systems, 62, pp. 738-752.
• Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson.
• Python Programming, https://pythonprogramming.net/
• Python, https://www.python.org/
• Python Programming Language, http://pythonprogramminglanguage.com/
• NumPy, http://www.numpy.org/
• Pandas, http://pandas.pydata.org/
• Scikit-learn, http://scikit-learn.org/
• W3Schools Python, https://www.w3schools.com/python/
• Learn Python, https://www.learnpython.org/
• Google’s Python Class, https://developers.google.com/edu/python
• Min-Yuh Day (2023), Python 101, https://tinyurl.com/aintpupython101