Data Analytics and Visualization With Python 1728356869

Uploaded by

shehjaz.dev
Python for Accounting Applications

Data Analytics and Visualization with Python
1121PAA08
ACC2, NTPU (M5265) (Fall 2023)
Wed 6, 7, 8, (14:10-17:00) (9:10-12:00) (B3F10)

Min-Yuh Day, Ph.D.
Associate Professor
Institute of Information Management, National Taipei University
https://web.ntpu.edu.tw/~myday
1
2023-11-08
Syllabus
Week Date Subject/Topics
1 2023/09/13 Introduction to Python for Accounting Applications
2 2023/09/20 Python Programming and Data Science
3 2023/09/27 Foundations of Python Programming
4 2023/10/04 Data Structures
5 2023/10/11 Control Logic and Loops
6 2023/10/18 Functions and Modules
7 2023/10/25 Files and Exception Handling
8 2023/11/01 Midterm Project Report
2
Syllabus
Week Date Subject/Topics
9 2023/11/08 Data Analytics and Visualization with Python
10 2023/11/15 Obtaining Data From the Web with Python
11 2023/11/22 Statistical Analysis with Python
12 2023/11/29 Machine Learning with Python
13 2023/12/06 Text Analytics with Python and Large Language Models (LLMs)
14 2023/12/13 Applications of Accounting Data Analytics with Python
15 2023/12/20 Applications of ESG Data Analytics with Python
16 2023/12/27 Final Project Report
3
Data Analytics and Visualization with Python
4
Outline
• NumPy
  • Numerical Python N-dimensional array
• Pandas
  • Data Analytics
• Matplotlib
  • Basic Data Visualization
• Seaborn
  • Advanced Visualization
5
Pandas: Data Analytics and Visualization

https://www.w3schools.com/python/pandas/default.asp 6
Pandas

https://www.w3schools.com/python/pandas/default.asp 7
Python Programming
8
Top Programming Languages

https://spectrum.ieee.org/the-top-programming-languages-2023 9
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.
Source: https://www.python.org/doc/essays/blurb/ 10
Python Ecosystem for Data Science

Source: https://medium.com/pyfinance/why-python-is-best-choice-for-financial-data-modeling-in-2019-c0d0d1858c45 11
Python Ecosystem for Data Science

Source: https://duchesnay.github.io/pystatsml/introduction/python_ecosystem.html 12
The Quant Finance PyData Stack

Source: http://nbviewer.jupyter.org/format/slides/github/quantopian/pyfolio/blob/master/pyfolio/examples/overview_slides.ipynb#/5 13
NumPy
Base N-dimensional array package
14
Python
matplotlib

Source: https://matplotlib.org/ 15
Python
Pandas

http://pandas.pydata.org/ 16
W3Schools Python

https://www.w3schools.com/python/ 17
W3Schools Python NumPy

https://www.w3schools.com/python/numpy/default.asp 18
W3Schools Python Pandas
Pandas Tutorial

https://www.w3schools.com/python/pandas/default.asp 19
W3Schools Python

https://www.w3schools.com/python/ 20
W3Schools Python: Try Python

https://www.w3schools.com/python/trypython.asp?filename=demo_default 21
LearnPython.org

https://www.learnpython.org/ 22
Google’s Python Class

https://developers.google.com/edu/python 23
Google Colab

https://colab.research.google.com/notebooks/welcome.ipynb 24
Connect Google Colab in Google Drive

25
Google Colab

26
Google Colab

27
Connect Colaboratory to Google Drive

28
Google Colab

29
Google Colab

30
Google Colab

31
Run Jupyter Notebook
Python3 GPU
Google Colab

32
Google Colab Python Hello World
print('Hello World')

33
Python in Google Colab (Python101)
https://colab.research.google.com/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT

https://tinyurl.com/aintpupython101 34
Source: https://www.python.org/community/logos/ 35
Python Programming
36
Data Analytics and Visualization with Python
37
Data Analytics and Visualization with Python
• NumPy
  • Numerical Python N-dimensional array
• Pandas
  • Data Analytics
• Matplotlib
  • Basic Data Visualization
• Seaborn
  • Advanced Visualization
38
W3Schools Python NumPy

https://www.w3schools.com/python/numpy/default.asp 39
W3Schools Python Pandas
Pandas Tutorial

https://www.w3schools.com/python/pandas/default.asp 40
W3Schools Python

https://www.w3schools.com/python/ 41
Pandas: Data Analytics and Visualization

https://www.w3schools.com/python/pandas/default.asp 42
Wes McKinney (2022), "Python for Data Analysis: Data Wrangling with pandas, NumPy,
and Jupyter", 3rd Edition, O'Reilly Media.

https://github.com/wesm/pydata-book 43
NumPy
Base N-dimensional array package
44
NumPy is the fundamental package for scientific computing with Python.
Source: http://www.numpy.org/ 45
NumPy
• NumPy provides a multidimensional array object to store homogeneous or heterogeneous data; it also provides optimized functions/methods to operate on this array object.

Source: Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly 46
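A minimal sketch of the statement above, assuming NumPy is installed. The variable names and sample values are made up for illustration. An ordinary ndarray keeps a single dtype for all elements; mixed values can be stored only by using the generic object dtype.

```python
import numpy as np

# Homogeneous array: every element shares one dtype (float64 here).
amounts = np.array([1000.0, -200.0, -100.0])

# Heterogeneous values require the generic object dtype,
# which gives up most of NumPy's speed advantage.
mixed = np.array([1, "two", 3.0], dtype=object)

# Optimized (vectorized) operations act on the whole array at once.
total = amounts.sum()
doubled = amounts * 2

print(amounts.dtype, mixed.dtype, total, doubled)
```

For numeric work such as ledgers, the homogeneous form is usually the one to prefer.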
NumPy ndarray
One-dimensional Array (1-D Array)
index:   0    1   ...  n-1
values:  1    2    3    4    5

Two-dimensional Array (2-D Array)
         0    1   ...  n-1
0        1    2    3    4    5
1        6    7    8    9   10
...     11   12   13   14   15
m-1     16   17   18   19   20
47
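NumPy

v = list(range(1, 6))   # plain Python list
v                       # [1, 2, 3, 4, 5]
2 * v                   # repeats the list: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
import numpy as np
v = np.arange(1, 6)     # NumPy array
v                       # array([1, 2, 3, 4, 5])
2 * v                   # multiplies elementwise: array([ 2,  4,  6,  8, 10])
Source: Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly 48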
NumPy
Base N-dimensional array package

49
Python Data Structures
fruits = ["apple", "banana", "cherry"] #lists []
colors = ("red", "green", "blue") #tuples ()
animals = {'cat', 'dog'} #sets {}
person = {"name" : "Tom", "age" : 20} #dictionaries {}

https://tinyurl.com/aintpupython101 50
Lists []
x = [60, 70, 80, 90]
print(len(x))   # 4
print(x[0])     # 60
print(x[1])     # 70
print(x[-1])    # 90

51
NumPy
NumPy Create Array
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a * b
c

Source: Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly 52
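As a hedged illustration of what `a * b` does on the slide above: the product is taken position by position, which differs both from the dot product and from how plain lists behave. The names `elementwise`, `dot`, and `list_repeat` are introduced here only for the example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

elementwise = a * b            # multiplies matching positions
dot = a @ b                    # sum of the elementwise products
list_repeat = 2 * [1, 2, 3]    # a plain list is repeated, not scaled

print(elementwise, dot, list_repeat)
```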
NumPy

Source: http://cs231n.github.io/python-numpy-tutorial/ 53
import numpy as np
a = np.arange(15).reshape(3, 5)

a.shape
a.ndim
a.dtype.name

Source: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html 54
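A short sketch of what the three attributes report for the reshaped array. The explicit `dtype=np.int64` is an assumption added here so that the result does not depend on the platform's default integer type; the slide itself leaves the dtype implicit.

```python
import numpy as np

# 15 consecutive integers arranged as 3 rows by 5 columns.
a = np.arange(15, dtype=np.int64).reshape(3, 5)

shape = a.shape            # size along each axis
ndim = a.ndim              # number of axes
dtype_name = a.dtype.name  # element type as a string
size = a.size              # total number of elements

print(shape, ndim, dtype_name, size)
```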
Matrix

Source: https://simple.wikipedia.org/wiki/Matrix_(mathematics) 55
NumPy ndarray:
Multidimensional Array Object

56
import numpy as np
a = np.array([1,2,3,4,5])
One-dimensional Array (1-D Array)
index:   0    1   ...  n-1
values:  1    2    3    4    5

58
a = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20]])

Two-dimensional Array (2-D Array)
         0    1   ...  n-1
0        1    2    3    4    5
1        6    7    8    9   10
...     11   12   13   14   15
m-1     16   17   18   19   20

59
import numpy as np
a = np.array([[0, 1, 2, 3],
              [10, 11, 12, 13],
              [20, 21, 22, 23]])
a
 0  1  2  3
10 11 12 13
20 21 22 23
60
a = np.array([[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]])

0 1 2 3
10 11 12 13
20 21 22 23
61
NumPy Basics: Arrays and Vectorized Computation

Source: https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html 62
NumPy Array

Source: https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html 63
NumPy Array

Source: https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html 64
Tensor
• 3
  • a rank 0 tensor; this is a scalar with shape []
• [1., 2., 3.]
  • a rank 1 tensor; this is a vector with shape [3]
• [[1., 2., 3.], [4., 5., 6.]]
  • a rank 2 tensor; a matrix with shape [2, 3]
• [[[1., 2., 3.]], [[7., 8., 9.]]]
  • a rank 3 tensor with shape [2, 1, 3]

https://www.tensorflow.org/ 65
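The ranks and shapes listed above can be checked with NumPy, whose `ndim` and `shape` attributes play the same role. This is a sketch using NumPy rather than TensorFlow, which is a substitution made here for simplicity; the variable names are illustrative only.

```python
import numpy as np

scalar = np.array(3.0)                                # rank 0
vector = np.array([1., 2., 3.])                       # rank 1
matrix = np.array([[1., 2., 3.], [4., 5., 6.]])       # rank 2
cube = np.array([[[1., 2., 3.]], [[7., 8., 9.]]])     # rank 3

# Print the rank (number of axes) and the shape of each object.
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, t.ndim, t.shape)
```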
Scalar: 80

Vector: [50 60 70]

Matrix: [50 60 70]
        [55 65 75]

Tensor: [50 60 70] [70 80 90]
        [55 65 75] [75 85 95]
66
pandas
Python Data Analysis Library
providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Source: http://pandas.pydata.org/ 67
pandas:
powerful Python data analysis toolkit
• Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time series data.
• Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
• Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure
Source: http://pandas.pydata.org/pandas-docs/stable/ 68
Series
DataFrame
• Primary data structures of pandas
  • Series (1-dimensional)
  • DataFrame (2-dimensional)
• Handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering.

Source: http://pandas.pydata.org/pandas-docs/stable/ 69
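A small sketch of the two structures named above, assuming pandas is installed. The labels, column names, and numbers are invented for illustration.

```python
import pandas as pd

# Series: one labeled column of values (1-dimensional).
prices = pd.Series([10.5, 11.0, 9.8],
                   index=["Mon", "Tue", "Wed"], name="price")

# DataFrame: a table of labeled columns sharing one index (2-dimensional).
table = pd.DataFrame({"price": [10.5, 11.0, 9.8],
                      "volume": [100, 150, 120]},
                     index=["Mon", "Tue", "Wed"])

# Selecting a single column of a DataFrame gives back a Series.
column = table["price"]

print(prices.ndim, table.ndim, type(column).__name__)
```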
pandas DataFrame
• DataFrame provides everything that R’s data.frame provides and much more.
• pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.

70
pandas: Comparison with SAS
pandas      SAS
DataFrame   data set
column      variable
row         observation
groupby     BY-group
NaN         .
Source: http://pandas.pydata.org/pandas-docs/stable/comparison_with_sas.html 71
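To make the pandas side of the comparison concrete, here is a hedged sketch showing a missing value (`NaN`) and a `groupby` aggregation. The `sales` table and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Each row is an observation; each column is a variable.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [100.0, 250.0, np.nan, 50.0],   # NaN marks a missing value
})

# Count the missing values in one column.
missing = int(sales["amount"].isna().sum())

# groupby plays the role of BY-group processing; sum() skips NaN by default.
totals = sales.groupby("region")["amount"].sum()

print(missing)
print(totals)
```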
Python Pandas Cheat Sheet

Source: https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
72
Creating pd.DataFrame
   a  b   c
1  4  7  10
2  5  8  11
3  6  9  12

import pandas as pd
df = pd.DataFrame({"a": [4, 5, 6],
                   "b": [7, 8, 9],
                   "c": [10, 11, 12]},
                  index=[1, 2, 3])
Source: https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
73
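Once such a DataFrame exists, values can be read back by label or by position. The sketch below rebuilds the same table and uses `loc` and `iloc`; the names `cell`, `first_row`, and `column_a` are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6],
                   "b": [7, 8, 9],
                   "c": [10, 11, 12]},
                  index=[1, 2, 3])

cell = df.loc[2, "b"]         # label-based: row labeled 2, column "b"
first_row = df.iloc[0]        # position-based: first row, whatever its label
column_a = df["a"].tolist()   # one column as a plain Python list

print(cell, first_row.tolist(), column_a)
```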
Pandas DataFrame

type(df)

74
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
print('pandas imported')

s = pd.Series([1,3,5,np.nan,6,8])
s

dates = pd.date_range('20181001', periods=6)
dates
Source: http://pandas.pydata.org/pandas-docs/stable/10min.html 75
76
df = pd.DataFrame(np.random.randn(6,4),
                  index=dates, columns=list('ABCD'))
df

77
df = pd.DataFrame(np.random.randn(3,5),
                  index=['student1','student2','student3'],
                  columns=list('ABCDE'))
df

78
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20181001'),
                    'C': pd.Series(2.5, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2

79
df2.dtypes

80
Python Accounting Application with Pandas
import pandas as pd

# Create a DataFrame to store transactions
columns = ['Date', 'Description', 'Amount']
ledger = pd.DataFrame(columns=columns)

# Function to add a transaction
def add_transaction(date, description, amount):
    global ledger
    new_transaction = pd.DataFrame([[date, description, amount]], columns=columns)
    ledger = pd.concat([ledger, new_transaction], ignore_index=True)

# Function to view the ledger
def view_ledger():
    print(ledger)

# Function to get the current balance
def get_balance():
    return ledger['Amount'].sum()

# Adding sample transactions
add_transaction('2023-11-01', 'Income', 1000)
add_transaction('2023-11-02', 'Groceries', -200)
add_transaction('2023-11-03', 'Utilities', -100)

# Viewing the ledger
view_ledger()

# Checking the current balance
print("Current Balance:", get_balance())

Output:
         Date Description Amount
0  2023-11-01      Income   1000
1  2023-11-02   Groceries   -200
2  2023-11-03   Utilities   -100
Current Balance: 700

https://tinyurl.com/aintpupython101 81
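One possible variation on the ledger above, offered as a sketch rather than as the course's method: collect the transactions in a plain list and build the DataFrame once, which avoids the global variable and repeated `pd.concat` calls. The `Balance` column and the use of `cumsum` are additions made here for illustration.

```python
import pandas as pd

# Same sample transactions as the slide, held as (date, description, amount) rows.
transactions = [
    ("2023-11-01", "Income", 1000),
    ("2023-11-02", "Groceries", -200),
    ("2023-11-03", "Utilities", -100),
]

# Build the ledger in one step instead of concatenating row by row.
ledger = pd.DataFrame(transactions, columns=["Date", "Description", "Amount"])
ledger["Date"] = pd.to_datetime(ledger["Date"])

# Running balance after each transaction (hypothetical extra column).
ledger["Balance"] = ledger["Amount"].cumsum()

balance = int(ledger["Amount"].sum())
print(ledger)
print("Current Balance:", balance)
```

Building the frame once keeps the `Amount` column numeric from the start, so sums and running totals need no type conversion.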
Python Data Analysis and Visualization

Altair
82
Python
Pandas

http://pandas.pydata.org/ 83
Python
matplotlib

Source: https://matplotlib.org/ 84
Python
seaborn

Source: https://seaborn.pydata.org/ 85
Python
plotly

Source: https://plotly.com/python/ 86
Python
bokeh

Source: https://bokeh.org/ 87
Python
Altair
Source: https://altair-viz.github.io/ 88
Python matplotlib

https://matplotlib.org/ 89
Python Seaborn

https://seaborn.pydata.org/ 90
Python Plotly Graphing Library

https://plotly.com/python/ 91
Python Plotly Graphing Library

https://plotly.com/python/ 92
Python Plotly Graphing Library

https://plotly.com/python/ 93
Python Plotly Graphing Library

https://plotly.com/python/ 94
Python Plotly Graphing Library

https://plotly.com/python/ 95
Python Plotly Graphing Library

https://plotly.com/python/ 96
Python Bokeh

https://bokeh.org/ 97
Python Altair

https://altair-viz.github.io/ 98
Iris flower data set
setosa versicolor virginica

Source: https://en.wikipedia.org/wiki/Iris_flower_data_set
Source: http://suruchifialoke.com/2016-10-13-machine-learning-tutorial-iris-classification/ 99
Iris Classification

Source: http://suruchifialoke.com/2016-10-13-machine-learning-tutorial-iris-classification/
100
iris.data
https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
101
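The file has no header row, so column names must be supplied when it is read. The sketch below parses three of the rows shown above from an in-memory string instead of downloading the file; the column names are the ones used later in the deck, and `io.StringIO` stands in for the remote file.

```python
import io
import pandas as pd

# Three rows copied from iris.data; the file itself has no header line.
sample = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""

names = ["sepal-length", "sepal-width", "petal-length", "petal-width", "class"]

# names= assigns the column labels because the data has none of its own.
df = pd.read_csv(io.StringIO(sample), names=names)

print(df)
print(df.dtypes)
```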
Iris Data Visualization

Source: https://seaborn.pydata.org/generated/seaborn.pairplot.html 102


Data Visualization in Google Colab

https://tinyurl.com/aintpupython101 103
import seaborn as sns
sns.set(style="ticks", color_codes=True)
iris = sns.load_dataset("iris")
g = sns.pairplot(iris, hue="species")

Source: https://seaborn.pydata.org/generated/seaborn.pairplot.html 104


import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import scatter_matrix

105
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
df = pd.read_csv(url, names=names)
print(df.head(10))

106
df.tail(10)

107
df.describe()

108
print(df.info())
print(df.shape)

109
df.groupby('class').size()

110
plt.rcParams["figure.figsize"] = (10,8)
df.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()

111
df.hist()
plt.show()

112
scatter_matrix(df)
plt.show()

113
sns.pairplot(df, hue="class", height=2)

114
Wes McKinney (2022), "Python for Data Analysis: Data Wrangling with pandas, NumPy,
and Jupyter", 3rd Edition, O'Reilly Media.

https://github.com/wesm/pydata-book 115
Wes McKinney (2022), "Python for Data Analysis: Data Wrangling with pandas, NumPy,
and Jupyter", 3rd Edition, O'Reilly Media.

Source: https://github.com/wesm/pydata-book/blob/3rd-edition/ch04.ipynb 116


Python in Google Colab (Python101)
https://colab.research.google.com/drive/1FEG6DnGvwfUbeo4zJ1zTunjMqf2RkCrT

https://tinyurl.com/aintpupython101 117
Papers with Code
State-of-the-Art (SOTA)

https://paperswithcode.com/sota 119
Summary
• NumPy
  • Numerical Python N-dimensional array
• Pandas
  • Data Analytics
• Matplotlib
  • Basic Data Visualization
• Seaborn
  • Advanced Visualization
120
References
• Wes McKinney (2022), "Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter", 3rd Edition, O'Reilly Media.
• Aurélien Géron (2023), Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 3rd Edition, O'Reilly Media.
• Steven D'Ascoli (2022), Artificial Intelligence and Deep Learning with Python: Every Line of Code Explained For Readers New to AI and New to Python, Independently published.
• Stuart Russell and Peter Norvig (2020), Artificial Intelligence: A Modern Approach, 4th Edition, Pearson.
• Varun Grover, Roger HL Chiang, Ting-Peng Liang, and Dongsong Zhang (2018), "Creating Strategic Business Value from Big Data Analytics: A Research Framework", Journal of Management Information Systems, 35, no. 2, pp. 388-423.
• Junliang Wang, Chuqiao Xu, Jie Zhang, and Ray Zhong (2022), "Big data analytics for intelligent manufacturing systems: A review", Journal of Manufacturing Systems, 62, pp. 738-752.
• Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson.
• Python Programming, https://pythonprogramming.net/
• Python, https://www.python.org/
• Python Programming Language, http://pythonprogramminglanguage.com/
• NumPy, http://www.numpy.org/
• Pandas, http://pandas.pydata.org/
• Scikit-learn, http://scikit-learn.org/
• W3Schools Python, https://www.w3schools.com/python/
• Learn Python, https://www.learnpython.org/
• Google’s Python Class, https://developers.google.com/edu/python
• Min-Yuh Day (2023), Python 101, https://tinyurl.com/aintpupython101

121
