3-numpy_pandas
CRASH COURSE
CS 334: Machine Learning
COURSE REMINDERS
• Python workshops running for the first 3 weeks with M/W offerings
WORKING WITH OTHER LIBRARIES
• Import the module / library of interest
• Example:
https://nustat.github.io/DataScience_Intro_python/NumPy.html
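A minimal sketch of importing libraries under their conventional aliases. The aliases np and pd are community conventions, not requirements, and the arrays below are made up for illustration:

```python
# Import each library once, near the top of a script or notebook.
# np and pd are the widely used conventional aliases.
import numpy as np
import pandas as pd

# After importing, the library's functions are reached through the alias
arr = np.array([1, 2, 3])
s = pd.Series([10, 20, 30])
print(arr.sum(), s.mean())  # 6 20.0
```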
CODING EXAMPLE
• Matrix
• x2[1:4, ::2]
• x2[:, 1]
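The slide's matrix x2 is not shown, so this sketch assumes a hypothetical 5x4 matrix to show what the two slices select:

```python
import numpy as np

# Hypothetical 5x4 matrix holding the values 0..19 row by row:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]]
x2 = np.arange(20).reshape(5, 4)

# Rows 1 up to (not including) 4, every second column starting at column 0
sub = x2[1:4, ::2]
print(sub)
# [[ 4  6]
#  [ 8 10]
#  [12 14]]

# Every row, column 1 only; the result is a 1-D array
col = x2[:, 1]
print(col)  # [ 1  5  9 13 17]
```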
NUMPY ARITHMETIC OPERATORS
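A short sketch of NumPy's arithmetic operators on two small, made-up arrays. Each operator works element by element, and a scalar is broadcast across the whole array:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Element-by-element operations on arrays of the same shape
print(a + b)   # [11 22 33 44]
print(b - a)   # [ 9 18 27 36]
print(a * b)   # [ 10  40  90 160]
print(b / a)   # [10. 10. 10. 10.]
print(b // 3)  # [ 3  6 10 13]  (floor division)
print(a ** 2)  # [ 1  4  9 16]
print(b % 3)   # [1 2 0 1]      (remainder)

# A scalar is applied to every element (broadcasting)
print(a * 10)  # [10 20 30 40]

# Each operator has a function form, e.g. np.add for +
print(np.array_equal(np.add(a, b), a + b))  # True
```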
NUMPY AGGREGATION FUNCTIONS
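A sketch of common aggregation functions on a made-up 2x3 matrix. The axis argument controls which dimension is collapsed:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# With no axis, the whole array is reduced to a single value
print(m.sum())           # 21
print(m.mean())          # 3.5
print(m.min(), m.max())  # 1 6

# axis=0 collapses the rows: one result per column
print(m.sum(axis=0))     # [5 7 9]
# axis=1 collapses the columns: one result per row
print(m.sum(axis=1))     # [ 6 15]

# Function forms behave like the methods
print(np.std(m))         # population standard deviation of all six values
print(np.argmax(m))      # 5 (position of the max in the flattened array)
```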
NUMPY MATRIX OPERATIONS
• x2@x1
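The shapes of the slide's x1 and x2 are not shown. This sketch assumes hypothetical operands where x2 is 2x3 and x1 is 3x2, so the inner dimensions match and x2 @ x1 is 2x2:

```python
import numpy as np

# Hypothetical operands: (2x3) @ (3x2) -> (2x2)
x2 = np.array([[1, 2, 3],
               [4, 5, 6]])
x1 = np.array([[1, 0],
               [0, 1],
               [1, 1]])

# @ is matrix multiplication, equivalent to np.matmul(x2, x1)
prod = x2 @ x1
print(prod)
# [[ 4  5]
#  [10 11]]

# * is element-by-element multiplication, not matrix multiplication
print(x2 * x2)

# .T transposes: rows become columns
print(x2.T.shape)  # (3, 2)
```

Each entry of prod is a row of x2 dotted with a column of x1, e.g. the top-left entry is 1*1 + 2*0 + 3*1 = 4.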
PANDAS: SERIES + DATAFRAME
• Series is a one-dimensional object holding a sequence of values
• DataFrame is a two-dimensional object (often thought of as tabular data)
https://nustat.github.io/DataScience_Intro_python/Pandas.html
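A minimal sketch of building each object by hand. The column names x1, x2, y mirror the foo examples later in these slides, but the values are made up:

```python
import pandas as pd

# Series: one-dimensional values plus an index of labels
s = pd.Series([0.5, 1.2, 3.4], index=["a", "b", "c"])
print(s["b"])  # 1.2

# DataFrame: a two-dimensional table; each column is itself a Series
df = pd.DataFrame({
    "x1": [0.5, 1.2, 3.4],
    "x2": [2.0, 0.1, 1.5],
    "y": [0, 1, 0],
})
print(df)
print(type(df["x1"]))  # <class 'pandas.core.series.Series'>
```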
PANDAS READING FROM FILE
• Example:
import pandas as pd
foo = pd.read_csv("foo.csv")
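Because foo.csv itself is not shown, this sketch feeds pd.read_csv a hypothetical stand-in through io.StringIO (read_csv accepts a path or any file-like object), then writes it back out with to_csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of foo.csv
csv_text = """x1,x2,y
0.5,2.0,0
1.2,0.1,1
3.4,1.5,0
"""
foo = pd.read_csv(io.StringIO(csv_text))
print(foo.head())  # first rows, with the header row used as column names

# Writing back out; index=False avoids saving the row index as an extra column
out = io.StringIO()
foo.to_csv(out, index=False)
```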
PANDAS ATTRIBUTES
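The slide's list of attributes did not survive extraction; this sketch shows several commonly used DataFrame attributes on a hypothetical foo. Attributes are read without parentheses:

```python
import pandas as pd

# Hypothetical stand-in for foo
foo = pd.DataFrame({"x1": [0.5, 1.2, 3.4],
                    "x2": [2.0, 0.1, 1.5],
                    "y": [0, 1, 0]})

print(foo.shape)    # (3, 3) -> (rows, columns)
print(foo.columns)  # column labels
print(foo.index)    # row labels: RangeIndex(start=0, stop=3, step=1)
print(foo.dtypes)   # data type of each column (floats for x1, x2; integers for y)
print(foo.size)     # 9 -> total number of cells
print(foo.values)   # underlying data as a NumPy array
```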
PANDAS METHODS
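The slide's method list is likewise missing; this sketch shows a few commonly used DataFrame methods on a hypothetical foo. Methods are called with parentheses and usually return a new object rather than modifying foo:

```python
import pandas as pd

# Hypothetical stand-in for foo
foo = pd.DataFrame({"x1": [0.5, 1.2, 3.4, 2.2],
                    "x2": [2.0, 0.1, 1.5, 0.7],
                    "y": [0, 1, 0, 1]})

print(foo.head(2))               # first 2 rows
print(foo.describe())            # count, mean, std, min, quartiles, max per column
print(foo.mean())                # mean of each column
print(foo["y"].value_counts())   # how often each value of y occurs

# Returns a sorted copy; foo itself is unchanged
sorted_foo = foo.sort_values("x1", ascending=False)
print(sorted_foo)
```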
CODING EXAMPLE
PANDAS INDEXING
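• loc – Axis labels (row labels and column names) to subset data
• foo.loc[0:1, "x1"]
• iloc – Integer positions to subset data
• foo.iloc[0:1, 0]
• foo.iloc[3:5, [0,2]]
Main difference is loc uses the names (these can be integers), while iloc must use integer positions! Note also that loc slices include the end label, while iloc slices exclude the end position.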
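A sketch contrasting loc and iloc on a hypothetical foo. The copy with letter labels is added only to make the label/position difference visible:

```python
import pandas as pd

# Hypothetical stand-in for foo with the default 0..5 row labels
foo = pd.DataFrame({"x1": [0.5, 1.2, 3.4, 2.2, 0.9, 1.7],
                    "x2": [2.0, 0.1, 1.5, 0.7, 1.1, 0.3],
                    "y": [0, 1, 0, 1, 1, 0]})

# loc slices by label and INCLUDES the end label -> rows 0 and 1
a = foo.loc[0:1, "x1"]
# iloc slices by position and EXCLUDES the end position -> row 0 only
b = foo.iloc[0:1, 0]
# Positions 3 and 4, columns at positions 0 and 2 (x1 and y)
c = foo.iloc[3:5, [0, 2]]

# With non-integer row labels, loc and iloc clearly differ
lab = foo.copy()
lab.index = list("abcdef")
print(lab.loc["b":"d", ["x1", "y"]])  # rows b, c, d selected by label
print(lab.iloc[1:4, [0, 2]])          # the same rows selected by position
```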
PANDAS SUBSETTING
• foo.loc[foo["y"] == 0, :]
• foo.loc[foo["x1"] < 1, :]
• foo.iloc[foo["x1"].argmax(), :]
• foo.iloc[foo["x2"].argmin(), :]
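A sketch of boolean subsetting on a hypothetical foo. It also shows why argmax()/argmin() pair with iloc (they return positions), while idxmax()/idxmin() pair with loc (they return labels):

```python
import pandas as pd

# Hypothetical stand-in for foo
foo = pd.DataFrame({"x1": [0.5, 1.2, 3.4, 2.2],
                    "x2": [2.0, 0.1, 1.5, 0.7],
                    "y": [0, 1, 0, 1]})

# Boolean mask: keep rows where y == 0, all columns
y0 = foo.loc[foo["y"] == 0, :]

# Combine conditions with & (and) or | (or); each condition needs parentheses
small_y1 = foo.loc[(foo["x1"] < 2) & (foo["y"] == 1), :]

# argmax() returns an integer POSITION, so it pairs with iloc
row_max_x1 = foo.iloc[foo["x1"].argmax(), :]
# idxmin() returns an index LABEL, so it pairs with loc
row_min_x2 = foo.loc[foo["x2"].idxmin(), :]
```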
CODING EXAMPLE
• Given iris dataframe
Many other visualization libraries are available; see this article for a top 10:
https://www.projectpro.io/article/python-data-visualization-libraries/543
COMPONENTS OF MATPLOTLIB
• An Axes is an individual plot/graph inside a Figure
https://matplotlib.org/stable/gallery/showcase/anatomy.html
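A minimal sketch of the Figure/Axes relationship: one Figure holding two Axes, each an independent plot with its own title. The data is made up:

```python
import matplotlib.pyplot as plt

# One Figure containing a 1x2 grid of Axes (two individual plots)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot([0, 1, 2], [0, 1, 4])
axes[0].set_title("Left Axes")

axes[1].bar(["a", "b"], [3, 5])
axes[1].set_title("Right Axes")

# A title for the whole Figure, above both Axes
fig.suptitle("One Figure, two Axes")
```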
MATPLOTLIB: USEFUL METHODS
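The slide's method list did not survive extraction, so which methods it covered is unknown. This sketch shows several commonly used Axes methods. It saves to an in-memory buffer so it runs anywhere; normally you would pass a filename such as "plot.png":

```python
import io

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_title("Sine and cosine")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_xlim(0, 2 * np.pi)   # restrict the visible x range
ax.legend(loc="upper right")  # uses the label= given to each plot call
ax.grid(True)

# Save the figure as PNG bytes into a buffer
buf = io.BytesIO()
fig.savefig(buf, format="png")
```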
MATPLOTLIB SCATTERPLOT
• Use pyplot module to make plots
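A sketch of a pyplot scatterplot on a hypothetical foo, calling scatter once per value of y so each class gets its own color and legend entry:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for foo
foo = pd.DataFrame({"x1": [0.5, 1.2, 3.4, 2.2],
                    "x2": [2.0, 0.1, 1.5, 0.7],
                    "y": [0, 1, 0, 1]})

fig, ax = plt.subplots()
# One scatter call per class of y
for label, color in [(0, "orange"), (1, "purple")]:
    subset = foo.loc[foo["y"] == label, :]
    ax.scatter(subset["x1"], subset["x2"], color=color, label=str(label))

ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.legend(title="y")
# plt.show()  # uncomment when running as a script
```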
MATPLOTLIB BOXPLOT
https://matplotlib.org/stable/gallery/statistics/boxplot_demo.html
MATPLOTLIB FOO BOXPLOT
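import matplotlib.pyplot as plt

# Assumes foo was loaded with pd.read_csv("foo.csv")
# and has a numeric column x1 and a 0/1 label column y
fig, ax = plt.subplots()
# One group of x1 values per class of y
data = [foo.loc[foo['y'] == 0, 'x1'], foo.loc[foo['y'] == 1, 'x1']]
ax.boxplot(data)
ax.set_title('Boxplot of x1 distribution based on y')
ax.set_xticklabels([0, 1])
ax.set_xlabel("y")
ax.set_ylabel("x1")
# When running as a script (not in a notebook), show() displays the figure
plt.show()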
PANDAS PLOTTING
• Provides a mechanism to generate plots directly from dataframes
• box()
• scatter()
• hist()
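A sketch of the hist() plotting method on a hypothetical foo. The pandas plot methods draw with matplotlib and return the Axes, so the usual Axes methods still apply:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for foo
foo = pd.DataFrame({"x1": [0.5, 1.2, 3.4, 2.2, 0.9, 1.7],
                    "y": [0, 1, 0, 1, 1, 0]})

fig, ax = plt.subplots()
# Draw a 4-bin histogram of x1 onto the given Axes
foo["x1"].plot.hist(bins=4, edgecolor="black", ax=ax)
ax.set_xlabel("x1")
ax.set_title("Histogram of x1")
# plt.show()  # uncomment when running as a script
```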
PANDAS FOO SCATTERPLOT
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# Map each value of y to a color, then look up a color for every row
colors = {0: 'orange', 1: 'purple'}
color_list = [colors[group] for group in foo['y']]
# Create a scatter plot with color-coding based on y
ax = foo.plot.scatter('x1', 'x2', c=color_list)
# Create legend handles and labels for each group and add the legend to the plot
legend_handles = [
    mpatches.Patch(color=colors[0], label=0),
    mpatches.Patch(color=colors[1], label=1)]
ax.legend(handles=legend_handles, loc='upper left')
ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.set_title("Foo x1 vs x2 scatterplot")
plt.show()
PANDAS FOO BOXPLOT
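import matplotlib.pyplot as plt

# One box of x1 values for each value of y
foo.boxplot(column='x1', by='y')
# When running as a script (not in a notebook), show() displays the figure
plt.show()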
SEABORN PLOTTING
• boxplot()
• scatterplot()
SEABORN FOO SCATTERPLOT
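The slide's code was not recoverable; this is a sketch of a seaborn version of the foo scatterplot, using a hypothetical foo. The hue argument colors points by y and builds the legend automatically, replacing the manual color mapping used in the pandas version:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for foo
foo = pd.DataFrame({"x1": [0.5, 1.2, 3.4, 2.2],
                    "x2": [2.0, 0.1, 1.5, 0.7],
                    "y": [0, 1, 0, 1]})

fig, ax = plt.subplots()
# Column names are passed as strings; seaborn labels the axes from them
sns.scatterplot(data=foo, x="x1", y="x2", hue="y", ax=ax)
ax.set_title("Foo x1 vs x2 scatterplot")
# plt.show()  # uncomment when running as a script
```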
SEABORN FOO BOXPLOT
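Likewise, a sketch of a seaborn version of the foo boxplot with a hypothetical foo. Passing the grouping column as x produces one box of x1 values per value of y:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for foo
foo = pd.DataFrame({"x1": [0.5, 1.2, 3.4, 2.2, 0.9, 1.7],
                    "y": [0, 1, 0, 1, 1, 0]})

fig, ax = plt.subplots()
# x is the grouping column, y is the numeric column summarized by each box
sns.boxplot(data=foo, x="y", y="x1", ax=ax)
ax.set_title("Boxplot of x1 distribution based on y")
# plt.show()  # uncomment when running as a script
```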
CODING EXAMPLE
• Given iris dataframe