3-numpy_pandas

This document is a crash course on NumPy, Pandas, and data visualization techniques for a Machine Learning course (CS 334). It covers essential concepts such as array creation, indexing, arithmetic operations in NumPy, as well as data manipulation and statistical functions in Pandas, along with visualization methods using Matplotlib and Seaborn. Additionally, it includes coding examples and exercises to reinforce learning.

Uploaded by

hokumura032

NUMPY/PANDAS/VISUALIZATION

CRASH COURSE
CS 334: Machine Learning
COURSE REMINDERS

• In-Class exercise #1 due 1/31

• Read the syllabus

• Homework #1 out and due 2/6

• Python workshops running for the first 3 weeks with M/W offerings

WORKING WITH OTHER LIBRARIES
• Import the module / library of interest

• import <library> as <something>

• from <library> import <some function or class>

• Example:

• from exercise2 import sum_list
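As a minimal sketch of the two import forms above, the snippet below uses the standard-library math module as a stand-in (the module choice and the variable names are illustrative assumptions, not part of the course material):

```python
# Form 1: import the whole library under a short alias
import math as m

# Form 2: import a single function directly by name
from math import sqrt

root_via_alias = m.sqrt(16)   # call through the alias
root_via_name = sqrt(16)      # call the imported name directly

print(root_via_alias, root_via_name)
```

Both calls reach the same function; the alias form keeps the origin of each name visible, which is why aliases such as np and pd are the common convention.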

NUMPY

• NumPy (short for Numerical Python) provides an efficient implementation for storing and operating on dense numeric data

• Useful for logical and mathematical calculations on arrays and matrices

• Typically much faster than Python lists for element-wise numeric operations

• Typically uses less memory than a Python list holding the same numbers
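The comparison with lists can be illustrated with a rough sketch. The sizes and variable names below are arbitrary assumptions, and the memory figure for the list is only an approximation (container plus its integer objects):

```python
import sys

import numpy as np

values = list(range(1000))                 # plain Python list
arr = np.arange(1000, dtype=np.int64)      # NumPy array with the same numbers

# Element-wise doubling: explicit loop for the list, one expression for the array
doubled_list = [v * 2 for v in values]
doubled_arr = arr * 2

# Approximate memory used by the numbers themselves
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
arr_bytes = arr.nbytes                     # 1000 elements * 8 bytes each

print(list_bytes, arr_bytes)
```

The array stores raw 8-byte integers in one contiguous block, while the list stores references to separate Python integer objects.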
NUMPY ARRAY AND MATRICES

• All elements of a NumPy array or matrix must be the same data type! (Mixed inputs are converted to a common type, e.g., the integers below become floats.)

• Multiple ways to create arrays and matrices

• Common import:
import numpy as np

Code                                 Result (repr of a)
a = np.array([3.14, 4, 2, 3])        array([3.14, 4.  , 2.  , 3.  ])
a = np.array([[3.14, 4], [2, 3]])    array([[3.14, 4.  ], [2.  , 3.  ]])
a = np.zeros((2,2))                  array([[0., 0.], [0., 0.]])
a = np.ones(3)                       array([1., 1., 1.])
ARRAY ATTRIBUTES
• ndim - Number of array dimensions

• shape - Tuple of array dimensions

• size - Number of elements in the array

• dtype - Array element types

https://nustat.github.io/DataScience_Intro_python/NumPy.html
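A small sketch of the four attributes on a hypothetical 2 x 3 array of zeros (the array is an assumption chosen so the values are easy to check by hand):

```python
import numpy as np

a = np.zeros((2, 3))   # 2 rows, 3 columns, filled with 0.0

print(a.ndim)    # 2
print(a.shape)   # (2, 3)
print(a.size)    # 6
print(a.dtype)   # float64
```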
CODING EXAMPLE

• Try ndim, shape, size, dtype on the following arrays (x1-x3):

import numpy as np
rng = np.random.default_rng(334)
x1 = rng.random(4)
x2 = rng.integers(0, 20, size=(5,4))
x3 = rng.random((5,4,3))
x4 = rng.integers(0, 10, size=(5,4))

Exercise #3 involves questions related to this
NUMPY ARRAY INDEXING
• Array

• Indexing specific element: a[5]

• Indexing range of elements: a[start:stop:step], unspecified values default to start=0, stop=size of dimension, step=1

• Matrix

• Indexing dimensions is separated by comma (e.g., a[5, 4])
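The indexing rules above can be sketched on two small hypothetical arrays (np.arange is used only so the expected values are easy to verify; the names are illustrative):

```python
import numpy as np

a = np.arange(10)                  # [0, 1, 2, ..., 9]
single = a[5]                      # one element: 5
every_other = a[2:8:2]             # start=2, stop=8, step=2 -> [2, 4, 6]
first_three = a[:3]                # start defaults to 0 -> [0, 1, 2]

m = np.arange(12).reshape(3, 4)    # 3 x 4 matrix holding 0..11
element = m[1, 2]                  # row 1, column 2 -> 6
first_two_rows = m[:2, :]          # rows 0 and 1, all columns
second_column = m[:, 1]            # all rows, column 1 -> [1, 5, 9]

print(single, every_other, element, second_column)
```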

CODING EXAMPLE

• For x2 from above, what do the following lines do?
• x2[:2, :]

• x2[1:4, ::2]

• x2[:, 1]
NUMPY ARITHMETIC OPERATORS

Operation         Python Operator    Example
Addition          +                  x1 + x5
Subtraction       -                  x1 - x5
Division          /                  x1 / x5
Multiplication    *                  x1 * x5
Modulo            %                  x1 % x5
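Each of these operators works element by element. A minimal sketch, assuming two hypothetical arrays a and b of the same shape (they are not the arrays from the coding example):

```python
import numpy as np

a = np.array([7, 8, 9])
b = np.array([2, 4, 5])

added = a + b          # [ 9, 12, 14]
subtracted = a - b     # [5, 4, 4]
divided = a / b        # [3.5, 2.0, 1.8] (true division returns floats)
multiplied = a * b     # [14, 32, 45]
remainder = a % b      # [1, 0, 4]

print(added, subtracted, divided, multiplied, remainder)
```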

NUMPY AGGREGATION FUNCTIONS

Operation                NumPy Function    NaN-safe version
Sum of elements          np.sum            np.nansum
Product of elements      np.prod           np.nanprod
Mean of elements         np.mean           np.nanmean
Max of elements          np.max            np.nanmax
Min of elements          np.min            np.nanmin
Index of max value       np.argmax         np.nanargmax
Index of min value       np.argmin         np.nanargmin
All elements are true    np.all            -
Any element is true      np.any            -
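The difference between the plain and NaN-safe versions shows up as soon as one value is missing. A short sketch on made-up data (the arrays and names are assumptions for illustration):

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, 4.0])

plain_sum = np.sum(data)           # nan: the missing value propagates
safe_sum = np.nansum(data)         # 8.0: the NaN is ignored
safe_mean = np.nanmean(data)       # mean of the three remaining values
safe_argmax = np.nanargmax(data)   # 3: position of 4.0

flags = np.array([True, False, True])
all_true = np.all(flags)           # False
any_true = np.any(flags)           # True

print(plain_sum, safe_sum, safe_mean, safe_argmax, all_true, any_true)
```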

NUMPY MATRIX OPERATIONS

Operation                   NumPy Function
Transpose                   a.T
Matrix multiplication       a@b or np.matmul(a,b)
Determinant of array        np.linalg.det(a)
Inverse of square matrix    np.linalg.inv(a)

There are many more useful methods! Read the NumPy documentation if you need some other mathematical function.
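A minimal sketch of these matrix operations on two hypothetical 2 x 2 matrices (values chosen only so the results can be checked by hand):

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([[1.0, 0.0],
              [4.0, 1.0]])

a_t = a.T                      # transpose (a is symmetric, so a_t equals a)
product = a @ b                # matrix product: [[6, 1], [13, 3]]
same_product = np.matmul(a, b) # equivalent function form
det_a = np.linalg.det(a)       # 2*3 - 1*1 = 5 (up to floating-point error)
inv_a = np.linalg.inv(a)       # inverse of a
identity_check = a @ inv_a     # should be close to the identity matrix

print(product)
print(det_a)
```

Note that a * b would multiply element by element instead; @ is the matrix product.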
CODING EXAMPLE

• For x1, x2, and x4 from above, what do the following lines do?

• x2*x4

• x2@x1

Exercise #3 involves questions for this example
PANDAS

• Library for computation with tabular data

• Mixed types of data allowed in a single table

• Columns and rows of data can be named

• Advanced data aggregation and statistical functions

PANDAS: SERIES + DATAFRAME

• Series is a one-dimensional object with a sequence of values

• DataFrame is a two-dimensional object (often thought of as tabular data)

https://nustat.github.io/DataScience_Intro_python/Pandas.html
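As a small sketch of the two objects, the snippet below builds a Series and a DataFrame from made-up values (the column names x1 and label are illustrative assumptions):

```python
import pandas as pd

# One-dimensional: a single labeled sequence of values
s = pd.Series([0.5, 1.5, 2.5], name="x1")

# Two-dimensional: named columns, each of which may have its own type
df = pd.DataFrame({"x1": [0.5, 1.5, 2.5],
                   "label": ["a", "b", "a"]})

column = df["x1"]   # selecting one column of a DataFrame gives a Series

print(s.ndim, df.ndim, type(column).__name__)
```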
PANDAS READING FROM FILE

• Comma-separated files are common file formats for data

• Given the filepath, you can read a csv file using the command read_csv

• Example:
import pandas as pd
foo = pd.read_csv("foo.csv")
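Since foo.csv itself is not included here, the sketch below reads the same kind of content from an in-memory string instead. The column names (x1, x2, y) follow the later foo examples, but the values are invented for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "x1,x2,y\n0.5,1.0,0\n1.5,2.0,1\n2.5,3.0,0\n"

# read_csv accepts a file path or any file-like object
foo = pd.read_csv(io.StringIO(csv_text))

print(foo)
```

With a real file, the call would take the path directly, as in the example above.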
PANDAS ATTRIBUTES

• columns – Column labels of dataframe

• index – Row labels or indices

• shape – Number of rows and columns in dataframe

• size - Number of elements in the dataframe

• dtypes – Data types of the columns
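A short sketch of these attributes on a small made-up dataframe (the data is an assumption; only the attribute names come from the list above):

```python
import pandas as pd

foo = pd.DataFrame({"x1": [0.5, 1.5, 2.5],
                    "x2": [1.0, 2.0, 3.0],
                    "y": [0, 1, 0]})

print(foo.columns)   # column labels: x1, x2, y
print(foo.index)     # row labels: 0, 1, 2 by default
print(foo.shape)     # (3, 3): rows, columns
print(foo.size)      # 9: rows * columns
print(foo.dtypes)    # one data type per column
```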

PANDAS METHODS

• head(n) - Returns the first n rows (n defaults to 5)

• tail(n) - Returns the last n rows (n defaults to 5)

• describe() - Summary statistics of the dataframe
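These three methods return new objects rather than printing on their own. A minimal sketch on invented data:

```python
import pandas as pd

foo = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0],
                    "y": [0, 1, 0, 1]})

first_two = foo.head(2)    # rows 0 and 1
last_one = foo.tail(1)     # row 3
summary = foo.describe()   # count, mean, std, min, quartiles, max per numeric column

print(first_two)
print(last_one)
print(summary)
```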

CODING EXAMPLE

• Read in the file iris.csv (Exercises -> Ex 3)

• How many rows and columns are there?

• Print the first 5 rows

• What are the summary statistics of the dataframe?
PANDAS STATISTICAL CALCULATIONS
Operation                 pandas Method
Sum of series             a.sum()
Product of series         a.prod()
Mean of series            a.mean()
Median of series          a.median()
Standard Deviation        a.std()
Max of series             a.max()
Min of series             a.min()
Index of max of series    a.argmax()
Index of min of series    a.argmin()
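A sketch of these methods on a hypothetical four-element Series. One detail worth knowing: pandas computes the sample standard deviation (ddof=1) by default, whereas np.std defaults to ddof=0.

```python
import pandas as pd

a = pd.Series([4.0, 1.0, 7.0, 2.0])

total = a.sum()          # 14.0
product = a.prod()       # 56.0
mean = a.mean()          # 3.5
median = a.median()      # 3.0 (average of the middle values 2 and 4)
std = a.std()            # sample standard deviation, sqrt(7)
largest = a.max()        # 7.0
smallest = a.min()       # 1.0
pos_of_max = a.argmax()  # 2 (integer position)
pos_of_min = a.argmin()  # 1 (integer position)

print(total, product, mean, median, std, pos_of_max, pos_of_min)
```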

PANDAS INDEXING
• loc - Use axis labels (row labels and column names) to subset data

• foo.loc[0:1, "x1"]

• foo.loc[3:5, ["x1", "y"]]

• iloc - Use position of rows and columns (i.e., integers)

• foo.iloc[0:1, 0]

• foo.iloc[3:5, [0,2]]

Main difference is loc uses the names (these can be integers) and iloc must use integer positions! Also, a loc slice includes its end label, while an iloc slice excludes its end position.
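The two selectors can look identical when the row labels are the default integers, but they slice differently. A minimal sketch on an invented six-row dataframe:

```python
import pandas as pd

foo = pd.DataFrame({"x1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                    "x2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                    "y": [0, 1, 0, 1, 0, 1]})

# Label-based: the slice 3:5 includes the end label, so rows 3, 4, and 5
by_label = foo.loc[3:5, ["x1", "y"]]

# Position-based: the slice 3:5 excludes the end position, so rows 3 and 4
by_position = foo.iloc[3:5, [0, 2]]

print(by_label)
print(by_position)
```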
PANDAS SUBSETTING

• Use comparison to find samples with specific values in columns

• foo.loc[foo["y"] == 0, :]

• foo.loc[foo["x1"] < 1, :]

• Finding the samples with the largest/smallest value in a column

• foo.iloc[foo["x1"].argmax(), :]

• foo.iloc[foo["x2"].argmin(), :]
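A sketch of both subsetting patterns on a small invented dataframe. The comparison produces a boolean Series that loc uses as a row mask, and argmax/argmin return an integer position, which is why they pair with iloc:

```python
import pandas as pd

foo = pd.DataFrame({"x1": [0.5, 2.0, 1.5, 0.2],
                    "x2": [3.0, 1.0, 2.0, 4.0],
                    "y": [0, 1, 0, 1]})

class_zero = foo.loc[foo["y"] == 0, :]             # rows where y is 0 (rows 0 and 2)
small_x1 = foo.loc[foo["x1"] < 1, :]               # rows where x1 < 1 (rows 0 and 3)

largest_x1_row = foo.iloc[foo["x1"].argmax(), :]   # row holding the largest x1
smallest_x2_row = foo.iloc[foo["x2"].argmin(), :]  # row holding the smallest x2

print(class_zero)
print(largest_x1_row)
```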

CODING EXAMPLE
• Given iris dataframe

• What is the median of the petal length?

• What is the mean of the sepal width for Virginica species samples?

• What is the standard deviation of the sepal length for Setosa species samples?

Exercise #3 involves questions for this example

VISUALIZATION

• 3 common ways to visualize data

• Matplotlib – low-level graph plotting library

• Pandas (via matplotlib)

• Seaborn (via matplotlib) – high-level plotting library

Many other libraries are available; see this article for a top 10:
https://www.projectpro.io/article/python-data-visualization-libraries/543
COMPONENTS OF MATPLOTLIB

• Figure object is the outermost container

• Axes translates to an individual plot/graph

• Lines, tick marks, text boxes, legends

https://matplotlib.org/stable/gallery/showcase/anatomy.html
MATPLOTLIB: USEFUL METHODS

• import matplotlib.pyplot as plt

• plt.show() - Display all open figures

• plt.savefig(filename) - Save the current figure to the specified filename
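A minimal sketch of saving a figure to a file. The Agg backend, the plotted values, the title, and the output file name are all assumptions made so the snippet runs without a display; in an interactive session plt.show() would be used instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, no window needed
import matplotlib.pyplot as plt

# One Figure (outer container) holding one Axes (the plot itself)
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title("Hypothetical example figure")

# Save the current figure; the extension selects the file format
out_path = os.path.join(tempfile.gettempdir(), "example_figure.png")
plt.savefig(out_path)
plt.close(fig)  # release the figure once it has been written

print(out_path)
```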

MATPLOTLIB SCATTERPLOT
• Use pyplot module to make plots

• scatter() makes the scatterplot

• color code points by setting c to the appropriate categorical variable

• set_xlabel(), set_ylabel(), set_title() label your plot

• legend() must be called explicitly to draw the legend
MATPLOTLIB FOO SCATTERPLOT

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
scatter = ax.scatter(foo['x1'], foo['x2'], c=foo['y'])
# produce a legend with the unique colors from the scatter
legend1 = ax.legend(*scatter.legend_elements(),
                    loc="upper left", title="Classes")
ax.add_artist(legend1)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_title("Foo x1 vs x2 scatterplot")
# if in terminal need to show
plt.show()
MATPLOTLIB BOXPLOT

• boxplot() makes a boxplot (demonstrates locality, spread, and skewness of numerical data through their quartiles)

https://matplotlib.org/stable/gallery/statistics/boxplot_demo.html
MATPLOTLIB FOO BOXPLOT
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
data = [foo.loc[foo['y'] == 0, 'x1'], foo.loc[foo['y'] == 1, 'x1']]
ax.boxplot(data)
ax.set_title('Boxplot of x1 distribution based on y')
ax.set_xticklabels([0, 1])
ax.set_xlabel("y")
ax.set_ylabel("x1")
# if in terminal need to show
plt.show()
PANDAS PLOTTING
• Provides a mechanism to generate plots directly from dataframes

• Call dataframe.plot(kind=<method>) or dataframe.plot.<method>()

• box()

• scatter()

• hist()
PANDAS FOO SCATTERPLOT
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

colors = {0: 'orange', 1: 'purple'}
color_list = [colors[group] for group in foo['y']]
# Create a scatter plot with color-coding based on the categorical variable ('y')
ax = foo.plot.scatter('x1', 'x2', c=color_list)
# Create legend handles and labels for each group and add the legend to the plot
legend_handles = [
    mpatches.Patch(color=colors[0], label=0),
    mpatches.Patch(color=colors[1], label=1)]
ax.legend(handles=legend_handles, loc='upper left')
ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.set_title("Foo x1 vs x2 scatterplot")
plt.show()
PANDAS FOO BOXPLOT

foo.boxplot(column='x1', by='y')
# if in terminal need to show
plt.show()
SEABORN PLOTTING

• A high-level interface for drawing attractive and informative statistical graphics

• Call different methods to obtain different types of plots

• boxplot()

• scatterplot()
SEABORN FOO SCATTERPLOT

import matplotlib.pyplot as plt
import seaborn as sns

snsax = sns.scatterplot(data=foo, x='x1', y='x2', hue='y')
snsax.set(title="Foo x1 vs x2 scatterplot")
plt.show()
SEABORN FOO BOXPLOT

snsax = sns.boxplot(data=foo, x="y", y="x1")
snsax.set(title='Boxplot of x1 distribution based on y')
plt.show()
CODING EXAMPLE
• Given iris dataframe

• Generate a boxplot of the petal length for the iris dataset where the x-axis contains the species label and the y-axis is the petal length.

• Generate a scatterplot of the petal length versus petal width for the iris dataset where the x-axis contains the petal length and the y-axis is the petal width. Each point should be colored according to the species label.

Exercise #3 involves questions for this example
