METU Computer Engineering

2024
CEng 240 – Spring 2021
Week 13

Scientific and Engineering Libraries


Part 2: Pandas and Matplotlib

Sinan Kalkan
This Week

¢ Scientific and Engineering Libraries
§ Pandas for data handling and analysis
§ Matplotlib for plotting

2020 S. Kalkan - CEng 240 2


Pandas


Outline

¢ Overview
¢ Installation
¢ DataFrames
¢ Accessing data in DataFrames
¢ Modifying data in DataFrames
¢ Analyzing data in DataFrames
¢ Presenting data in DataFrames


Overview

¢ A handy library for:
§ working with files of different formats
§ manipulating & analyzing data
¢ Data types & structures for
§ tables, especially numerical tables,
§ time series
¢ Name comes from “panel data”


Installation

¢ On your Linux environment:
$ pip install pandas
or
$ conda install pandas
¢ On Windows/Mac: install Anaconda first
¢ On Colab, it is already installed
¢ import pandas as pd

Supported Files

¢ A wide collection of file formats
¢ Each format has a reader and a writer

For an up-to-date list:
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html


Data Frames

¢ Similar to NumPy’s ndarray datatype, Pandas has a very fundamental data type called DataFrame
¢ A DataFrame can be created by
§ loading data from files (using a reader)
§ calling the constructor DataFrame()


Data Frames
Loading data from files

This produces the following output:

For more information about the CSV file format, have a look at the File Handling chapter.

Sample file ‘ch10_example.csv’ at:
https://raw.githubusercontent.com/sinankalkan/CENG240/master/figures/ch10_example.csv
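
The code listing and its printed output did not survive on this slide. As a minimal sketch of loading a CSV file with a Pandas reader: the column names and values below are invented for illustration (they are not the contents of ch10_example.csv), and io.StringIO stands in for a file on disk so that the example runs without any download.

```python
import io
import pandas as pd

# Hypothetical CSV text (assumption). In practice, pass a filename or a URL
# to pd.read_csv() instead of a StringIO object.
csv_text = """Name,Grade,Age
Jack,40.2,20
Amanda,30.0,25
Mary,60.2,19
"""

df = pd.read_csv(io.StringIO(csv_text))  # the first line becomes the header
print(df)        # the table with its column names and row numbers
print(df.shape)  # (number of rows, number of columns)
```

To try this on the sample file, the file's URL or a local path can be passed as the first argument of pd.read_csv().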


Data Frames
Loading data from files

More on pd.read_csv():
• Automatically loads column headers from the first line of the file
• If your file does not have a header, use: pd.read_csv(filename, header=None)
• If you want to read specific columns, use: pd.read_csv(filename, usecols=['column name 1', ...])
• For more information & control, see help(pd.read_csv)
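
The options above can be sketched as follows. The CSV contents are hypothetical, and io.StringIO is used in place of real files so that the example is self-contained.

```python
import io
import pandas as pd

# Hypothetical CSV contents (assumption), with and without a header line.
csv_with_header = "Name,Grade,Age\nJack,40.2,20\nAmanda,30.0,25\n"
csv_without_header = "Jack,40.2,20\nAmanda,30.0,25\n"

# Default behaviour: the first line supplies the column names.
df = pd.read_csv(io.StringIO(csv_with_header))

# header=None: there is no header line, so columns are numbered 0, 1, 2, ...
df_no_header = pd.read_csv(io.StringIO(csv_without_header), header=None)

# usecols: load only the listed columns.
df_some = pd.read_csv(io.StringIO(csv_with_header), usecols=["Name", "Grade"])

print(df_no_header.columns.tolist())
print(df_some.columns.tolist())
```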


Data Frames
Create a DataFrame from Python data

Use the pd.DataFrame() function:

If you need keys/names for each row, then:
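
The two code listings of this slide are missing from the text. A minimal sketch of both cases, assuming made-up student data and hypothetical row keys (Student1, Student2, Student3):

```python
import pandas as pd

# Hypothetical data (assumption): one inner list per row.
rows = [["Jack", 40.2, 20], ["Amanda", 30.0, 25], ["Mary", 60.2, 19]]

# Without names, rows and columns are numbered 0, 1, 2, ...
df = pd.DataFrame(rows)

# columns= names the columns; index= gives each row a key/name.
df_named = pd.DataFrame(
    rows,
    columns=["Name", "Grade", "Age"],
    index=["Student1", "Student2", "Student3"],
)
print(df)
print(df_named)
```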


Data Frames
Create a DataFrame from Python data

It is also possible to create the columns of data in a dictionary and pass that to the pd.DataFrame() function:

Note that the column names were retrieved from the keys of the dictionary
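
A short sketch of the dictionary form, again with invented values. Each key becomes a column name and each list becomes that column's data.

```python
import pandas as pd

# Hypothetical data (assumption): one dictionary entry per column.
columns = {
    "Name": ["Jack", "Amanda", "Mary"],
    "Grade": [40.2, 30.0, 60.2],
    "Age": [20, 25, 19],
}

df = pd.DataFrame(columns)
print(df.columns.tolist())  # ['Name', 'Grade', 'Age']
print(df)
```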


Accessing Data
Column-wise access

¢ Use column names & row names like keys in a dictionary
¢ df['Name'] returns the 'Name' column
§ Then you can use an integer index or a named index (key) to pick the entry of a row
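
Column-wise access can be sketched as below. The data is hypothetical; the second DataFrame uses the names as row keys to show the named-index case.

```python
import pandas as pd

# Hypothetical data (assumption).
df = pd.DataFrame(
    {"Name": ["Jack", "Amanda", "Mary"], "Grade": [40.2, 30.0, 60.2]}
)

names = df["Name"]          # a Series holding the 'Name' column
first_name = df["Name"][0]  # 0 is a row label of the default integer index

# The same data, but with the names used as row keys.
df_named = df.set_index("Name")
amanda_grade = df_named["Grade"]["Amanda"]  # named index (key)

print(first_name, amanda_grade)
```

This chained form is used here only for reading values; it is not a safe way to write them.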


Accessing Data
Row-wise access

¢ df.iloc[<row index>]
§ for integer indexes
¢ df.loc[<row name>]
§ for named indexes
¢ Row & column indexing can be combined:
§ df.loc['Amanda', 'Grade']
§ df.iloc[1, 1]
¢ With integer indexes, Python’s slicing ([start:end:step]) can be used

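
A sketch of row-wise access with iloc and loc. The table is hypothetical; only the 'Amanda' row name and the 'Grade' column name are taken from the slide.

```python
import pandas as pd

# Hypothetical data (assumption), with the student names as row keys.
df = pd.DataFrame(
    {"Grade": [40.2, 30.0, 60.2], "Age": [20, 25, 19]},
    index=["Jack", "Amanda", "Mary"],
)

row_by_position = df.iloc[1]    # second row, returned as a Series
row_by_name = df.loc["Amanda"]  # the same row, selected by its name

grade = df.loc["Amanda", "Grade"]  # row name + column name
age = df.iloc[1, 1]                # row 1, column 1 (the 'Age' column here)

every_other = df.iloc[0:3:2]       # slicing: rows 0 and 2

print(grade, age)
print(every_other)
```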
Modifying Data

¢ Modifying data is very easy
¢ Need to be careful about chained indexing
¢ No guarantee on whether df['Grade'] is a copy of the 'Grade' column or a direct access to it


Modifying Data

¢ Specify row & column in one step
¢ Avoid chained indexing when modifying data
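
A minimal sketch of modifying values in a single indexing step, using hypothetical data. The chained form is shown only as a comment because it may act on a copy.

```python
import pandas as pd

# Hypothetical data (assumption).
df = pd.DataFrame(
    {"Grade": [40.2, 30.0, 60.2], "Age": [20, 25, 19]},
    index=["Jack", "Amanda", "Mary"],
)

# Recommended: name the row and the column in one indexing step.
df.loc["Amanda", "Grade"] = 45.0
df.iloc[0, 1] = 21  # row 0 ('Jack'), column 1 ('Age')

# Chained indexing, to be avoided when writing:
#     df["Grade"]["Amanda"] = 45.0
# df["Grade"] may be a copy, so the assignment may never reach df.

print(df)
```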


Analyzing Data

¢ Pandas provides many facilities for analyzing your data in a DataFrame
¢ df.describe()
¢ df.value_counts()
¢ df.max() or df.min()
¢ df.sort_values(by=<col name>)
¢ df.nlargest(<n>, <col name>)
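
The analysis functions listed above can be tried on a small hypothetical table. value_counts() is applied to a single column here, and nlargest() on a DataFrame is given the column to rank by.

```python
import pandas as pd

# Hypothetical data (assumption).
df = pd.DataFrame(
    {"Grade": [40.2, 30.0, 60.2, 30.0], "Age": [20, 25, 19, 22]},
    index=["Jack", "Amanda", "Mary", "John"],
)

print(df.describe())               # count, mean, std, min, quartiles, max
counts = df["Grade"].value_counts()  # how often each grade occurs
print(counts)
print(df.max())                    # column-wise maxima
print(df.min())                    # column-wise minima

by_grade = df.sort_values(by="Grade")  # rows in ascending order of grade
top2 = df.nlargest(2, "Grade")         # the two rows with the highest grades
print(top2)
```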


Presenting Data

¢ df.plot() function
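
The plot shown on this slide is not preserved. As a sketch under assumptions (hypothetical grades, a bar chart chosen for illustration), the plot() function of a DataFrame can be used like this. It relies on Matplotlib being installed.

```python
import pandas as pd

# Hypothetical data (assumption).
df = pd.DataFrame(
    {"Grade": [40.2, 30.0, 60.2]},
    index=["Jack", "Amanda", "Mary"],
)

# plot() draws with Matplotlib and returns the Axes object it drew on.
ax = df.plot(kind="bar")
ax.set_ylabel("Grade")
```

In a notebook such as Colab the figure is displayed automatically; in a script, matplotlib.pyplot.show() displays it.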


Matplotlib


Outline

¢ Overview
¢ Installation
¢ Anatomy of a figure/plot
¢ Preparing your data
¢ Drawing single plots
¢ Drawing multiple plots
¢ Changing elements of a plot


Overview

¢ A drawing library for Python
¢ A free and open-source alternative to MATLAB
¢ Allows 2D & 3D plots


Installation

¢ On your Linux environment:
$ pip install matplotlib
or
$ conda install matplotlib
¢ On Windows/Mac: install Anaconda first
¢ On Colab, it is already installed
¢ import matplotlib.pyplot as plt


Anatomy of a plot

¢ Canvas / drawing area
§ scatter plot, line plot, ...
¢ Axes
§ ticks, tick labels, axis labels
¢ Figure title
¢ Legend

Figure from: https://matplotlib.org/tutorials/introductory/usage.html

Preparing your data

¢ Matplotlib expects NumPy arrays
¢ Convert your data to NumPy
§ If your data is a Python data type, use NumPy's array() function to do the conversion
§ If your data is a DataFrame, use df.values, e.g.:
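
The example that followed this bullet is missing. A minimal sketch of both conversions, with hypothetical values:

```python
import numpy as np
import pandas as pd

# Python list -> NumPy array.
data_list = [1, 4, 9, 16]
data_array = np.array(data_list)

# DataFrame -> NumPy array (hypothetical data, an assumption).
df = pd.DataFrame({"x": [0, 1, 2], "y": [0.0, 1.0, 4.0]})
table = df.values   # the whole table as a 2-D array
y = df["y"].values  # one column as a 1-D array

print(type(table), table.shape, y.shape)
```

The Pandas documentation also offers df.to_numpy() for the same conversion and recommends it over df.values.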


Drawing single plots

Drawing in an Object-Oriented Style

¢ Create a figure object and an axes object
¢ Use their member functions & variables
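
The slide's code is not preserved. A sketch of the object-oriented style, assuming sin(x) and cos(x) as the data to plot:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# Create the Figure object and a single Axes object explicitly.
fig, ax = plt.subplots()

# Everything else is done through member functions of the Axes.
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Object-oriented style")
ax.legend()
```

Calling plt.show() afterwards displays the figure; notebooks such as Colab display it automatically.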


Drawing single plots

Drawing in a Pyplot Style

¢ Use matplotlib.pyplot directly
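
A sketch of the pyplot style, with sin(x) and cos(x) assumed as example data. All calls go to the plt module, which keeps track of the current figure and axes.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

plt.figure()  # start a new current figure
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Pyplot style")
plt.legend()
```

As with the object-oriented style, plt.show() displays the result outside a notebook.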


Drawing multiple plots

¢ This example uses the object-oriented approach
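
The example itself is not preserved. A minimal sketch of multiple plots in the object-oriented approach, assuming a layout of one row and two columns:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# One row, two columns: axes is an array holding two Axes objects.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(x, np.sin(x))
axes[0].set_title("sin(x)")

axes[1].plot(x, np.cos(x))
axes[1].set_title("cos(x)")

fig.tight_layout()  # keep titles and labels from overlapping
```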


Drawing multiple plots

Multiple plots: pyplot style vs. OOP style
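
The two code panels of this slide are missing. The sketch below draws the same two stacked subplots in both styles; the 2x1 layout and the plotted functions are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# Pyplot style: plt.subplot(rows, cols, index) selects the current subplot.
plt.figure()
plt.subplot(2, 1, 1)
plt.plot(x, np.sin(x))
plt.title("sin(x)")
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x))
plt.title("cos(x)")
fig_pyplot = plt.gcf()  # the figure that the calls above drew on

# Object-oriented style: each subplot is an explicit Axes object.
fig_oop, (ax_top, ax_bottom) = plt.subplots(2, 1)
ax_top.plot(x, np.sin(x))
ax_top.set_title("sin(x)")
ax_bottom.plot(x, np.cos(x))
ax_bottom.set_title("cos(x)")
```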


Changing plot elements

¢ All elements of a plot are changeable
§ ticks, tick labels, ...
§ line/dot color, line/dot size, shape, ...
§ legends, titles, ...
§ font style, size, ...
§ LaTeX support
¢ See
§ help(plt.plot)
§ https://matplotlib.org/2.1.1/contents.html
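
A few of these elements can be changed as sketched below. The particular colors, sizes and labels are arbitrary choices for illustration; the strings between dollar signs use Matplotlib's built-in math text, which follows LaTeX syntax.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)

fig, ax = plt.subplots()

# Line color, line style, line width, marker shape and marker size.
ax.plot(x, np.sin(x), color="red", linestyle="--", linewidth=2,
        marker="o", markersize=4, label=r"$\sin(x)$")

# Tick positions and tick labels.
ax.set_xticks([0, np.pi, 2 * np.pi])
ax.set_xticklabels(["0", r"$\pi$", r"$2\pi$"])

# Title, axis label, font sizes and legend position.
ax.set_title("A customized plot", fontsize=14)
ax.set_xlabel("x", fontsize=12)
ax.legend(loc="upper right")
```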


Examples (from the book)

¢ Create a simple CSV file using your favorite spreadsheet editor (e.g. Microsoft Excel or Google Spreadsheets), with your exams and their grades as two separate columns. Save the file, upload it to the Colab notebook and do the following:
§ Load the file using Pandas.
§ Calculate the mean of your exam grades.
§ Calculate the standard deviation of your grades.
¢ Using Matplotlib, generate the following plots with suitable names for the axes and the titles.
§ Draw the following four functions in separate single plots: sin(x), cos(x), tan(x), cot(x).
§ Draw these four functions in a single plot.
§ Draw a multiple 2x2 plot where each subplot is one of the four functions.


Final Words:
Important Concepts

¢ Pandas, DataFrame, loading files with Pandas.
¢ Accessing and modifying content in DataFrames.
¢ Analyzing and presenting data in DataFrames.
¢ Matplotlib and different ways to make plots.
¢ Drawing single and multiple plots. Changing elements of a plot.

THAT’S ALL FOLKS!


STAY HEALTHY
