Lecture 21 Working With Pandas

The document provides an overview of using the Pandas library for data handling and analysis, particularly focusing on CSV files. It explains the concept of DataFrames, how to read CSV files, and access and modify data within these structures. Additionally, it highlights built-in functions for data analysis, such as descriptive statistics and sorting.

Uploaded by Gareth Matina
Copyright © All Rights Reserved

Lecture 21: Working with Pandas

Eng. Emanuel Rashayi


Faculty of Engineering, Dept of Electrical & Electronics Engineering, University of Zimbabwe
[email protected]
Data handling & analysis with Pandas
➢ Pandas is a very handy library for working with files in many different formats and analyzing the data they contain.
➢ To keep things simple, in this section we will only look at CSV files.
➢ If you are using a Python package manager (e.g. pip), you can install it directly using:
$ pip install pandas
➢ As we have seen before, Python already provides facilities for reading and writing
files.
➢ However, if you need to work with “structured” files such as CSV, XML, XLSX, JSON or HTML, the native facilities of Python would need to be extended significantly.
➢ That’s where Pandas comes in.
The file formats supported by the Pandas library
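Most formats come with a matching pair of functions in Pandas: a pd.read_* reader and a DataFrame.to_* writer. A minimal sketch of such a round trip is shown below, assuming invented sample data and using io.StringIO in place of real files.

```python
import io

import pandas as pd

# Invented sample data, kept in memory so no files are needed.
csv_text = "Name,Grade,Age\nAlice,85,21\nBob,72,22\n"

# Read CSV text into a DataFrame (read_csv accepts any file-like object).
df = pd.read_csv(io.StringIO(csv_text))

# Write the same table out as JSON, one record per row ...
json_text = df.to_json(orient="records")

# ... and read it back with the matching JSON reader.
df_from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(json_text)
print(df_from_json)
```

The same pattern applies to other formats (for example pd.read_excel and DataFrame.to_excel), although some of them need extra packages installed.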
Data Frames
➢ The Pandas library is built around a data type called DataFrame: a two-dimensional table with labeled rows and columns that can hold data of different types (numbers, strings, dates and so on).
➢ In the following, we will illustrate how you can get your data into a DataFrame object:
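A minimal sketch, assuming a hypothetical file students.csv with the columns Name, Grade and Age (the values are invented). The file is first written to a temporary directory so that the example runs on its own; with your own data, only the pd.read_csv(filename) line is needed.

```python
import os
import tempfile

import pandas as pd

# Write a small, hypothetical CSV file into a temporary directory
# so that the example is self-contained.
filename = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(filename, "w", encoding="utf-8") as f:
    f.write("Name,Grade,Age\n")
    f.write("Alice,85,21\n")
    f.write("Bob,72,22\n")
    f.write("Carol,91,20\n")

# read_csv parses the file and returns a DataFrame.
df = pd.read_csv(filename)

print(df)
print(df.shape)  # (number of rows, number of columns)
```

Printing df shows the three columns plus an integer row index (0, 1, 2) that Pandas adds automatically.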
Note the following critical details:
➢ By default, read_csv treated the first line of our CSV file as the header (“Name”, “Grade” and “Age”).
➢ If your file does not have a header, call read_csv with the header parameter set to None: pd.read_csv(filename, header=None).
➢ By default, read_csv reads all columns in the CSV file.
➢ If you wish to load only some of the columns (e.g. ‘Name’ and ‘Age’ in our example), you can specify them with the usecols parameter: pd.read_csv(filename, usecols=['Name', 'Age']).
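These options can be tried on in-memory text. The sketch below uses invented data and io.StringIO in place of a file; the names= parameter, which supplies column labels for a headerless file, goes slightly beyond the slide and is shown as one possible option.

```python
import io

import pandas as pd

# Invented data; io.StringIO stands in for a file on disk.
with_header = "Name,Grade,Age\nAlice,85,21\nBob,72,22\n"
without_header = "Alice,85,21\nBob,72,22\n"

# Default behaviour: the first line is used as the header.
df = pd.read_csv(io.StringIO(with_header))

# header=None: no header row, so the columns are labeled 0, 1, 2.
df_no_header = pd.read_csv(io.StringIO(without_header), header=None)

# names= supplies column labels when the file has none.
df_named = pd.read_csv(
    io.StringIO(without_header), header=None, names=["Name", "Grade", "Age"]
)

# usecols=: load only the listed columns.
df_subset = pd.read_csv(io.StringIO(with_header), usecols=["Name", "Age"])

print(df_no_header)
print(df_subset)
```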
Convert Python data into a ``DataFrame``

If your data is already in a Python object (e.g. a dictionary or a list), you can pass it as an argument to the DataFrame constructor, as illustrated in the following example:
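A minimal sketch with invented values: a dictionary of lists (each key becomes a column name) or a list of dictionaries (one per row) can both be passed to pd.DataFrame.

```python
import pandas as pd

# Invented data as a dictionary: keys become column names,
# each list holds that column's values.
data = {
    "Name": ["Alice", "Bob", "Carol"],
    "Grade": [85, 72, 91],
    "Age": [21, 22, 20],
}
df = pd.DataFrame(data)

# A list of dictionaries, one per row, works as well.
rows = [
    {"Name": "Alice", "Grade": 85, "Age": 21},
    {"Name": "Bob", "Grade": 72, "Age": 22},
]
df_rows = pd.DataFrame(rows)

print(df)
print(df_rows)
```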
➢ In many cases, we want the rows to be associated with names, sometimes called keys. For example, instead of referring to a row as “the row at index 1”, we might want to access a row using a non-integer label.
➢ This can be achieved as follows (note how the printed DataFrame looks different):
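One way to do this, sketched here with invented data, is to pass the row labels through the index parameter of the DataFrame constructor. Rows can then be looked up by label with .loc, while .iloc still works by integer position.

```python
import pandas as pd

data = {"Grade": [85, 72, 91], "Age": [21, 22, 20]}

# Invented row keys: index= labels the rows with names
# instead of the default 0, 1, 2.
df = pd.DataFrame(data, index=["Alice", "Bob", "Carol"])
print(df)

# .loc looks a row up by its label ...
print(df.loc["Bob"])

# ... whereas .iloc still uses the integer position.
print(df.iloc[1])
```

If the keys are already stored in a column (e.g. a Name column), df.set_index("Name") gives the same kind of labeled index.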
Accessing Data with DataFrames
➢ We can access data column-wise or row-wise:
1. Column-wise access
2. Row-wise access
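A short sketch of both access styles on an invented DataFrame. Indexing with df[...] selects columns, .iloc selects rows by integer position and .loc by label; the boolean-mask line is an extra, commonly used way to select only the rows that meet a condition.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Carol"], "Grade": [85, 72, 91], "Age": [21, 22, 20]}
)

# 1. Column-wise access
ages = df["Age"]                # a single column -> Series
name_age = df[["Name", "Age"]]  # a list of columns -> DataFrame

# 2. Row-wise access
first_row = df.iloc[0]          # by integer position -> Series
first_two = df.iloc[0:2]        # a slice of rows -> DataFrame
older = df[df["Age"] > 20]      # boolean mask: rows matching a condition

# A single cell: row label 2, column "Name"
cell = df.loc[2, "Name"]

print(ages)
print(first_row)
print(older)
```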
Modifying Data with DataFrames
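A minimal sketch of common modifications on invented data: changing a single cell with .loc, updating or adding a whole column, appending a row under a new index label, and dropping a column. The pass mark of 50 used for the hypothetical Passed column is arbitrary.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Carol"], "Grade": [85, 72, 91], "Age": [21, 22, 20]}
)

# Change a single cell: row label 1, column "Grade".
df.loc[1, "Grade"] = 75

# Update a whole column with a vectorised expression.
df["Age"] = df["Age"] + 1

# Add a new column derived from an existing one (hypothetical pass mark).
df["Passed"] = df["Grade"] >= 50

# Append a row by assigning to a new index label.
df.loc[3] = ["Dave", 64, 23, True]

# Remove a column; drop returns a new DataFrame.
df = df.drop(columns=["Passed"])

print(df)
```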
Analyzing Data with DataFrame
➢ Once our data is in a DataFrame, we can use Pandas’ built-in facilities to analyze it.
➢ One very simple way to analyze data is via descriptive statistics, which you can access with df.describe() and df[<column>].value_counts():
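A sketch on invented data. describe() summarises the numeric columns (count, mean, standard deviation, minimum, quartiles and maximum), and value_counts() counts how often each distinct value appears in a column.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol", "Dave"],
        "Grade": [85, 72, 91, 72],
        "Age": [21, 22, 20, 22],
    }
)

# Summary statistics for the numeric columns (Grade and Age).
summary = df.describe()
print(summary)

# How many times each distinct value occurs in one column.
age_counts = df["Age"].value_counts()
print(age_counts)
```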

Apart from these descriptive functions, Pandas provides functions for sorting (.sort_values()), finding the maximum or the minimum (.max() or .min()), and finding the largest or smallest n values (.nlargest() or .nsmallest()):
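The functions listed above, sketched on an invented DataFrame. Note that nlargest and nsmallest take the number of rows first and the column name second.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol", "Dave"],
        "Grade": [85, 72, 91, 64],
        "Age": [21, 22, 20, 23],
    }
)

# Sort rows by a column (ascending by default, descending here).
by_grade = df.sort_values("Grade", ascending=False)

# Maximum and minimum of a column.
top_grade = df["Grade"].max()
lowest_age = df["Age"].min()

# The n rows with the largest / smallest values in a column.
top_two = df.nlargest(2, "Grade")
youngest_two = df.nsmallest(2, "Age")

print(by_grade)
print(top_two)
print(youngest_two)
```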
Presenting Data in DataFrames

You might also like