Q-Step WS 06112019 Data Analysis and Visualisation With Python

The document provides an overview of NumPy and Pandas, two essential Python libraries for numerical computing and data analysis. It highlights the importance of NumPy for efficient array operations and introduces key features of both libraries, including array creation, data types, and filtering capabilities. Additionally, it discusses the structure and functionalities of Pandas data frames and series, emphasizing their utility in handling and analyzing data.

NumPy and Pandas with Python
Google Developer Groups on Campus ESEC
Numerical Python (NumPy)
• NumPy is the most foundational package for numerical computing in Python.
• If you are going to work on data analysis or machine learning projects, then
having a solid understanding of NumPy is nearly mandatory.
• Indeed, many other libraries, such as pandas and scikit-learn, use NumPy’s
array objects as the lingua franca for data exchange.
• NumPy is so important for numerical computation largely because it is
designed for efficiency with large arrays of data.
The reasons for this efficiency include:
- It stores data internally in a contiguous block of memory,
independent of other built-in Python objects.
- It performs complex computations on entire arrays without the
need for for loops.
What you’ll find in NumPy
• ndarray: an efficient multidimensional array providing fast array-orientated
arithmetic operations and flexible broadcasting capabilities.
• Mathematical functions for fast operations on entire arrays of data without
having to write loops.
• Tools for reading/writing array data to disk and working with memory-
mapped files.
• Linear algebra, random number generation, and Fourier transform
capabilities.
• A C API for connecting NumPy with libraries written in C, C++, and
Fortran. This is one reason Python is a popular choice for wrapping legacy
codebases.
The NumPy ndarray: A multidimensional array object
• The NumPy ndarray object is a fast and flexible container for large
data sets in Python.
• NumPy arrays are a bit like Python lists, but are still a very different
beast at the same time.
• Arrays enable you to store multiple items of the same data type. It is
the facilities around the array object that make NumPy so convenient
for performing maths and data manipulations.
Ndarray vs. lists
• By now, you are familiar with Python lists and how incredibly useful
they are.
• So, you may be asking yourself:

“I can store numbers and other objects in a Python list and do all sorts
of computations and manipulations through list comprehensions, for-
loops etc. What do I need a NumPy array for?”

• There are very significant advantages to using NumPy arrays over lists.
Creating a NumPy array
• To understand these advantages, let's create an array.
• One of the most common of the many ways to create a NumPy array
is to create one from a list by passing it to the np.array() function.
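A minimal sketch of this step. The list contents and the variable names are illustrative assumptions:

```python
import numpy as np

numbers = [0, 1, 2, 3, 4]      # an ordinary Python list (illustrative values)
arr = np.array(numbers)        # build a one-dimensional ndarray from the list

print(type(arr))               # <class 'numpy.ndarray'>
print(arr)                     # [0 1 2 3 4]
```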

Differences between lists and
ndarrays
• The key difference between an array and a list is that arrays are
designed to handle vectorised operations, while Python lists are not.
• That means that, if you apply a function to an array, it is performed on
every item in the array, rather than on the array object as a whole.
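• Let's suppose you want to add the number 2 to every item in a list.
The intuitive way to do this is to write something like my_list + 2, but
Python raises a TypeError, because the + operator on a list expects
another list to concatenate.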

• That was not possible with a list, but you can do that on an array:
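The contrast can be sketched as follows. The values and variable names are assumptions chosen for illustration:

```python
import numpy as np

numbers = [0, 1, 2, 3, 4]

try:
    numbers + 2                      # a list cannot be added to an int
except TypeError as err:
    print("list + 2 failed:", err)

arr = np.array(numbers)
result = arr + 2                     # the addition is applied to every element
print(result)                        # [2 3 4 5 6]

# With a plain list, the same result needs an explicit loop or comprehension.
via_comprehension = [x + 2 for x in numbers]
```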

• It should be noted here that, once a NumPy array is created, you
cannot increase its size.
• To do so, you will have to create a new array.
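As one possible illustration, np.append() does not grow an array in place; it returns a new, larger array and leaves the original untouched. The values here are made up:

```python
import numpy as np

arr = np.array([1, 2, 3])
bigger = np.append(arr, [4, 5])   # returns a new array; arr itself is unchanged

print(arr.size, bigger.size)      # 3 5
```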
Create a 2D array from a list of lists
• You can pass a list of lists to create a matrix-like 2D array.
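A short sketch, assuming a 3-by-3 list of lists as input:

```python
import numpy as np

rows = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]   # each inner list becomes one row
arr2d = np.array(rows)

print(arr2d.shape)                          # (3, 3)
```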

The dtype argument
• You can specify the data type by setting the dtype argument.
• Some of the most commonly used NumPy dtypes are: float, int,
bool, str, and object.
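For example, passing dtype='float' to np.array() stores whole numbers as floating-point values. The input values are illustrative:

```python
import numpy as np

arr_float = np.array([[0, 1, 2], [3, 4, 5]], dtype='float')

print(arr_float.dtype)    # float64
```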

The astype method
• You can also convert an array to a different data type using the astype() method.


• Remember that, unlike lists, all items in an array have to be of the same
type.
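A sketch of both points, using made-up values: astype() returns a converted copy, and NumPy picks a single common type when the inputs are mixed.

```python
import numpy as np

arr_float = np.array([1.7, 2.2, 3.9])

arr_int = arr_float.astype('int')   # fractional parts are truncated
arr_str = arr_float.astype('str')   # every element becomes a string

# Mixed int and float inputs are stored under one common type (float here).
mixed = np.array([1, 2.5])
print(mixed.dtype)                  # float64
```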
dtype='object'
• However, if you are uncertain about what data type your array will
hold, or if you want to hold characters and numbers in the same array,
you can set the dtype as 'object'.
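A minimal sketch with illustrative values. With dtype='object', each element keeps its original Python type:

```python
import numpy as np

arr_obj = np.array([1, 'a', 2.5], dtype='object')

print(arr_obj.dtype)                          # object
print([type(x).__name__ for x in arr_obj])    # ['int', 'str', 'float']
```

Object arrays give up much of NumPy's speed advantage, so they are best kept for cases where mixed types are genuinely needed.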

The tolist() method
• You can always convert an array into a list using the tolist()
method.
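For instance, with an illustrative 2D array, tolist() gives back nested Python lists:

```python
import numpy as np

arr2d = np.array([[1, 2], [3, 4]])
as_list = arr2d.tolist()    # nested Python lists of plain Python ints

print(as_list)              # [[1, 2], [3, 4]]
```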

Inspecting a NumPy array
• A range of attributes and functions built into NumPy allow you to
inspect different aspects of an array:
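A few commonly used ones are sketched below. Which attributes the original slide showed is not known, so this selection is an assumption:

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype='float')

print(arr2d.shape)   # (2, 4)   rows and columns
print(arr2d.ndim)    # 2        number of dimensions
print(arr2d.dtype)   # float64  data type of the elements
print(arr2d.size)    # 8        total number of elements
```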

Extracting specific items from an
array
• You can extract portions of the array using indices, much like when
you’re working with lists.
• Unlike lists, however, arrays can optionally accept as many indices
in the square brackets as there are dimensions.
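A small sketch with an illustrative 3-by-4 array, giving one index or slice per dimension:

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

item = arr2d[1, 2]        # row 1, column 2
block = arr2d[:2, :2]     # first two rows and first two columns
first_row = arr2d[0]      # a single index selects a whole row

print(item)               # 7
```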

Boolean indexing
• A Boolean index array is of the same shape as the array to be filtered,
but it only contains True and False values.
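As a sketch, a comparison produces the Boolean array, which can then be used as the index. The values and the threshold are illustrative:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

mask = arr > 3            # same shape as arr, holding True/False
selected = arr[mask]      # keeps only the elements where mask is True

print(mask.shape)         # (2, 3)
print(selected)           # [4 5 6]
```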

Pandas
• Pandas, like NumPy, is one of the most popular Python libraries for
data analysis.
• It is a high-level abstraction over the lower-level NumPy, whose core is
largely written in C.
• Pandas provides high-performance, easy-to-use data structures and
data analysis tools.
• There are two main structures used by pandas: data frames and
series.
Indices in a pandas series
• A pandas series is similar to a list, but differs in that a series associates a label
with each element. This makes it resemble a dictionary.
• If an index is not explicitly provided by the user, pandas creates a RangeIndex ranging
from 0 to N-1.
• Each series object also has a data type.
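A minimal sketch with made-up values, showing the default index and the data type:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])   # no index given, so pandas supplies one

print(s.index)                    # RangeIndex(start=0, stop=4, step=1)
print(s.dtype)                    # an integer dtype
```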

• As you may suspect by this point, a series has ways to extract all of
the values in the series, as well as individual elements by index.
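One way this might look, again with illustrative values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

all_values = s.values     # the underlying values as a NumPy array
first = s[0]              # the element labelled 0 in the default index

print(first)              # 10
```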


• You can also provide an index manually.
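For example, with hypothetical string labels passed through the index argument:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(s['b'])             # 20
```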


• It is easy to retrieve several elements of a series by their indices or
make group assignments.
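A sketch of both operations, assuming the same hypothetical labelled series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

subset = s[['a', 'c']]    # a list of labels retrieves several elements
s[['a', 'b']] = 0         # one assignment updates several elements

print(s.tolist())         # [0, 0, 30, 40]
```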

Filtering and maths operations
• Filtering and maths operations are easy with Pandas as well.
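A brief sketch, with illustrative values and an arbitrary threshold:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

large = s[s > 15]         # keep only elements greater than 15
doubled = s * 2           # arithmetic is applied element-wise
average = s.mean()        # built-in summary statistics

print(large.tolist())     # [20, 30, 40]
print(average)            # 25.0
```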

Pandas data frame
• Simplistically, a data frame is a table, with rows and columns.
• Each column in a data frame is a series object.
• Rows consist of elements inside series.

Case ID   Variable one   Variable two   Variable three
1         123            ABC            10
2         456            DEF            20
3         789            XYZ            30
Creating a Pandas data frame
• Pandas data frames can be constructed using Python dictionaries.
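A minimal sketch in which each dictionary key becomes a column. The column names are assumptions and the values are placeholders:

```python
import pandas as pd

data = {
    'variable_one': [123, 456, 789],
    'variable_two': ['ABC', 'DEF', 'XYZ'],
    'variable_three': [10, 20, 30],
}
df = pd.DataFrame(data)   # keys become column names, lists become columns

print(df.shape)           # (3, 3)
```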
• You can also create a data frame from a list.
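One way to do this is from a list of rows, with the column names passed separately. The names and values are illustrative:

```python
import pandas as pd

rows = [
    [123, 'ABC', 10],
    [456, 'DEF', 20],
    [789, 'XYZ', 30],
]
df = pd.DataFrame(rows, columns=['variable_one', 'variable_two', 'variable_three'])

print(df.shape)           # (3, 3)
```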

• You can ascertain the type of a column with the type() function.
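For example, selecting a single column and passing it to type() shows that it is a series. The data frame is a hypothetical one:

```python
import pandas as pd

df = pd.DataFrame({'variable_one': [123, 456, 789],
                   'variable_two': ['ABC', 'DEF', 'XYZ']})

column = df['variable_one']
print(isinstance(column, pd.Series))   # True
print(type(column).__name__)           # Series
```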

• A Pandas data frame object has two indices: a column index and a row
index.
• Again, if you do not provide one, Pandas will create a RangeIndex
from 0 to N-1.
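Both indices can be inspected directly, as in this sketch with a hypothetical data frame:

```python
import pandas as pd

df = pd.DataFrame({'variable_one': [123, 456, 789],
                   'variable_two': ['ABC', 'DEF', 'XYZ']})

print(list(df.columns))   # ['variable_one', 'variable_two']
print(df.index)           # RangeIndex(start=0, stop=3, step=1)
```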
• There are numerous ways to provide row indices explicitly.
• For example, you could provide an index when creating a data frame:


• Or you could set the index later, after the data frame has been created.
• Here, I also named the index 'country code'.
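Both approaches are sketched below. The labels 'AA', 'BB' and 'CC' are hypothetical stand-ins for country codes, and the column values are placeholders:

```python
import pandas as pd

data = {'variable_one': [123, 456, 789],
        'variable_three': [10, 20, 30]}

# Option 1: provide the row index when creating the data frame.
df_at_creation = pd.DataFrame(data, index=['AA', 'BB', 'CC'])

# Option 2: assign the index afterwards, then give it a name.
df_later = pd.DataFrame(data)
df_later.index = ['AA', 'BB', 'CC']
df_later.index.name = 'country code'

print(df_later.index.name)   # country code
```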
• Row access using index can be performed in several ways.
• First, you could use .loc[] and provide an index label. Note that .loc is an indexer, so it takes square brackets rather than parentheses.


• Second, you could use .iloc[] and provide an integer position.
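A sketch of both kinds of row access on a hypothetical data frame with string row labels:

```python
import pandas as pd

df = pd.DataFrame({'variable_one': [123, 456, 789],
                   'variable_three': [10, 20, 30]},
                  index=['AA', 'BB', 'CC'])

row_by_label = df.loc['BB']     # select by index label
row_by_position = df.iloc[1]    # select by integer position (0-based)

print(row_by_label['variable_one'])   # 456
```

Each call returns the selected row as a series whose labels are the column names.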

• Particular rows and columns can be selected in the same way.


• You can give .loc[] two arguments, an index list and a column list. Slicing
is supported as well:
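A sketch of both forms, using hypothetical labels and placeholder values. Label slices with .loc[] include both endpoints, unlike ordinary Python slices:

```python
import pandas as pd

df = pd.DataFrame({'variable_one': [123, 456, 789],
                   'variable_two': ['ABC', 'DEF', 'XYZ'],
                   'variable_three': [10, 20, 30]},
                  index=['AA', 'BB', 'CC'])

# Lists of row labels and column labels.
subset = df.loc[['AA', 'CC'], ['variable_one', 'variable_three']]

# Label slices: rows 'AA' to 'BB' and the first two columns, endpoints included.
sliced = df.loc['AA':'BB', 'variable_one':'variable_two']

print(subset.shape)   # (2, 2)
print(sliced.shape)   # (2, 2)
```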

Filtering
• Filtering is performed using so-called Boolean arrays.
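As a sketch, a comparison on a column yields a Boolean series, which then selects the matching rows. The data and the threshold are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'variable_one': [123, 456, 789],
                   'variable_three': [10, 20, 30]},
                  index=['AA', 'BB', 'CC'])

mask = df['variable_three'] > 10   # one True/False value per row
filtered = df[mask]                # keeps only the rows where mask is True

print(list(filtered.index))        # ['BB', 'CC']
```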
Deleting columns
• You can delete a column using the drop() method.
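A minimal sketch with a hypothetical data frame. By default, drop() returns a new data frame and leaves the original unchanged:

```python
import pandas as pd

df = pd.DataFrame({'variable_one': [123, 456, 789],
                   'variable_two': ['ABC', 'DEF', 'XYZ'],
                   'variable_three': [10, 20, 30]})

without_two = df.drop(columns=['variable_two'])   # df itself is not modified

print(list(without_two.columns))   # ['variable_one', 'variable_three']
```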
Reading from and writing to a file
• Pandas supports many popular file formats including CSV, XML, HTML,
Excel, SQL, JSON, etc.
• Out of all of these, CSV is the file format that you will work with the most.
• You can read in the data from a CSV file using the read_csv() function.

• Similarly, you can write a data frame to a CSV file with the to_csv()
method.
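A round trip might look like the sketch below. The file name example.csv and the data are hypothetical, and a temporary directory is used so that nothing is left on disk:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'variable_one': [123, 456, 789],
                   'variable_two': ['ABC', 'DEF', 'XYZ']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.csv')   # hypothetical file name
    df.to_csv(path, index=False)                  # index=False omits the row index
    loaded = pd.read_csv(path)                    # read the file back in

print(list(loaded.columns))   # ['variable_one', 'variable_two']
```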
Any questions?
