Python - Data Science
Python language in data processing and
data mining
Lecture 1.
WSB University
Dariusz Badura
Python in Data Science 2023-24 1
Information about lecture materials
presented on the On-line WSB platform
On-line WSB platform
Link: https://online.wsb.edu.pl/course/view.php?id=10797
Password: PLPBZima23
The course discusses the basics of:
Python and libraries,
data analysis,
data science.
The criteria for passing the lecture
• At the end of the semester there will be a test
on the topics presented in the lecture (20 to 30
test questions).
• Four tests and tasks per semester on lecture
topics.
• The lecture materials will be posted on the
On-line WSB e-learning platform.
Bibliography
Wes McKinney: Python for Data Analysis: Data
Wrangling with pandas, NumPy, and Jupyter, 3rd
Edition; O’Reilly Media, Inc. © 2022.
Sarah Guido, Andreas Müller: Introduction to Machine
Learning with Python: A Guide for Data Scientists;
O’Reilly Media, Inc. © 2017.
Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills:
Advanced Analytics with Spark; O’Reilly Media, Inc.,
June 2017.
Other Internet sources …
Data processing
• …, manipulation of data by a computer. It includes the conversion of
raw data to machine-readable form, flow of data through the CPU and
memory to output devices, and formatting or transformation of
output.
• … the collection and manipulation of digital data to produce
meaningful information. Data processing is a form of information
processing, which is the modification of information in any manner
detectable by an observer.
• The term "Data Processing" has also been used to refer to a
department within an organization responsible for the operation of
data processing programs.
Data processing functions
Data processing may involve various processes, including:
• Validation – Ensuring that supplied data is correct and relevant.
• Sorting – "arranging items in some sequence and/or in different
sets."
• Summarization (statistical or automatic) – reducing detailed data
to its main points.
• Aggregation – combining multiple pieces of data.
• Analysis – the "collection, organization, analysis, interpretation and
presentation of data."
• Reporting – listing detailed or summary data, or computed information.
• Classification – separation of data into various categories.
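Several of the functions listed above can be illustrated on a handful of records. The following is a minimal sketch in plain Python, not a prescribed procedure; the records, field names (name, city, amount) and the threshold of 100 are hypothetical and chosen only for illustration.

```python
# Hypothetical records used only for illustration
records = [
    {"name": "A", "city": "Katowice", "amount": 120.0},
    {"name": "B", "city": "Krakow", "amount": -5.0},   # invalid: negative amount
    {"name": "C", "city": "Katowice", "amount": 80.0},
    {"name": "D", "city": "Krakow", "amount": 300.0},
]

# Validation: keep only records whose amount is non-negative
valid = [r for r in records if r["amount"] >= 0]

# Sorting: arrange the valid records by amount, largest first
by_amount = sorted(valid, key=lambda r: r["amount"], reverse=True)

# Aggregation: combine the amounts per city
totals = {}
for r in valid:
    totals[r["city"]] = totals.get(r["city"], 0.0) + r["amount"]

# Summarization: reduce the data to a few main figures
summary = {"count": len(valid), "total": sum(r["amount"] for r in valid)}

# Classification: separate records into categories by a (hypothetical) threshold
categories = {r["name"]: ("large" if r["amount"] >= 100 else "small") for r in valid}

# Reporting: list the aggregated and summary information
for city, total in sorted(totals.items()):
    print("%s: %.2f" % (city, total))
print(summary)
```

In practice these steps are usually done with pandas, introduced later in the lecture; plain Python is used here only to keep each step visible.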
Data mining
• Data mining is the process of sorting through large data sets to identify
patterns and relationships that can help solve business problems through data
analysis.
• Data mining is the process of extracting and discovering patterns in large data
sets involving methods at the intersection of machine learning, statistics,
and database systems. Data mining is an interdisciplinary subfield of computer
science and statistics with an overall goal of extracting information and
transforming the information into a comprehensible structure for further use.
• Data mining is the analysis step of the "knowledge discovery in databases"
process, or KDD. It also involves database and data management aspects, data
pre-processing, model and inference considerations, interestingness
metrics, complexity considerations, post-processing of discovered
structures, visualization, and online updating.
Lecture issues
Data exploration & analysis.
– Pandas; NumPy; SciPy. (and others)
Data visualization.
– Matplotlib; Seaborn; Datashader; others.
Classical machine learning.
– Scikit-Learn, StatsModels.
Deep learning.
– Keras, TensorFlow, and a whole host of others.
Data storage and big data frameworks.
– Apache Spark; Apache Hadoop; HDFS; Dask; h5py/pytables.
Odds and ends.
– NLTK; spaCy; OpenCV/cv2; scikit-image; Cython.
Plan of the first lecture
A quick overview of the features of Python,
NumPy library overview,
Pandas library overview.
Python
BASICS OF THE LANGUAGE
Why Python?
Python is a general-purpose programming language; thanks to
libraries such as pandas, NumPy, SciPy, matplotlib and
TensorFlow, it has become a powerful
environment for scientific computing.
The basic features of the language:
Basic data types
Containers
Functions
Classes
https://docs.python.org/3/tutorial/index.html
The Interpreter and Its Environment
Anaconda:
Spyder
Jupyter
IPython
Google Colab (a cloud-hosted Jupyter notebook service, separate from Anaconda)
Other development environments:
PyDev (free) – an integrated development environment built on the Eclipse platform;
PyCharm by JetBrains;
Python Tools for Visual Studio for Windows users;
Spyder (free) – an integrated development environment included with the Anaconda
distribution;
Komodo – a commercial integrated development environment.
Python versions
Historically, two versions of Python were in wide use,
2.7 and 3.x. Python 2.7 reached its end of life in
January 2020, so only Python 3 is currently supported.
Python 3.0 introduced many backwards-incompatible
changes to the language, so code written for 2.7
may not work in Python 3 and vice versa.
The code presented uses Python 3 (3.5 or later).
You can check the Python version on the
command line by running python --version.
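The same check can be done from inside a running interpreter with the standard sys module. A minimal sketch; the minimum version (3, 5) in the assertion simply mirrors the version mentioned above.

```python
import sys

# Full version string of the running interpreter
print(sys.version)

# version_info is a named tuple (major, minor, micro, releaselevel, serial)
major, minor = sys.version_info[:2]
print("Running Python %d.%d" % (major, minor))

# Fail early if the interpreter is older than the version the code targets
assert sys.version_info >= (3, 5), "Python 3.5 or later is required"
```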
Example: Python implementation of the classic
"quicksort" algorithm:
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3, 6, 8, 10, 1, 2, 1]))
# Prints "[1, 1, 2, 3, 6, 8, 10]"
Basic data types
Numbers: Integers and floating point numbers
work as they do in other languages:
x = 5
print(type(x))  # Prints "<class 'int'>"
print(x)        # Prints "5"
print(x + 1)    # Addition; prints "6"
print(x - 1)    # Subtraction; prints "4"
print(x * 2)    # Multiplication; prints "10"
print(x ** 2)   # Exponentiation; prints "25"
x += 1
print(x)        # Prints "6"
x *= 2
print(x)        # Prints "12"
y = 3.7
print(type(y))  # Prints "<class 'float'>"
print(y, y + 1, y * 2, y ** 2)  # Prints "3.7 4.7 7.4 13.690000000000001" (binary floating-point rounding)
Basic data types
Python does not have unary increment (x++) or decrement (x--) operators.
Python also has a built-in complex number type (complex).
Booleans: Python implements all the usual boolean logic operators, but
uses English words instead of symbols (&&, ||, etc.):
t = True
f = False
print(type(t)) # Prints "<class 'bool'>"
print(t and f) # Logical AND; prints "False"
print(t or f) # Logical OR; prints "True"
print(not t) # Logical NOT; prints "False"
print(t != f) # Logical XOR; prints "True"
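The complex number type mentioned above uses the j suffix for the imaginary part. A short sketch; the value 3 + 4j is arbitrary and chosen so that the results are exact.

```python
z = 3 + 4j                # a complex literal: real part 3, imaginary part 4
print(type(z))            # Prints "<class 'complex'>"
print(z.real, z.imag)     # Prints "3.0 4.0"
print(abs(z))             # Magnitude sqrt(3**2 + 4**2); prints "5.0"
print(z.conjugate())      # Prints "(3-4j)"
print(z * 1j)             # Multiplying by j rotates by 90 degrees; prints "(-4+3j)"
```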
Basic data types
Strings: Python has strong support for strings:
hello = 'hello' # String literals can use single quotes
world = "world" # or double quotes; it does not matter.
print(hello) # Prints "hello"
print(len(hello)) # String length; prints "5"
hw = hello + ' ' + world # String concatenation
print(hw) # prints "hello world"
hw12 = '%s %s %d' % (hello, world, 12) # sprintf style string formatting
print(hw12) # prints "hello world 12"
Useful methods:
s = "hello"
print(s.capitalize())  # Capitalize a string; prints "Hello"
print(s.upper())       # Convert a string to uppercase; prints "HELLO"
print(s.rjust(7))      # Right-justify a string, padding with spaces; prints "  hello"
print(s.center(7))     # Center a string, padding with spaces; prints " hello "
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another;
                                # prints "he(ell)(ell)o"
print(' world '.strip())  # Strip leading and trailing whitespace; prints "world"
Containers
Python comes with several built-in container types: lists,
dictionaries, sets, and tuples.
Lists
A list is Python's equivalent of an array, but it is resizable and
can contain elements of different types:
xs = [3, 1, 2] # Create a list
print(xs, xs[2]) # Prints "[3, 1, 2] 2"
print(xs[-1]) # Negative indices count from the end of the list; prints "2"
xs[2] = 'foo' # Lists can contain elements of different types
print(xs) # Prints "[3, 1, 'foo']"
xs.append('bar') # Add a new element to the end of the list
print(xs) # Prints "[3, 1, 'foo', 'bar']"
x = xs.pop() # Remove and return the last element of the list
print(x, xs) # Prints "bar [3, 1, 'foo']"
Containers -> lists
Slicing: In addition to accessing list elements individually, Python provides a
concise syntax to access sublists; this is known as slicing:
nums = list(range(5))  # range(5) yields the integers 0 to 4; list() turns them into a list
print(nums)         # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4])    # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:])     # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2])     # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:])      # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print(nums[:-1])    # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9]  # Assign a new sublist to a slice
print(nums)         # Prints "[0, 1, 8, 9, 4]"
Containers -> lists -> Loops
You can loop over the elements of a list:
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print(animal)
To access the index of each element within the loop body, use the
built-in function enumerate:
animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print('#%d: %s' % (idx + 1, animal))
# Prints "#1: cat", "#2: dog", "#3: monkey", each on its own line
Containers -> lists -> List comprehension
When programming, we often want to transform one type of data into
another. As a simple example, consider the following code that calculates
square numbers:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares)  # Prints [0, 1, 4, 9, 16]
We can simplify this code using a list comprehension:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print(squares)  # Prints [0, 1, 4, 9, 16]
List comprehensions can also contain conditions:
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # Prints "[0, 4, 16]"
Containers -> dictionaries
A dictionary stores (key, value) pairs, similar to a map in Java or
an object in JavaScript. We can use it this way:
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary; prints "cute"
print('cat' in d)     # Check if a dictionary has a given key; prints "True"
d['fish'] = 'wet'     # Set an entry in a dictionary
print(d['fish'])      # Prints "wet"
# print(d['monkey'])  # KeyError: 'monkey' not a key of d
print(d.get('monkey', 'N/A'))  # Get an element with a default; prints "N/A"
print(d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"
del d['fish']                  # Remove an element from a dictionary
print(d.get('fish', 'N/A'))    # "fish" is no longer a key; prints "N/A"
Containers -> dictionaries-> Loops
Iterating through a dictionary by key:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print('A %s has %d legs' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"
To access the keys and their corresponding values together, use the items method:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.items():
    print('A %s has %d legs' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"
Containers -> dictionaries comprehensions
Similar to list comprehensions, but they allow you to easily construct
dictionaries. For example:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print(even_num_to_square) # Prints "{0: 0, 2: 4, 4: 16}"
Sets
A set is an unordered collection of distinct elements.
Example:
animals = {'cat', 'dog'}
print('cat' in animals)   # Check if an element is in a set; prints "True"
print('fish' in animals)  # Prints "False"
animals.add('fish')       # Add an element to a set
print('fish' in animals)  # Prints "True"
print(len(animals))       # Number of elements in a set; prints "3"
animals.add('cat')        # Adding an element that is already in the set does nothing
print(len(animals))       # Prints "3"
animals.remove('cat')     # Remove an element from a set
print(len(animals))       # Prints "2"
Sets
Loops: Iterating over a set has the same syntax as iterating over a list;
however, because sets are unordered, you cannot make assumptions about
the order in which the set's elements are visited:
animals = {'cat', 'dog', 'fish'}
for idx, animal in enumerate(animals):
    print('#%d: %s' % (idx + 1, animal))
# May print "#1: fish", "#2: dog", "#3: cat" (the order can vary)
Set comprehensions: Like lists and dictionaries, sets can easily be constructed
using set comprehensions:
from math import sqrt
nums = {int(sqrt(x)) for x in range(30)}
print(nums)  # Prints "{0, 1, 2, 3, 4, 5}"
Tuples
A tuple is an (immutable) ordered list of values. A tuple is similar
to a list in many ways; one difference is that tuples can be used as
keys in dictionaries and as elements of sets, while lists cannot.
Example:
d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
t = (5, 6) # Create a tuple
print(type(t)) # Prints "<class 'tuple'>"
print(d[t]) # Prints "5"
print(d[(1, 2)]) # Prints "1"
Functions
Python functions are defined using the def keyword. Example:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))
# Prints "negative", "zero", "positive"
Functions
We often define functions to take optional keyword arguments, like
this:
def hello(name, loud=False):
    if loud:
        print('HELLO, %s!' % name.upper())
    else:
        print('Hello, %s' % name)

hello('Bob')              # Prints "Hello, Bob"
hello('Fred', loud=True)  # Prints "HELLO, FRED!"
Classes
Example
class Greeter(object):

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print('HELLO, %s!' % self.name.upper())
        else:
            print('Hello, %s' % self.name)

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"
NumPy library
• Datasets processed with the NumPy and pandas libraries can come from a wide range of
sources and in a wide range of formats, including:
– collections of documents,
– collections of images,
– collections of sound clips,
– collections of numerical measurements, or
– … nearly anything else.
• Datasets can be represented as arrays of numbers:
– digital images can be thought of as simply two-dimensional arrays of numbers representing pixel
brightness across the area;
– sound clips can be thought of as one-dimensional arrays of intensity versus time;
– text can be converted in various ways into numerical representations, such as binary digits
representing the frequency of certain words or pairs of words.
• Efficient storage and manipulation of numerical arrays is fundamental to the process of
doing data science.
• Install NumPy: http://www.numpy.org/
• Import NumPy and, for example, double-check the version:
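A minimal version check, assuming NumPy is already installed (for example through Anaconda or with pip install numpy):

```python
import numpy as np   # conventional alias used throughout this lecture

print(np.__version__)  # the installed NumPy version as a string, e.g. "1.26.4"
```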
NumPy module
• The Numpy module is a basic library for
scientific calculations in Python (including
matrix multiplication and addition,
diagonalization or inversion, integration,
solving equations, etc.).
• It provides us with specialized data types,
operations and functions that are not available
in a typical Python installation.
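A short sketch of some of the operations listed above (matrix addition and multiplication, inversion, solving a system of linear equations, and the eigen-decomposition used in diagonalization) with the numpy.linalg submodule; the matrix and right-hand side are arbitrary values chosen for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

print(A + A)              # element-wise matrix addition
print(A @ A)              # matrix multiplication (same as np.matmul(A, A))
A_inv = np.linalg.inv(A)  # matrix inversion
print(A @ A_inv)          # approximately the 2x2 identity matrix

# Solve A x = b without forming the inverse explicitly
x = np.linalg.solve(A, b)
print(x)                  # approximately [0.8, 1.4]

# Eigenvalues and eigenvectors (columns of eigvecs), the basis of diagonalization
eigvals, eigvecs = np.linalg.eig(A)
```

np.linalg.solve is generally preferred over multiplying by the inverse, because it is both faster and numerically more stable.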
Creating Arrays from Python Lists
• Unlike Python lists, NumPy arrays can only contain data
of the same type. If the types do not match, NumPy will
upcast them according to its type promotion rules;
here, integers are upcast to floating point:
• Integer array: np.array([1, 4, 2, 5, 3])
• Floating-point array: np.array([3.14, 4, 2, 3])
• Use dtype keyword:
np.array([1, 2, 3, 4], dtype=np.float32)
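The three constructors above can be compared by inspecting the dtype attribute of each result. A minimal sketch; the nested-list example at the end is an additional illustration, not taken from the slide.

```python
import numpy as np

a = np.array([1, 4, 2, 5, 3])                  # all integers
b = np.array([3.14, 4, 2, 3])                  # mixed: integers are upcast to float
c = np.array([1, 2, 3, 4], dtype=np.float32)   # explicit data type

print(a.dtype)  # a platform-dependent integer type, typically int64
print(b.dtype)  # Prints "float64"
print(c.dtype)  # Prints "float32"

# Nested sequences produce multidimensional arrays
m = np.array([range(i, i + 3) for i in [2, 4, 6]])
print(m.shape)  # Prints "(3, 3)"
```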
Creating Arrays from Scratch
For larger arrays, it is more efficient to create arrays from scratch
using routines built into NumPy:
• np.zeros(10, dtype=int) – a length-10 integer array filled with zeros
• np.ones((3, 5), dtype=float) – a 3x5 floating-point array filled with ones
• np.full((3, 5), 3.14) – a 3x5 array filled with 3.14
• np.arange(0, 20, 2) – an array filled with a linear sequence starting at 0,
ending before 20, stepping by 2 (similar to the built-in range function)
• np.linspace(0, 1, 5) – an array of five values evenly spaced between 0 and 1
• … and others
NumPy Standard Data Types
• Because NumPy is built in C, the types will be familiar to users of
C, Fortran, and other related languages.
Data type    Description
bool_        Boolean (True or False) stored as a byte
int_         Default integer type (same as C long; normally either int64 or int32)
intc         Identical to C int (normally int32 or int64)
intp         Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8         Byte (–128 to 127)
int16        Integer (–32768 to 32767)
int32        Integer (–2147483648 to 2147483647)
int64        Integer (–9223372036854775808 to 9223372036854775807)
uint8        Unsigned integer (0 to 255)
uint16       Unsigned integer (0 to 65535)
uint32       Unsigned integer (0 to 4294967295)
uint64       Unsigned integer (0 to 18446744073709551615)
float_       Shorthand for float64 (alias removed in NumPy 2.0; use float64)
float16      Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32      Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64      Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_     Shorthand for complex128 (alias removed in NumPy 2.0; use complex128)
complex64    Complex number, represented by two 32-bit floats
complex128   Complex number, represented by two 64-bit floats
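The ranges and bit layouts in the table can be checked directly with np.iinfo and np.finfo. A short sketch covering a few representative rows:

```python
import numpy as np

# Integer ranges
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # Prints "-128 127"
print(np.iinfo(np.uint16).max)                         # Prints "65535"

# Floating-point layout: number of exponent bits and mantissa bits
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.nexp, info.nmant)
# Prints "float16 5 10", "float32 8 23", "float64 11 52"

# Storage size in bytes: complex64 holds two 32-bit floats
print(np.dtype(np.complex64).itemsize)   # Prints "8"
```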
The Basics of NumPy Arrays
• Attributes of arrays: Determining the size, shape,
memory consumption, and data types of arrays
• Indexing of arrays: Getting and setting the values of
individual array elements
• Slicing of arrays: Getting and setting smaller subarrays
within a larger array
• Reshaping of arrays: Changing the shape of a given array
• Joining and splitting of arrays: Combining multiple
arrays into one, and splitting one array into many
NumPy Array Attributes
Define random arrays of one, two, and three dimensions and inspect
their attributes:
import numpy as np
rng = np.random.default_rng(seed=1701)  # seed for reproducibility
x1 = rng.integers(10, size=6)          # one-dimensional array
x2 = rng.integers(10, size=(3, 4))     # two-dimensional array
x3 = rng.integers(10, size=(3, 4, 5))  # three-dimensional array
print("x3 ndim: ", x3.ndim)   # number of dimensions: 3
print("x3 shape:", x3.shape)  # size of each dimension: (3, 4, 5)
print("x3 size: ", x3.size)   # total number of elements: 60
print("dtype: ", x3.dtype)    # data type of the elements: int64
Array Indexing
• array Indexing: access to single elements,
• array slicing: access to subarrays:
x[start:stop:step]
• reshaping of arrays
• array concatenation and splitting
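Indexing, slicing with x[start:stop:step], and reshaping can be sketched on small arrays built with np.arange; the arrays are arbitrary examples. Note that NumPy slices are views of the original data, unlike slices of Python lists.

```python
import numpy as np

x = np.arange(10)            # [0 1 2 3 4 5 6 7 8 9]
print(x[0], x[-1])           # single elements; prints "0 9"
print(x[2:8:2])              # start=2, stop=8 (exclusive), step=2; prints "[2 4 6]"
print(x[::-1])               # a negative step reverses the array

M = np.arange(12).reshape((3, 4))  # reshape a 1D array into a 3x4 array
print(M[1, 2])               # row 1, column 2; prints "6"
print(M[:2, ::2])            # first two rows, every other column

# Slices are views: changing them changes the original array
row = M[0, :]
row[0] = 99
print(M[0, 0])               # Prints "99"
sub = M[:2, :2].copy()       # use copy() to get an independent array
```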
Array Concatenation and Splitting
Concatenation of arrays:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])  # array([1, 2, 3, 3, 2, 1])
Splitting of arrays:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])  # split before indices 3 and 5
print(x1, x2, x3)  # Prints "[1 2 3] [99 99] [3 2 1]"
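For two-dimensional arrays, the axis argument selects the direction of concatenation, and np.vstack/np.hstack with np.vsplit/np.hsplit are convenient counterparts. A minimal sketch on a small arbitrary grid:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Concatenate along the first axis (rows) or the second axis (columns)
rows = np.concatenate([grid, grid])          # shape (4, 3)
cols = np.concatenate([grid, grid], axis=1)  # shape (2, 6)

# vstack/hstack are convenient for arrays of mixed dimensions
v = np.vstack([np.array([1, 2, 3]), grid])     # shape (3, 3)
h = np.hstack([grid, np.array([[99], [99]])])  # shape (2, 4)

# Splitting counterparts: vsplit (by rows) and hsplit (by columns)
upper, lower = np.vsplit(rows, [2])          # two (2, 3) arrays
left, right = np.hsplit(cols, [3])           # two (2, 3) arrays
```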
Computation on NumPy Arrays:
Universal Functions
Array Arithmetic
Absolute Value
Trigonometric Functions
Specialized Ufuncs (scipy.special)
Advanced Ufunc Features
Specifying Output
Aggregations
Outer Products
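Most of the topics above can be shown in a few lines. A sketch covering array arithmetic, absolute value, trigonometric functions, the out argument, reduce and outer; scipy.special is left out so the example only needs NumPy.

```python
import numpy as np

x = np.arange(4)                   # [0 1 2 3]

# Array arithmetic: operators are wrappers around ufuncs
print(x + 5)                       # same as np.add(x, 5); prints "[5 6 7 8]"

# Absolute value and trigonometric functions
print(np.abs(np.array([-2, -1, 0, 1, 2])))  # Prints "[2 1 0 1 2]"
theta = np.linspace(0, np.pi, 3)
print(np.sin(theta))               # approximately [0, 1, 0]

# Specifying output: write the result directly into an existing array
y = np.empty(4)
np.multiply(x, 10, out=y)
print(y)                           # the values 0, 10, 20, 30 as floats

# Aggregation via reduce, and an outer product via outer
print(np.add.reduce(x))            # sum of all elements; prints "6"
print(np.multiply.outer(x, x))     # 4x4 multiplication table
```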
Aggregations: min, max, and everything in between
• Summing the Values in an Array
• Minimum and Maximum
• Multidimensional Aggregates
• Other Aggregation Functions (table):
Function name    NaN-safe version     Description
np.sum           np.nansum            Compute sum of elements
np.prod          np.nanprod           Compute product of elements
np.mean          np.nanmean           Compute mean of elements
np.std           np.nanstd            Compute standard deviation
np.var           np.nanvar            Compute variance
np.min           np.nanmin            Find minimum value
np.max           np.nanmax            Find maximum value
np.argmin        np.nanargmin         Find index of minimum value
np.argmax        np.nanargmax         Find index of maximum value
np.median        np.nanmedian         Compute median of elements
np.percentile    np.nanpercentile     Compute rank-based statistics of elements
np.any           N/A                  Evaluate whether any elements are true
np.all           N/A                  Evaluate whether all elements are true
Source notebook: http://localhost:8888/notebooks/Downloads/02.04-Computation-on-arrays-aggregates.ipynb
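The difference between a standard aggregate and its NaN-safe version, and the role of the axis argument in multidimensional aggregates, can be sketched on a small array containing one missing value (the values are arbitrary):

```python
import numpy as np

data = np.array([[1.0, 2.0, np.nan],
                 [4.0, 5.0, 6.0]])

print(np.sum(data))          # Prints "nan": NaN propagates through sum
print(np.nansum(data))       # NaN values are ignored; prints "18.0"

# Multidimensional aggregates: axis selects the dimension that is collapsed
print(np.nanmax(data, axis=0))   # column-wise maximum: 4, 5, 6
print(np.nanmean(data, axis=1))  # row-wise mean: 1.5, 5.0

# Many aggregates are also available as array methods
clean = data[1]
print(clean.min(), clean.max(), clean.argmax())  # Prints "4.0 6.0 2"
print(np.any(data > 5), np.all(data > 0))        # Prints "True False" (NaN > 0 is False)
```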
Computation on Arrays: Broadcasting
Broadcasting in NumPy follows a strict set of rules to
determine the interaction between the two arrays:
Rule 1: If the two arrays differ in their number of
dimensions, the shape of the one with fewer dimensions is
padded with ones on its leading (left) side.
Rule 2: If the shape of the two arrays does not match in
any dimension, the array with shape equal to 1 in that
dimension is stretched to match the other shape.
Rule 3: If in any dimension the sizes disagree and neither is
equal to 1, an error is raised.
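The three rules can be traced on concrete shapes. A minimal sketch: one compatible case, one case rejected by Rule 3, and one common way to fix it by adding an explicit axis with np.newaxis.

```python
import numpy as np

a = np.arange(3)                 # shape (3,)
b = np.arange(3)[:, np.newaxis]  # shape (3, 1)

# Rule 1: a is treated as shape (1, 3); Rule 2: both are stretched to (3, 3)
c = a + b
print(c.shape)                   # Prints "(3, 3)"

# Rule 3: shapes (3, 2) and (3,) become (3, 2) and (1, 3);
# the last dimensions (2 and 3) disagree and neither is 1
M = np.ones((3, 2))
raised = False
try:
    M + a
except ValueError as err:
    raised = True
    print("ValueError:", err)

# One fix: give a an explicit trailing axis so the shapes are (3, 2) and (3, 1)
d = M + a[:, np.newaxis]
print(d.shape)                   # Prints "(3, 2)"
```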
Comparisons, Masks, and Boolean Logic
Masking comes up when we want to extract, modify, count, or otherwise
manipulate values in an array based on some criterion.
In NumPy, Boolean masking is often the most efficient way to accomplish these
types of tasks.
NumPy implements comparison operators such as < (less than) and > (greater
than) as element-wise ufuncs. The result of these comparison operators is always
an array with a Boolean data type.
All six of the standard comparison operations are available:
Operator Equivalent ufunc Operator Equivalent ufunc
== np.equal != np.not_equal
< np.less <= np.less_equal
> np.greater >= np.greater_equal
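A short sketch of the comparison operators from the table above: each produces a Boolean array of the same shape as its input, and the operator and its ufunc are interchangeable. Because True counts as 1, Boolean arrays are easy to count.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

mask = x < 3                     # element-wise comparison
print(mask)                      # True for 1 and 2, False elsewhere
print(mask.dtype)                # Prints "bool"

# Each operator is a wrapper for the corresponding ufunc
print(np.array_equal(x != 3, np.not_equal(x, 3)))  # Prints "True"

# Counting entries: True counts as 1, False as 0
print(np.count_nonzero(x >= 2))  # Prints "4"
print(np.sum(x > 3))             # Prints "2"
```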
Comparisons, Masks, and Boolean Logic
Boolean Operators
Boolean Arrays as Masks
Operator Equivalent ufunc Operator Equivalent ufunc
& np.bitwise_and | np.bitwise_or
^ np.bitwise_xor ~ np.bitwise_not
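The Boolean operators combine conditions element-wise, and the resulting arrays can be used as masks for selection or assignment. A minimal sketch on an arbitrary array:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions with &, |, ~ ; parentheses are required because
# & and | bind more tightly than the comparison operators
between = (x > 1) & (x < 5)

# Boolean arrays as masks: select the entries where the mask is True
print(x[between])                # Prints "[2 3 4]"
print(x[~between])               # Prints "[1 5 6]"
print(x[(x < 2) | (x > 5)])      # Prints "[1 6]"

# Masks can also be used for assignment
y = x.copy()
y[y % 2 == 0] = 0                # set every even value to zero
print(y)                         # Prints "[1 0 3 0 5 0]"
```

The Python keywords and/or would not work here, because they evaluate the truth of a whole array rather than of each element.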
Fancy Indexing
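Fancy indexing is conceptually simple: it means passing an array of indices to
access multiple array elements at once. For example, consider the following array:
import numpy as np
rng = np.random.default_rng(seed=1701)
x = rng.integers(100, size=10)
print(x)
[x[3], x[7], x[2]]  # accessing three elements one at a time
x[[3, 7, 2]]        # fancy indexing: the same three elements in a single step
ind = np.array([[3, 7],
                [4, 5]])
x[ind]  # the result takes the shape of the index array, here (2, 2)
Combined Indexing
X = np.arange(12).reshape((3, 4))  # a 3x4 example array
X[2, [2, 0, 1]]   # array([10, 8, 9])
X[1:, [2, 0, 1]]  # array([[6, 4, 5], [10, 8, 9]])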
Sorting Arrays
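Fast Sorting in NumPy: np.sort and np.argsort
Sorting Along Rows or Columns
Partial Sorts: Partitioning
Python's built-in sorting:
L = [3, 1, 4, 1, 5, 9, 2, 6]
sorted(L)   # returns a sorted copy
L.sort()    # acts in-place and returns None
print(L)    # Prints "[1, 1, 2, 3, 4, 5, 6, 9]"
sorted('python')  # works on any iterable; returns ['h', 'n', 'o', 'p', 't', 'y']
NumPy sorting:
import numpy as np
x = np.array([2, 1, 4, 3, 5])
np.sort(x)  # returns a sorted copy: array([1, 2, 3, 4, 5])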
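The remaining topics of this slide, np.argsort, sorting along an axis and partial sorting with np.partition, can be sketched as follows (all arrays are arbitrary examples):

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])
idx = np.argsort(x)              # indices that would sort x
print(idx)                       # Prints "[1 0 3 2 4]"
print(x[idx])                    # fancy indexing with idx gives the sorted array

# Sorting along rows or columns of a 2D array
X = np.array([[6, 3, 7],
              [4, 6, 9],
              [2, 6, 7]])
by_col = np.sort(X, axis=0)      # sort each column independently
by_row = np.sort(X, axis=1)      # sort each row independently

# Partial sort: the 3 smallest values come first (in arbitrary order),
# the 4th smallest sits at position 3, and larger values follow
y = np.array([7, 2, 3, 1, 6, 5, 4])
p = np.partition(y, 3)
print(p)
```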
NumPy's Structured Arrays
Exploring Structured Array Creation
More Advanced Compound Types
Character    Description               Example
'b'          Byte                      np.dtype('b')
'i'          Signed integer            np.dtype('i4') == np.int32
'u'          Unsigned integer          np.dtype('u1') == np.uint8
'f'          Floating point            np.dtype('f8') == np.float64
'c'          Complex floating point    np.dtype('c16') == np.complex128
'S', 'a'     String                    np.dtype('S5')
'U'          Unicode string            np.dtype('U') == np.str_
'V'          Raw data (void)           np.dtype('V') == np.void
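A structured array combines several named fields, each with its own type code from the table above, in one array. A minimal sketch; the field names and the values are hypothetical and used only for illustration.

```python
import numpy as np

# Hypothetical data for illustration only
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

# Compound data type: a Unicode string (U10), a 4-byte integer (i4), a 64-bit float (f8)
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = name
data['age'] = age
data['weight'] = weight

print(data.dtype)                       # the compound dtype
print(data[0])                          # the first record (a row)
print(data['name'])                     # all values of one field (a column)
print(data[data['age'] < 30]['name'])   # Boolean mask on a field: names of people under 30
```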
Basic features of Pandas library
Pandas objects
NumPy and Pandas imports:
import numpy as np
import pandas as pd
The Pandas Series Object
Series as Generalized NumPy Array
Series as Specialized Dictionary
Constructing Series Objects
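The three views of a Series listed above can be sketched in a few lines; the labels and values are arbitrary examples.

```python
import pandas as pd

# Series as a generalized NumPy array: values plus an explicit index
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(data.values)       # the underlying NumPy array
print(data.index)        # the index labels 'a' to 'd'
print(data['b'])         # access by index label; prints "0.5"

# Series as a specialized dictionary: built from a dict of key -> value
counts = pd.Series({'x': 3, 'y': 1, 'z': 2})
print(counts['y'])       # Prints "1"
print(counts['x':'y'])   # unlike a dict, a Series supports slicing by label

# Constructing Series objects: a scalar repeated over a given index
filled = pd.Series(5, index=[100, 200, 300])
```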
The Pandas DataFrame Object
DataFrame as Generalized NumPy Array
DataFrame as Specialized Dictionary
Constructing DataFrame Objects
From a single Series object (population and area are assumed to be
pandas Series objects that share a common index)
pd.DataFrame(population, columns=['population'])
From a list of dicts
data = [{'a': i, 'b': 2 * i}
        for i in range(3)]
pd.DataFrame(data)
From a dictionary of Series objects
pd.DataFrame({'population': population,
              'area': area})
From a two-dimensional NumPy array
pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])
From a NumPy structured array
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
pd.DataFrame(A)
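The population and area objects are not defined on the slide. The sketch below assumes they are Series sharing a common index; the region labels and numbers are hypothetical, made up only so that the constructors above can run.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a shared index (labels and numbers are illustrative)
population = pd.Series({'Region A': 1000, 'Region B': 2500, 'Region C': 1800})
area = pd.Series({'Region A': 50, 'Region B': 120, 'Region C': 90})

states = pd.DataFrame({'population': population, 'area': area})
print(states)

# A new column computed from existing ones is aligned on the index
states['density'] = states['population'] / states['area']

single = pd.DataFrame(population, columns=['population'])
from_dicts = pd.DataFrame([{'a': i, 'b': 2 * i} for i in range(3)])
structured = pd.DataFrame(np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')]))
```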
The Pandas Index Object
• Index as Immutable Array
• Index as Ordered Set
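Both views of the Index object can be sketched briefly: it supports array-style indexing but rejects assignment, and it offers set operations through methods. The integer labels are arbitrary.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])
print(ind[1], ind[::2])          # indexing and slicing work like a NumPy array

# Index as an immutable array: item assignment raises TypeError
immutable = False
try:
    ind[1] = 0
except TypeError as err:
    immutable = True
    print("TypeError:", err)

# Index as an ordered set: set operations via methods
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
print(indA.intersection(indB))           # values in both
print(indA.union(indB))                  # values in either
print(indA.symmetric_difference(indB))   # values in exactly one
```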
Data Indexing and Selection
Data Selection in Series
Series as Dictionary
Series as One-Dimensional Array
Indexers: loc and iloc
Data Selection in DataFrames
DataFrame as Dictionary
DataFrame as Two-Dimensional Array
Additional Indexing Conventions
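The difference between the loc and iloc indexers is easiest to see with an integer index that does not start at 0. A minimal sketch; the Series and DataFrame contents are arbitrary.

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

print(data.loc[1])       # label-based: the entry whose index label is 1; prints "a"
print(data.iloc[1])      # position-based: the second entry; prints "b"
print(data.loc[1:3])     # label slices include the end label: 'a', 'b'
print(data.iloc[1:3])    # positional slices exclude the end: 'b', 'c'

# DataFrame selection
df = pd.DataFrame({'x': [10, 20, 30], 'y': [1.5, 2.5, 3.5]},
                  index=['r1', 'r2', 'r3'])
print(df['x'])                     # DataFrame as dictionary: a column as a Series
print(df.iloc[:2, :1])             # two-dimensional positional slicing
print(df.loc[df['x'] > 15, 'y'])   # Boolean mask on rows, label on columns
```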
Operating on Data in Pandas
Index Preservation (Ufuncs)
Index Alignment
Index Alignment in Series
Index Alignment in DataFrames
Operations Between DataFrames and Series
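Index preservation and index alignment can be sketched with two small Series whose indices only partly overlap, followed by a DataFrame/Series operation; all values are arbitrary.

```python
import numpy as np
import pandas as pd

# Ufuncs preserve the index
s = pd.Series([1, 4, 9], index=['a', 'b', 'c'])
root = np.sqrt(s)                 # still indexed by 'a', 'b', 'c'

# Index alignment: the result uses the union of both indices
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
print(A + B)                      # labels 0 and 3 exist in only one Series -> NaN
print(A.add(B, fill_value=0))     # treat missing entries as 0 instead

# DataFrame minus Series: by default the Series is matched against the columns
df = pd.DataFrame([[3, 8], [1, 6], [2, 9]], columns=['Q', 'R'])
print(df - df.iloc[0])                # subtract the first row from every row
print(df.subtract(df['R'], axis=0))   # column-wise: subtract column R from each column
```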
Handling Missing Data
None as a Sentinel Value
NaN: Missing Numerical Data
NaN and None in Pandas
Pandas Nullable Dtypes
Operating on Null Values
Detecting Null Values
Dropping Null Values
Filling Null Values
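A short sketch of detecting, dropping and filling null values, and of a pandas nullable dtype; the data are arbitrary examples.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 'hello', None])

print(data.isnull())         # True where a value is NaN or None
print(data[data.notnull()])  # keep only the non-null entries
print(data.dropna())         # the same result via dropna

# Filling null values
nums = pd.Series([1.0, np.nan, 2.0, None, 3.0])   # None becomes NaN in a float Series
print(nums.fillna(0))        # replace nulls with a constant
print(nums.ffill())          # forward-fill: propagate the previous valid value

# Pandas nullable dtype: integers with missing values stay integers
ints = pd.Series([1, None, 3], dtype='Int64')
print(ints)                  # the missing entry is shown as <NA>
```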
Hierarchical Indexing
A Multiply Indexed Series
The Bad Way
The Better Way: The Pandas MultiIndex
MultiIndex as Extra Dimension
Methods of MultiIndex Creation
Explicit MultiIndex Constructors
MultiIndex Level Names
MultiIndex for Columns
Indexing and Slicing a MultiIndex
Multiply Indexed Series
Multiply Indexed DataFrames
Rearranging Multi-Indexes
Sorted and Unsorted Indices
Stacking and Unstacking Indices
Index Setting and Resetting
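Several of the topics above fit in one small example: an explicit MultiIndex constructor with level names, indexing on each level, unstacking, and resetting/setting the index. A minimal sketch; the group labels, years and values are hypothetical.

```python
import pandas as pd

# Explicit MultiIndex constructor from (outer, inner) tuples, with level names
# (labels and values are hypothetical)
index = pd.MultiIndex.from_tuples([('A', 2010), ('A', 2020),
                                   ('B', 2010), ('B', 2020)],
                                  names=['group', 'year'])
s = pd.Series([10, 12, 20, 25], index=index)

print(s['A'])                      # partial indexing on the outer level
print(s.xs(2020, level='year'))    # select on the inner level for all groups

# MultiIndex as an extra dimension: unstack moves the inner level to columns
wide = s.unstack()                 # DataFrame: rows = group, columns = year

# Index setting and resetting
flat = s.reset_index(name='value')           # index levels become ordinary columns
again = flat.set_index(['group', 'year'])    # and back to a MultiIndex
```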
Combining Datasets: concat and append
Recall: Concatenation of NumPy Arrays
Simple Concatenation with pd.concat
Duplicate Indices
Concatenation with Joins
The append Method (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat instead)
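A minimal sketch of pd.concat covering simple concatenation, duplicate indices (ignore_index and keys) and joins; the small DataFrames are arbitrary examples.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2']}, index=[1, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4'], 'B': ['B3', 'B4']}, index=[3, 4])
df3 = pd.DataFrame({'B': ['B5', 'B6'], 'C': ['C5', 'C6']}, index=[5, 6])

# Simple concatenation along the rows
print(pd.concat([df1, df2]))

# Duplicate indices: ignore_index builds a new 0..n-1 index,
# keys adds an outer level recording where each row came from
print(pd.concat([df1, df1], ignore_index=True))
print(pd.concat([df1, df2], keys=['x', 'y']))

# Concatenation with joins: outer (the default) keeps all columns and fills NaN,
# inner keeps only the columns the inputs share
print(pd.concat([df1, df3]))
print(pd.concat([df1, df3], join='inner'))
```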