Python - Data Science Lecture 1

The document outlines a lecture on Python for Data Science at WSB University, covering topics such as Python basics, data analysis, and data mining. It includes information on course materials, assessment criteria, and a bibliography of recommended readings. The lecture will also introduce key libraries like Pandas and NumPy, and discuss data processing and visualization techniques.

Python - Data Science

Python language in data processing and data mining
Lecture 1.
WSB University
Dariusz Badura

Python in Data Science 2023-24 1

Information about lecture materials
presented on the On-line WSB platform
• On-line WSB platform
• Link: https://online.wsb.edu.pl/course/view.php?id=10797
• Password: PLPBZima23

The course discusses the basics of:

• Python and its libraries,
• data analysis,
• data science.

The criteria for passing the lecture
• At the end of the semester there will be a test on the topics presented in the lecture, with 20–30 test questions.

• There will be four tests and tasks per semester on the lecture topics.

• Lecture materials will be posted on the On-line WSB e-learning platform.

Bibliography
• Wes McKinney: Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd Edition; O’Reilly Media, Inc., 2022.
• Sarah Guido, Andreas Müller: Introduction to Machine Learning with Python: A Guide for Data Scientists; O’Reilly Media, Inc., 2017.
• Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills: Advanced Analytics with Spark; O’Reilly Media, Inc., June 2017.
• Other Internet sources …
Data processing
• …, manipulation of data by a computer. It includes the conversion of
raw data to machine-readable form, flow of data through the CPU and
memory to output devices, and formatting or transformation of
output.
• … the collection and manipulation of digital data to produce
meaningful information. Data processing is a form of information
processing, which is the modification of information in any manner
detectable by an observer.
• The term "data processing" has also been used to refer to a
department within an organization responsible for the operation of
data processing programs.
Data processing functions
Data processing may involve various processes, including:
• Validation – Ensuring that supplied data is correct and relevant.
• Sorting – "arranging items in some sequence and/or in different
sets."
• Summarization (statistical or automatic) – reducing detailed data
to its main points.
• Aggregation – combining multiple pieces of data.
• Analysis – the "collection, organization, analysis, interpretation and
presentation of data."
• Reporting – listing detail or summary data or computed information.
• Classification – separation of data into various categories.

Data mining
• Data mining is the process of sorting through large data sets to identify
patterns and relationships that can help solve business problems through data
analysis.
• Data mining is the process of extracting and discovering patterns in large data
sets involving methods at the intersection of machine learning, statistics,
and database systems. Data mining is an interdisciplinary subfield of computer
science and statistics with an overall goal of extracting information and
transforming the information into a comprehensible structure for further use.
• Data mining is the analysis step of the "knowledge discovery in databases"
process, or KDD. It also involves database and data management aspects, data
pre-processing, model and inference considerations, interestingness
metrics, complexity considerations, post-processing of discovered
structures, visualization, and online updating.


Lecture topics
• Data exploration & analysis.
– Pandas; NumPy; SciPy (and others).
• Data visualization.
– Matplotlib; Seaborn; Datashader; others.
• Classical machine learning.
– Scikit-Learn; StatsModels.
• Deep learning.
– Keras, TensorFlow, and a whole host of others.
• Data storage and big data frameworks.
– Apache Spark; Apache Hadoop; HDFS; Dask; h5py/pytables.
• Odds and ends.
– nltk; spaCy; OpenCV/cv2; scikit-image; Cython.
Plan of the first lecture

• A quick overview of the features of Python,
• NumPy library overview,
• Pandas library overview.

Python
BASICS OF THE LANGUAGE


Why Python?
• … a general-purpose programming language which, thanks to
libraries such as pandas, NumPy, SciPy, matplotlib, and
TensorFlow, has become a powerful
environment for scientific computing.
• The basic features of the language:
– Basic data types
– Containers
– Functions
– Classes
https://docs.python.org/3/tutorial/index.html

The Interpreter and Its Environment
• Anaconda:
– Spyder
– Jupyter
– InPy
– Colab

Other development environments:

• PyDev (free) – an integrated development environment built on the Eclipse platform;
• PyCharm by JetBrains;
• Python Tools for Visual Studio for Windows users;
• Spyder (free) – an integrated development environment included with the Anaconda distribution;
• Komodo – a commercial integrated development environment.

Python versions
• Python 3 is the only supported line of the language;
Python 2.7 reached end of life on 1 January 2020.
• Python 3.0 introduced many backwards-incompatible
changes to the language, so code
written for 2.7 may not work in 3.x and vice versa.
• The code presented will use Python 3 (3.5 or later).
• You can check the Python version on the
command line by running python --version.
Example: Python implementation of the classic
"quicksort" algorithm:

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3,6,8,10,1,2,1]))
# Prints "[1, 1, 2, 3, 6, 8, 10]"
Basic data types
• Numbers: Integers and floating point numbers
work as they do in other languages:

x = 3
print(type(x)) # Prints "<class 'int'>"
print(x) # Prints "3"
print(x + 1) # Addition; prints "4"
print(x - 1) # Subtraction; prints "2"
print(x * 2) # Multiplication; prints "6"
print(x ** 2) # Exponentiation; prints "9"
x += 1
print(x) # Prints "4"
x *= 2
print(x) # Prints "8"
y = 3.7
print(type(y)) # Prints "<class 'float'>"
print(y, y + 1, y * 2, y ** 2) # Prints "3.7 4.7 7.4" followed by y ** 2, about 13.69 (binary rounding may show extra digits)

Basic data types
Python does not have unary increment (x++) or decrement (x--) operators.
Python also has a built-in complex number type.
• Booleans: Python implements all the usual Boolean logic operators, but
uses English words instead of symbols (&&, ||, etc.):

t = True
f = False
print(type(t)) # Prints "<class 'bool'>"
print(t and f) # Logical AND; prints "False"
print(t or f) # Logical OR; prints "True"
print(not t) # Logical NOT; prints "False"
print(t != f) # Logical XOR; prints "True"

Basic data types
• Strings: Python has strong support for strings:

hello = 'hello' # String literals can use single quotes
world = "world" # or double quotes; it does not matter.
print(hello) # Prints "hello"
print(len(hello)) # String length; prints "5"
hw = hello + ' ' + world # String concatenation
print(hw) # Prints "hello world"
hw12 = '%s %s %d' % (hello, world, 12) # sprintf-style string formatting
print(hw12) # Prints "hello world 12"

Useful methods:

s = "hello"
print(s.capitalize()) # Capitalize a string; prints "Hello"
print(s.upper()) # Convert a string to uppercase; prints "HELLO"
print(s.rjust(7)) # Right-justify a string, padding with spaces; prints "  hello"
print(s.center(7)) # Center a string, padding with spaces; prints " hello "
print(s.replace('l', '(ell)')) # Replace all instances of one substring with another;
# prints "he(ell)(ell)o"
print(' world '.strip()) # Strip leading and trailing whitespace; prints "world"
Containers
Python comes with several built-in container types: lists,
dictionaries, sets, and tuples.
• Lists

A list is the Python equivalent of an array, but it is resizable and
can contain elements of different types:

xs = [3, 1, 2] # Create a list
print(xs, xs[2]) # Prints "[3, 1, 2] 2"
print(xs[-1]) # Negative indices count from the end of the list; prints "2"
xs[2] = 'foo' # Lists can contain elements of different types
print(xs) # Prints "[3, 1, 'foo']"
xs.append('bar') # Add a new element to the end of the list
print(xs) # Prints "[3, 1, 'foo', 'bar']"
x = xs.pop() # Remove and return the last element of the list
print(x, xs) # Prints "bar [3, 1, 'foo']"

Containers -> lists
• Slicing: In addition to accessing list elements individually, Python provides a
concise syntax to access sublists; this is known as slicing:

nums = list(range(5)) # range is a built-in that yields a sequence of integers; list() turns it into a list
print(nums) # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4]) # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:]) # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2]) # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:]) # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print(nums[:-1]) # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9] # Assign a new sublist to a slice
print(nums) # Prints "[0, 1, 8, 9, 4]"

Containers -> lists -> Loops
• You can loop over the elements of a list:

animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print(animal)

• To access the index of each element within the loop body,
use the built-in function enumerate:

animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print('#%d: %s' % (idx + 1, animal))
# Prints "#1: cat", "#2: dog", "#3: monkey", each on its own line

Containers -> lists -> List comprehension
• When programming, we often want to transform one type of data into
another. As a simple example, consider the following code that calculates
square numbers:

nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares) # Prints "[0, 1, 4, 9, 16]"

We can simplify this code using a list comprehension:

nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print(squares) # Prints "[0, 1, 4, 9, 16]"

List comprehensions can also contain conditions:

nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares) # Prints "[0, 4, 16]"

Containers -> dictionaries
• A dictionary stores (key, value) pairs, similar to a Map in Java or
an object in JavaScript. We can use it this way:

d = {'cat': 'cute', 'dog': 'furry'} # Create a new dictionary with some data
print(d['cat']) # Get an entry from a dictionary; prints "cute"
print('cat' in d) # Check if a dictionary has a given key; prints "True"
d['fish'] = 'wet' # Set an entry in a dictionary
print(d['fish']) # Prints "wet"
# print(d['monkey']) # KeyError: 'monkey' not a key of d
print(d.get('monkey', 'N/A')) # Get an element with a default; prints "N/A"
print(d.get('fish', 'N/A')) # Get an element with a default; prints "wet"
del d['fish'] # Remove an element from a dictionary
print(d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"

Containers -> dictionaries -> Loops
• Iterating through a dictionary by key:

d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print('A %s has %d legs' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"

• Accessing the keys and their corresponding values using the items method:

d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.items():
    print('A %s has %d legs' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"

Containers -> dictionary comprehensions

• These are similar to list comprehensions, but allow you to easily create
dictionaries. For example:

nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print(even_num_to_square) # Prints "{0: 0, 2: 4, 4: 16}"

Sets
• A set is an unordered collection of distinct elements.
Example:

animals = {'cat', 'dog'}
print('cat' in animals) # Check if an element is in a set; prints "True"
print('fish' in animals) # Prints "False"
animals.add('fish') # Add an element to a set
print('fish' in animals) # Prints "True"
print(len(animals)) # Number of elements in a set; prints "3"
animals.add('cat') # Adding an element that is already in the set does nothing
print(len(animals)) # Prints "3"
animals.remove('cat') # Remove an element from a set
print(len(animals)) # Prints "2"

Sets
• Loops: Iterating over a set has the same syntax as iterating over a list;
however, because sets are unordered, assumptions cannot be made about
the order in which the set's elements are visited:

animals = {'cat', 'dog', 'fish'}
for idx, animal in enumerate(animals):
    print('#%d: %s' % (idx + 1, animal))
# Prints, for example, "#1: fish", "#2: dog", "#3: cat" (the order may differ between runs)

• Set comprehensions: As with lists and dictionaries, we can easily construct sets using
set comprehensions:

from math import sqrt
nums = {int(sqrt(x)) for x in range(30)}
print(nums) # Prints "{0, 1, 2, 3, 4, 5}"

Tuples
• A tuple is an (immutable) ordered list of values. A tuple is similar
to a list in many ways; one difference is that tuples can be used as
keys in dictionaries and as elements of sets, while lists cannot.
Example:

d = {(x, x + 1): x for x in range(10)} # Create a dictionary with tuple keys
t = (5, 6) # Create a tuple
print(type(t)) # Prints "<class 'tuple'>"
print(d[t]) # Prints "5"
print(d[(1, 2)]) # Prints "1"

Functions
• Python functions are defined using the def keyword. Example:

def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))
# Prints "negative", "zero", "positive"

Functions
• We often define functions to take optional keyword arguments, like
this:

def hello(name, loud=False):
    if loud:
        print('HELLO, %s!' % name.upper())
    else:
        print('Hello, %s' % name)

hello('Bob') # Prints "Hello, Bob"
hello('Fred', loud=True) # Prints "HELLO, FRED!"

Classes
• Example:

class Greeter(object):

    # Constructor
    def __init__(self, name):
        self.name = name # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print('HELLO, %s!' % self.name.upper())
        else:
            print('Hello, %s' % self.name)

g = Greeter('Fred') # Construct an instance of the Greeter class
g.greet() # Call an instance method; prints "Hello, Fred"
g.greet(loud=True) # Call an instance method; prints "HELLO, FRED!"

NumPy library
• NumPy and pandas are used to work with datasets, which can come from a wide range of
sources and in a wide range of formats, including:
– collections of documents,
– collections of images,
– collections of sound clips,
– collections of numerical measurements, or
– … nearly anything else.
• Data sets can be represented as arrays of numbers:
– digital images can be thought of as simply two-dimensional arrays of numbers representing pixel
brightness across the area;
– sound clips can be thought of as one-dimensional arrays of intensity versus time;
– text can be converted in various ways into numerical representations, such as binary digits
representing the frequency of certain words or pairs of words.
• Efficient storage and manipulation of numerical arrays is fundamental to the process of
doing data science.
• Install NumPy: http://www.numpy.org/
• Then import NumPy and, for example, double-check the version:
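The version check mentioned on this slide might look like the following; a minimal sketch, assuming NumPy is already installed (for example with pip or conda).

```python
import numpy as np

# The version string depends on the local installation.
print(np.__version__)
```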
NumPy module
• The NumPy module is a basic library for
scientific calculations in Python (including
matrix multiplication and addition,
diagonalization or inversion, integration,
solving equations, etc.).
• It provides us with specialized data types,
operations and functions that are not available
in a typical Python installation.

Creating Arrays from Python Lists
• Unlike Python lists, NumPy arrays can only contain data
of the same type. If the types do not match, NumPy will
upcast them according to its type promotion rules.
• Integer array: np.array([1, 4, 2, 5, 3])

• Floating-point array (here the integers are upcast to floating point): np.array([3.14, 4, 2, 3])

• Use the dtype keyword to set the data type explicitly:

np.array([1, 2, 3, 4], dtype=np.float32)
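The upcasting behaviour can be checked by inspecting the dtype attribute of each array. This is a small sketch; the variable names are chosen here for illustration.

```python
import numpy as np

ints = np.array([1, 4, 2, 5, 3])                   # all integers -> integer dtype
mixed = np.array([3.14, 4, 2, 3])                  # one float -> every element upcast to float64
forced = np.array([1, 2, 3, 4], dtype=np.float32)  # dtype requested explicitly

print(ints.dtype.kind)   # 'i' (signed integer; the exact width depends on the platform)
print(mixed.dtype)       # float64
print(forced.dtype)      # float32
```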

Creating Arrays from Scratch
For larger arrays, it is more efficient to create arrays from scratch
using routines built into NumPy:
• np.zeros(10, dtype=int)
• np.ones((3, 5), dtype=float)
• np.full((3, 5), 3.14)

• # Create an array filled with a linear sequence starting at 0,
# stepping by 2, up to but not including 20
# (this is similar to the built-in range function)
np.arange(0, 20, 2)
• # Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

• … and others
NumPy Standard Data Types
• Because NumPy is built in C, the types will be familiar to users of
C, Fortran, and other related languages.

Data type     Description
bool_         Boolean (True or False) stored as a byte
int_          Default integer type (same as C long; normally either int64 or int32)
intc          Identical to C int (normally int32 or int64)
intp          Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8          Byte (–128 to 127)
int16         Integer (–32768 to 32767)
int32         Integer (–2147483648 to 2147483647)
int64         Integer (–9223372036854775808 to 9223372036854775807)
uint8         Unsigned integer (0 to 255)
uint16        Unsigned integer (0 to 65535)
uint32        Unsigned integer (0 to 4294967295)
uint64        Unsigned integer (0 to 18446744073709551615)
float_        Shorthand for float64
float16       Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32       Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64       Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_      Shorthand for complex128
complex64     Complex number, represented by two 32-bit floats
complex128    Complex number, represented by two 64-bit floats

Note: the aliases float_ and complex_ were removed in NumPy 2.0; use float64 and complex128 there.

The Basics of NumPy Arrays
• Attributes of arrays: Determining the size, shape,
memory consumption, and data types of arrays
• Indexing of arrays: Getting and setting the values of
individual array elements
• Slicing of arrays: Getting and setting smaller subarrays
within a larger array
• Reshaping of arrays: Changing the shape of a given array
• Joining and splitting of arrays: Combining multiple
arrays into one, and splitting one array into many
NumPy Array Attributes
Defining random arrays of one, two, and three dimensions:

import numpy as np
rng = np.random.default_rng(seed=1701) # seed for reproducibility

x1 = rng.integers(10, size=6) # one-dimensional array
x2 = rng.integers(10, size=(3, 4)) # two-dimensional array
x3 = rng.integers(10, size=(3, 4, 5)) # three-dimensional array

print("x3 ndim: ", x3.ndim) # number of dimensions: 3
print("x3 shape:", x3.shape) # size of each dimension: (3, 4, 5)
print("x3 size: ", x3.size) # total number of elements: 60
print("dtype: ", x3.dtype) # data type of the elements

Array Indexing
• Array indexing: access to single elements
• Array slicing: access to subarrays:
x[start:stop:step]
• Reshaping of arrays
• Array concatenation and splitting
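A short sketch of indexing, slicing with x[start:stop:step], and reshaping. The arrays are built with np.arange so the results are easy to verify by hand; the variable names are illustrative.

```python
import numpy as np

x = np.arange(10)             # [0 1 2 3 4 5 6 7 8 9]
first, last = x[0], x[-1]     # single elements: 0 and 9
evens = x[::2]                # default start and stop, step 2 -> [0 2 4 6 8]
middle = x[2:7:2]             # from index 2 up to (not including) 7, step 2 -> [2 4 6]

grid = np.arange(1, 10).reshape(3, 3)   # the values 1..9 as a 3x3 array
print(grid[1, 2])             # row 1, column 2 -> 6
print(grid[:2, ::-1])         # first two rows with the columns reversed
```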

Array Concatenation and Splitting
• Concatenation of arrays:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y]) # array([1, 2, 3, 3, 2, 1])

• Splitting of arrays (the list [3, 5] gives the split points):
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3) # Prints "[1 2 3] [99 99] [3 2 1]"

Computation on NumPy Arrays:
Universal Functions
• Array Arithmetic
• Absolute Value
• Trigonometric Functions
• Specialized Ufuncs (scipy.special)
• Advanced Ufunc Features
– Specifying Output
– Aggregations
– Outer Products
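The ufunc topics listed on this slide can be illustrated roughly as follows. This sketch sticks to NumPy itself, so the scipy.special functions are not shown.

```python
import numpy as np

x = np.arange(1, 6)                        # [1 2 3 4 5]

print(x + 5, x * 2, x ** 2)                # array arithmetic is element-wise
print(np.abs(np.array([-2, -1, 0, 1])))    # absolute value -> [2 1 0 1]

theta = np.linspace(0, np.pi, 3)           # 0, pi/2, pi
print(np.sin(theta))                       # approximately [0, 1, 0]

out = np.empty(5)
np.multiply(x, 10, out=out)                # specifying output: the result is written into `out`

total = np.add.reduce(x)                   # aggregation through a ufunc method -> 15
table = np.multiply.outer(x, x)            # outer product: a 5x5 multiplication table
print(total, table.shape)
```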
Aggregations: min, max, and everything in between
• Summing the Values in an Array
• Minimum and Maximum
• Multidimensional Aggregates
• Other Aggregation Functions (table):

Function name    NaN-safe version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true

http://localhost:8888/notebooks/Downloads/02.04-Computation-on-arrays-aggregates.ipynb
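The difference between a plain aggregate and its NaN-safe version, and the role of the axis argument in multidimensional aggregates, can be seen on a small array. The data below is made up for illustration.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0]])

print(np.sum(data))              # nan: the plain version propagates the missing value
print(np.nansum(data))           # 16.0: the NaN-safe version ignores it
print(np.nanmin(data, axis=0))   # minimum of each column -> [1. 2. 3.]
print(np.nanmax(data, axis=1))   # maximum of each row -> [3. 6.]
print(np.any(data > 5), np.all(data > 0))   # True False (NaN > 0 evaluates to False)
```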

Computation on Arrays: Broadcasting
Broadcasting in NumPy follows a strict set of rules to
determine the interaction between the two arrays:
• Rule 1: If the two arrays differ in their number of
dimensions, the shape of the one with fewer dimensions is
padded with ones on its leading (left) side.
• Rule 2: If the shape of the two arrays does not match in
any dimension, the array with shape equal to 1 in that
dimension is stretched to match the other shape.
• Rule 3: If in any dimension the sizes disagree and neither is
equal to 1, an error is raised.
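The three broadcasting rules can be traced on small arrays. In this sketch the shapes are chosen so that each rule is exercised once.

```python
import numpy as np

a = np.ones((2, 3))        # shape (2, 3)
b = np.arange(3)           # shape (3,)

# Rule 1: b's shape is padded on the left -> (1, 3)
# Rule 2: its size-1 dimension is stretched -> (2, 3)
print((a + b).shape)       # (2, 3)

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(3)                 # shape (3,) -> (1, 3); both are stretched to (3, 3)
print(col + row)

# Rule 3: (3, 2) against (3,) -> (1, 3); sizes 2 and 3 disagree and neither is 1
try:
    np.ones((3, 2)) + np.arange(3)
except ValueError as err:
    print("broadcast error:", err)
```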
Comparisons, Masks, and Boolean Logic
• Masking comes up when we want to extract, modify, count, or otherwise
manipulate values in an array based on some criterion.
• In NumPy, Boolean masking is often the most efficient way to accomplish these
types of tasks.
• NumPy implements comparison operators such as < (less than) and > (greater
than) as element-wise ufuncs. The result of these comparison operators is always
an array with a Boolean data type.
• All six of the standard comparison operations are available:

Operator    Equivalent ufunc    Operator    Equivalent ufunc
==          np.equal            !=          np.not_equal
<           np.less             <=          np.less_equal
>           np.greater          >=          np.greater_equal

Comparisons, Masks, and Boolean Logic

• Boolean Operators
• Boolean Arrays as Masks

Operator    Equivalent ufunc    Operator    Equivalent ufunc
&           np.bitwise_and      |           np.bitwise_or
^           np.bitwise_xor      ~           np.bitwise_not
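Comparison ufuncs, Boolean operators, and masks are typically combined as in the sketch below. The array values are arbitrary and chosen for illustration.

```python
import numpy as np

x = np.array([1, 5, 2, 8, 3, 9])

mask = (x > 2) & (x < 9)       # element-wise AND of two Boolean arrays
print(mask)                    # [False  True False  True  True False]
selected = x[mask]             # values picked out by the mask -> [5 8 3]
count = np.count_nonzero(mask) # how many values satisfy the condition -> 3
same = np.array_equal(x < 3, np.less(x, 3))   # operator and ufunc give the same result
print(selected, count, same)

x[x > 8] = 0                   # a mask can also be used to modify values in place
print(x)                       # [1 5 2 8 3 0]
```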

Fancy Indexing
• Fancy indexing is conceptually simple: it means passing an array of indices to
access multiple array elements at once. For example, consider the following array:

import numpy as np
rng = np.random.default_rng(seed=1701)

x = rng.integers(100, size=10)
print(x)
[x[3], x[7], x[2]] # accessing three elements one at a time
x[[3, 7, 2]] # the same three elements with fancy indexing
ind = np.array([[3, 7],
                [4, 5]])
x[ind] # the result has the shape of the index array
• Combined Indexing (X is a two-dimensional array)
– X[2, [2, 0, 1]]
– X[1:, [2, 0, 1]]
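The combined-indexing expressions on this slide refer to a two-dimensional array X that is not defined there. The sketch below uses a hypothetical 3x4 array in its place so that the results can be checked by hand.

```python
import numpy as np

x = np.arange(10) * 10                 # [ 0 10 20 30 40 50 60 70 80 90]
ind = np.array([[3, 7],
                [4, 5]])
picked = x[ind]                        # the result takes the shape of `ind`
print(picked)                          # [[30 70]
                                       #  [40 50]]

X = np.arange(12).reshape(3, 4)        # hypothetical stand-in for X: rows 0-3, 4-7, 8-11
print(X[2, [2, 0, 1]])                 # row 2, columns 2, 0, 1 -> [10  8  9]
print(X[1:, [2, 0, 1]])                # rows 1 and 2, same columns
```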
Sorting Arrays
• Fast Sorting in NumPy: np.sort and np.argsort
• Sorting Along Rows or Columns
• Partial Sorts: Partitioning

L = [3, 1, 4, 1, 5, 9, 2, 6]
sorted(L) # returns a sorted copy
L.sort() # acts in-place and returns None
print(L)
sorted('python') # works on any iterable; returns a list of characters

import numpy as np

x = np.array([2, 1, 4, 3, 5])
np.sort(x)
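A sketch covering the three sorting topics above: np.sort and np.argsort, sorting along an axis, and partial sorting with np.partition. The example arrays are arbitrary.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])
print(np.sort(x))            # sorted copy -> [1 2 3 4 5]; x itself is unchanged
order = np.argsort(x)        # indices that would sort x -> [1 0 3 2 4]
print(x[order])              # fancy indexing with those indices -> [1 2 3 4 5]

X = np.array([[6, 3, 7],
              [4, 6, 9],
              [2, 6, 1]])
by_col = np.sort(X, axis=0)  # sort within each column
by_row = np.sort(X, axis=1)  # sort within each row
print(by_col)
print(by_row)

# Partial sort: the 3 smallest values come first, in no guaranteed order
p = np.partition(np.array([7, 2, 3, 1, 6, 5, 4]), 3)
print(np.sort(p[:3]))        # [1 2 3]
```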

NumPy's Structured Arrays
• Exploring Structured Array Creation
• More Advanced Compound Types

Character    Description              Example
'b'          Byte                     np.dtype('b')
'i'          Signed integer           np.dtype('i4') == np.int32
'u'          Unsigned integer         np.dtype('u1') == np.uint8
'f'          Floating point           np.dtype('f8') == np.float64
'c'          Complex floating point   np.dtype('c16') == np.complex128
'S', 'a'     String                   np.dtype('S5')
'U'          Unicode string           np.dtype('U') == np.str_
'V'          Raw data (void)          np.dtype('V') == np.void
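A structured array built from the type codes in the table might look like this. The field names and values are hypothetical and serve only as an illustration.

```python
import numpy as np

# Compound dtype: a 10-character Unicode string, a 4-byte integer, an 8-byte float
data = np.zeros(3, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy']
data['age'] = [25, 45, 37]
data['weight'] = [55.0, 85.5, 68.0]

print(data.dtype)                        # the three named fields and their types
print(data['name'])                      # one field across all records
print(data[0])                           # one complete record
young = data[data['age'] < 30]['name']   # Boolean masking on a field
print(young)                             # ['Alice']
```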

Basic features of Pandas library
• Pandas objects
• NumPy and Pandas imports:
import numpy as np
import pandas as pd
• The Pandas Series Object
– Series as Generalized NumPy Array
– Series as Specialized Dictionary
– Constructing Series Objects
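The two views of a Series named above, as a generalized array and as a specialized dictionary, can be sketched as follows. The numbers are illustrative only, not real statistics.

```python
import numpy as np
import pandas as pd

# Series as a generalized NumPy array: values plus an explicit index
s = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(s['b'])                  # label-based access -> 0.5
print(s.values)                # the underlying NumPy array

# Series as a specialized dictionary: the keys become the index
population = pd.Series({'A': 10, 'B': 20, 'C': 30})
print(population['B'])         # 20
print(population['A':'B'])     # unlike list slices, label slices include the end point
```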

The Pandas DataFrame Object
• DataFrame as Generalized NumPy Array
• DataFrame as Specialized Dictionary
• Constructing DataFrame Objects
(population and area are assumed to be existing Series objects)
– From a single Series object
pd.DataFrame(population, columns=['population'])
– From a list of dicts
data = [{'a': i, 'b': 2 * i}
        for i in range(3)]
pd.DataFrame(data)
– From a dictionary of Series objects
pd.DataFrame({'population': population,
              'area': area})
– From a two-dimensional NumPy array
pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])
– From a NumPy structured array
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
pd.DataFrame(A)
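The constructors on this slide use population and area without defining them. In the sketch below, two small hypothetical Series stand in for them so that the dictionary-of-Series construction can be run end to end.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for `population` and `area` (made-up values)
population = pd.Series({'A': 10, 'B': 20, 'C': 30})
area = pd.Series({'A': 1.0, 'B': 4.0, 'C': 5.0})

df = pd.DataFrame({'population': population, 'area': area})
print(df.index.tolist())       # ['A', 'B', 'C']
print(df.columns.tolist())     # ['population', 'area']
print(df['area'])              # dictionary-style access returns a Series

df['density'] = df['population'] / df['area']   # new column computed from existing ones
print(df.loc['B', 'density'])                   # 5.0
```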

The Pandas Index Object
• Index as Immutable Array
• Index as Ordered Set

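Both views of the Index object, as an immutable array and as an ordered set, can be sketched as follows; the index values are arbitrary.

```python
import pandas as pd

ind_a = pd.Index([1, 3, 5, 7, 9])
ind_b = pd.Index([2, 3, 5, 7, 11])

print(ind_a[1], ind_a[::2])          # array-style indexing and slicing

try:
    ind_a[1] = 0                     # an Index cannot be modified in place
except TypeError as err:
    print("TypeError:", err)

both = ind_a.intersection(ind_b)     # values present in both
either = ind_a.union(ind_b)          # values present in at least one
one_only = ind_a.symmetric_difference(ind_b)   # values present in exactly one
print(both.tolist(), either.tolist(), one_only.tolist())
```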

Data Indexing and Selection
• Data Selection in Series
– Series as Dictionary
– Series as One-Dimensional Array
– Indexers: loc and iloc
• Data Selection in DataFrames
– DataFrame as Dictionary
– DataFrame as Two-Dimensional Array
– Additional Indexing Conventions
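The difference between the loc and iloc indexers is easiest to see when the explicit index is itself made of integers. A small sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
print(s.loc[1])                 # explicit (label) index -> 'a'
print(s.iloc[1])                # implicit (positional) index -> 'b'
print(s.loc[1:3].tolist())      # label slice includes the end point -> ['a', 'b']
print(s.iloc[1:3].tolist())     # positional slice excludes it -> ['b', 'c']

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]}, index=['r1', 'r2', 'r3'])
print(df.loc['r2', 'y'])        # row and column by label -> 20
print(df.iloc[0, 1])            # row and column by position -> 10
print(df.loc[df['x'] > 1, 'y'].tolist())   # Boolean mask combined with loc -> [20, 30]
```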
Operating on Data in Pandas
• Index Preservation (Ufuncs)
• Index Alignment
– Index Alignment in Series
– Index Alignment in DataFrames
• Operations Between DataFrames and Series

Handling Missing Data
• None as a Sentinel Value
• NaN: Missing Numerical Data
• NaN and None in Pandas
• Pandas Nullable Dtypes
• Operating on Null Values
– Detecting Null Values
– Dropping Null Values
– Filling Null Values
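Detecting, dropping, and filling null values, together with a nullable dtype, might be sketched like this; the data is invented for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 'hello', None])
print(s.isnull().tolist())        # detecting: [False, True, False, True]
print(s.dropna().tolist())        # dropping: [1, 'hello']

num = pd.Series([1.0, np.nan, 3.0])
print(num.fillna(0).tolist())     # filling with a constant: [1.0, 0.0, 3.0]
print(num.ffill().tolist())       # forward fill: [1.0, 1.0, 3.0]

nullable = pd.Series([1, None, 3], dtype='Int64')   # pandas nullable integer dtype
print(nullable.isna().tolist())   # [False, True, False]
print(nullable.sum())             # the missing value is skipped -> 4
```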
Hierarchical Indexing
• A Multiply Indexed Series
– The Bad Way
– The Better Way: The Pandas MultiIndex
– MultiIndex as Extra Dimension
• Methods of MultiIndex Creation
– Explicit MultiIndex Constructors
– MultiIndex Level Names
– MultiIndex for Columns
• Indexing and Slicing a MultiIndex
– Multiply Indexed Series
– Multiply Indexed DataFrames
• Rearranging Multi-Indexes
– Sorted and Unsorted Indices
– Stacking and Unstacking Indices
– Index Setting and Resetting
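A minimal sketch of a multiply indexed Series, using an explicit MultiIndex constructor with level names. The region labels and numbers are made up and do not represent real statistics.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['A', 'B'], [2010, 2020]],
                                   names=['region', 'year'])
pop = pd.Series([10, 12, 20, 25], index=index)

print(pop['A', 2020])                   # a single element -> 12
print(pop.loc[:, 2010].tolist())        # all regions for one year -> [10, 20]

wide = pop.unstack()                    # the inner level becomes columns: a 2x2 DataFrame
print(wide.loc['B', 2020])              # 25

flat = pop.reset_index(name='population')   # index levels become ordinary columns
print(flat.columns.tolist())            # ['region', 'year', 'population']
```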
Combining Datasets: concat and append
• Recall: Concatenation of NumPy Arrays
• Simple Concatenation with pd.concat
– Duplicate Indices
– Concatenation with Joins
– The append Method (removed in pandas 2.0; use pd.concat instead)
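The pd.concat options that correspond to the topics above (duplicate indices and joins) can be sketched as follows; the two small DataFrames are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2']}, index=[1, 2])
df2 = pd.DataFrame({'B': ['B3', 'B4'], 'C': ['C3', 'C4']}, index=[1, 2])

stacked = pd.concat([df1, df2])                    # indices are preserved, so 1 and 2 repeat
print(stacked.index.tolist())                      # [1, 2, 1, 2]

clean = pd.concat([df1, df2], ignore_index=True)   # build a fresh integer index instead
print(clean.index.tolist())                        # [0, 1, 2, 3]

inner = pd.concat([df1, df2], join='inner')        # keep only the columns both frames share
print(inner.columns.tolist())                      # ['B']

try:
    pd.concat([df1, df2], verify_integrity=True)   # duplicate indices raise an error
except ValueError as err:
    print("ValueError:", err)
```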
