Pandas DataFrame Notes
Check which version of pandas you are using
print(pd.__version__)
This cheat sheet was written for pandas version 0.25.
It assumes you are using Python 3 and the usual import
conventions: import numpy as np; import pandas as pd.

Load a DataFrame from a CSV file
df = pd.read_csv('file.csv', header=0,
    index_col=0, quotechar='"', sep=':',
    na_values=['na', '-', '.', ''])
Note: refer to the pandas docs for all arguments
[Figure: the conceptual model – a DataFrame is a table
of columns, each column a Series of data, all sharing a
common row index]
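A minimal sketch of that model (standard pandas
behaviour): each column of a DataFrame is a Series,
and all columns align on the shared row index.
s1 = pd.Series([1.0, 2.0, 3.0])
s2 = pd.Series([4.0, 5.0, 6.0])
df = pd.DataFrame({'a': s1, 'b': s2})
s = df['a'] # a single column is a Series
idx = df.index # the shared row index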
Maths on the whole DataFrame (not a complete list)
df = df.abs() # absolute values
df = df.add(o) # add df, Series or value
s = df.count() # non NA/null values
df = df.cummax() # (cols default axis)
df = df.cummin() # (cols default axis)
df = df.cumsum() # (cols default axis)
df = df.diff() # 1st diff (col def axis)
df = df.div(o) # div by df, Series, value
df = df.dot(o) # matrix dot product
s = df.max() # max of axis (col def)
s = df.mean() # mean (col default axis)
s = df.median() # median (col default)
s = df.min() # min of axis (col def)
df = df.mul(o) # mul by df Series val
s = df.sum() # sum axis (cols default)
df = df.where(df > 0.5, other=np.nan)
Note: methods returning a Series default to work on cols

Adding new columns to a DataFrame
df['new_col'] = range(len(df))
df['new_col'] = np.repeat(np.nan, len(df))
df['random'] = np.random.rand(len(df))
df['index_as_col'] = df.index
df1[['b', 'c']] = df2[['e', 'f']]
Trap: when adding a new column, only items from the
new column Series that have a corresponding index in
the DataFrame will be added. The index of the receiving
DataFrame is not extended to accommodate all of the
new Series (see the sketch below).
Trap: when adding a python list or numpy array, the
column will be added by integer position.

Add a mismatched column with an extended index
df = pd.DataFrame([1, 2, 3], index=[1, 2, 3])
s = pd.Series([2, 3, 4], index=[2, 3, 4])
df = df.reindex(df.index.union(s.index))
df['s'] = s # with NaNs where no data
Note: assumes unique index values
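A minimal sketch of the alignment trap above
(made-up data):
df = pd.DataFrame({'a': [1, 2, 3]}) # index 0, 1, 2
s = pd.Series([9, 9, 9], index=[2, 3, 4])
df['b'] = s # only index 2 aligns; rows 0 and 1
            # get NaN; the values of s at index
            # labels 3 and 4 are silently dropped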
Select/filter rows/cols based on index label values
df = df.filter(items=['a', 'b']) # by col
df = df.filter(items=[5], axis=0) # by row
df = df.filter(like='x') # keep x in col
df = df.filter(regex='x') # regex in col
df = df.select(lambda x: not x%5) # 5th rows
Note: select takes a Boolean function, for cols: axis=1
Note: filter defaults to cols; select defaults to rows
Note: select is deprecated in pandas 0.25; prefer
df.loc[] with a Boolean array.

Dropping (deleting) columns (mostly by label)
df = df.drop(col1, axis=1)
df = df.drop([col1, col2], axis=1)
del df[col] # even classic python works
df = df.drop(df.columns[0], axis=1) # first
df = df.drop(df.columns[-1:], axis=1) # last
Swap column contents
df[['B', 'A']] = df[['A', 'B']]

Vectorised arithmetic on columns
df['proportion'] = df['count'] / df['total']
df['percent'] = df['proportion'] * 100.0

Apply numpy mathematical functions to columns
df['log_data'] = np.log(df[col])
Note: many more numpy mathematical functions exist
Hint: prefer pandas maths over numpy where you can.

Set column values based on criteria
df[b] = df[a].where(df[a]>0, other=0)
df[d] = df[a].where(df.b!=0, other=df.c)
Note: where other can be a Series or a scalar

Data type conversions
s = df[col].astype('float')
s = df[col].astype('int')
s = pd.to_numeric(df[col])
s = df[col].astype('str')
a = df[col].values # numpy array
l = df[col].tolist() # python list
Trap: index lost in conversion from Series to array or list

Common column-wide methods/attributes
value = df[col].dtype # type of data
value = df[col].size # col dimensions
value = df[col].count() # non-NA count
value = df[col].sum()
value = df[col].prod()
value = df[col].min()
value = df[col].max()
value = df[col].mean() # also median()
value = df[col].cov(df[other_col])
s = df[col].describe()
s = df[col].value_counts()

Multiply every column in DataFrame by a Series
df = df.mul(s, axis=0) # on matched rows
Note: also add, sub, div, etc.

Selecting columns with .loc, .iloc
df = df.loc[:, 'col1':'col2'] # inclusive
df = df.iloc[:, 0:2] # exclusive

Get the integer position of a column index label
i = df.columns.get_loc('col_name')

Test if column index values are unique/monotonic
if df.columns.is_unique: pass # ...
b = df.columns.is_monotonic_increasing
b = df.columns.is_monotonic_decreasing

Mapping a DataFrame column or Series
mapping = pd.Series(['red', 'green', 'blue'],
    index=['r', 'g', 'b'])
s = pd.Series(['r', 'g', 'r', 'b']).map(mapping)
# s contains: ['red', 'green', 'red', 'blue']

m = pd.Series([True, False], index=['Y', 'N'])
df = pd.DataFrame(np.random.choice(list('YN'),
    500, replace=True), columns=[col])
df[col] = df[col].map(m)
Note: useful for decoding data before plotting
Note: sometimes referred to as a lookup function
Note: indexes can also be mapped if needed.

Find the largest and smallest values in a column
s = df[col].nlargest(n)
s = df[col].nsmallest(n)

Sorting the columns of a DataFrame
df = df.sort_index(axis=1, ascending=False)
Note: the column labels need to be comparable
Working with rows

Get the row index and labels
idx = df.index # get row index
label = df.index[0] # first row label
label = df.index[-1] # last row label
l = df.index.tolist() # get as a python list
a = df.index.values # get as numpy array

Change the (row) index
df.index = idx # new ad hoc index
df = df.set_index('A') # index set to col A
df = df.set_index(['A', 'B']) # MultiIndex
df = df.reset_index() # replace old w new
# note: old index stored as a col in df
df.index = range(len(df)) # set with list
df = df.reindex(index=range(len(df)))
df = df.set_index(keys=['r1', 'r2', 'etc'])

Adding rows
df = original_df.append(more_rows_in_df)
Hint: convert row(s) to a DataFrame and then append.
Both DataFrames must have the same column labels.

Append a row of column totals to a DataFrame
df.loc['Total'] = df.sum()
Note: best if all columns are numeric

Select a slice of rows by integer position
[inclusive-from : exclusive-to [: step]]
start is 0; end is len(df)
df = df.iloc[:] # copy entire DataFrame
df = df.iloc[0:2] # rows 0 and 1
df = df.iloc[2:3] # row 2 (the third row)
df = df.iloc[-1:] # the last row
df = df.iloc[:-1] # all but the last row
df = df.iloc[::2] # every 2nd row (0 2 ..)
Hint: while the .iloc[] accessor may not be needed
above, its use makes for more readable code.

Select a slice of rows by label/index
df = df.loc['a':'c'] # rows 'a' through 'c'
Note: [inclusive-from : inclusive-to [: step]]
Hint: while the .loc[] accessor may not be needed
above, its use makes for more readable code.

Sorting the rows of a DataFrame by the row index
df = df.sort_index(ascending=False)

Sorting DataFrame rows based on column values
df = df.sort_values(by=df.columns[0],
    ascending=False)
df = df.sort_values(by=[col1, col2])

Random selection of rows
import random
k = 20 # pick a number
selection = random.sample(range(len(df)), k)
df_sample = df.iloc[selection, :] # get copy
Note: this randomly selected sample is not sorted

Iterating over DataFrame rows
for (index, row) in df.iterrows(): # pass
Trap: row data may be coerced to the same data type
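A sketch of a common alternative that avoids the
coercion trap (itertuples keeps the column dtypes):
for row in df.itertuples():
    pass # row is a namedtuple: access fields as
         # row.Index, then one field per column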
Selecting rows using isin over multiple columns
# fake up some data
data = {1: [1, 2, 3], 2: [1, 4, 9], 3: [1, 8, 27]}
df = pd.DataFrame(data)

# multi-column isin
lf = {1: [1, 3], 3: [8, 27]} # look for
f = df.loc[df[list(lf)].isin(lf).all(axis=1)]

Selecting rows using an index
idx = df[df[col] >= 2].index
print(df.loc[idx])

Get integer position of rows that meet condition
a = np.where(df[col] >= 2) # numpy array

Test if the row index values are unique/monotonic
if df.index.is_unique: pass # ...
b = df.index.is_monotonic_increasing
b = df.index.is_monotonic_decreasing

Find row index duplicates
if df.index.has_duplicates:
    print(df.index.duplicated())
Note: also similar for column label duplicates.
Working with cells

Summary: selection using the DataFrame index
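A minimal sketch for the two sections above (standard
pandas accessors; made-up labels):
# single cells – fast scalar access
value = df.at['row_label', 'col_label'] # by label
value = df.iat[0, 0] # by integer position
df.at['row_label', 'col_label'] = 99 # set a cell
# selection using the index, in outline
s = df['col'] # one column, as a Series
df2 = df[['col1', 'col2']] # columns by label
df2 = df.loc['r1':'r2', 'c1':'c2'] # labels (inclusive)
df2 = df.iloc[0:2, 0:2] # positions (exclusive-to)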
Joining/Combining DataFrames

Append (another way of doing a top/bottom concat)
df = df1.append(df2) # top/bottom
df = df1.append([df2, df3]) # top/bottom
Note: append also has an ignore_index parameter

Merge
df_new = pd.merge(left=df1, right=df2,
    how='outer', left_index=True,
    right_index=True) # on indexes
df_new = pd.merge(left=df1, right=df2,
    how='left', left_on='col1',
    right_on='col2') # on columns

df_new = df.merge(right=dfg, how='left',
    left_on='Group', right_index=True)
How: 'left', 'right', 'outer', 'inner' (where outer=union/all;
inner=intersection)
Note: merge is both a pandas helper function and a
DataFrame method
Note: DataFrame.merge() joins on common columns by
default (if left_on and right_on are not specified)
Trap: when joining on column values, the indexes on
the passed DataFrames are ignored.
Trap: many-to-many merges can result in an explosion
of associated data (see the sketch at the end of this
section).

Join on row indexes (another way of merging)
df = df1.join(other=df2, how='outer')
df = df1.join(other=df2, on=['a', 'b'],
    how='outer')
Note: DataFrame.join() joins on indexes by default.

Combine_first
df = df1.combine_first(other=df2)

# multi-combine with python reduce()
from functools import reduce # needed in Python 3
df = reduce(lambda x, y:
    x.combine_first(other=y),
    [df1, df2, df3, df4, df5])
combine_first() uses the non-null values from df1. Null
values in df1 are filled with values from the same
location in df2. The index of the combined DataFrame
will be the union of the indexes from df1 and df2.

Groupby: Split-Apply-Combine
gb = df.groupby('cat') # the grouping assumed below

# apply to every column in DataFrame ...
s = gb.count()
df_summary = gb.describe()
df_row_1s = gb.first()
Note: aggregating functions include mean, sum, size,
count, std, var, sem (standard error of the mean),
describe, first, last, min, max

Applying multiple aggregating functions
# apply multiple functions to one column
dfx = gb['col2'].agg([np.sum, np.mean])
# apply multiple fns to multiple cols
dfy = gb.agg({
    'cat': np.count_nonzero,
    'col1': [np.sum, np.mean, np.std],
    'col2': [np.min, np.max]
})
Note: gb['col2'] above is shorthand for
df.groupby('cat')['col2'], without the need for regrouping.

Applying transform functions
# transform to group z-scores, which have
# a group mean of 0, and a std dev of 1.
zscore = lambda x: (x - x.mean()) / x.std()
dfz = gb.transform(zscore)

# replace missing data with the group mean
mean_r = lambda x: x.fillna(x.mean())
df = gb.transform(mean_r) # entire DataFrame
df[col] = gb[col].transform(mean_r) # one col
Note: can apply multiple transforming functions in a
manner similar to multiple aggregating functions above.

Applying filtering functions
Filtering functions allow you to make selections based
on whether each group meets specified criteria.
# select groups with more than 10 members
eleven = lambda x: (len(x['col1']) >= 11)
df11 = gb.filter(eleven)

Group by a row index (non-hierarchical index)
df = df.set_index(keys='cat')
s = df.groupby(level=0)[col].sum()
dfg = df.groupby(level=0).sum()
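A minimal sketch of the many-to-many trap noted
under Merge above (made-up keys):
left = pd.DataFrame({'k': ['a', 'a'], 'x': [1, 2]})
right = pd.DataFrame({'k': ['a', 'a'], 'y': [3, 4]})
both = pd.merge(left, right, on='k')
# every left 'a' pairs with every right 'a':
# 2 x 2 matching rows -> 4 rows in the result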
Pivot Tables: working with long and wide data
These features work with, and often create, hierarchical
or multi-level indexes (the pandas MultiIndex is powerful
and complex); a minimal sketch follows the dates
introduction below.

Working with dates, times and their indexes

Dates and time – points, spans, deltas and offsets
Pandas has four date-time like objects that can be used
for data in a Series or in an Index: Timestamp (a point in
time), Period (a time-span), Timedelta (a duration) and
DateOffset (a calendar offset).

# do the magic: assemble datetimes from columns
cols = ['year', 'month', 'day']
df.index = pd.to_datetime(df[cols])
df['TS'] = pd.to_datetime(df[cols])
Note: when assembling datetimes from a DataFrame,
pd.to_datetime() needs the component columns to be
named year, month and day (hour, minute, etc. optional).
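A minimal long-to-wide sketch for the pivot-table
section above, using made-up column names:
long = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S'],
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'sales': [10, 11, 20, 21]})
wide = long.pivot_table(index='region',
    columns='month', values='sales',
    aggfunc='sum') # one row per region,
                   # one column per month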
From DatetimeIndex to Python datetime objects
dti = pd.DatetimeIndex(pd.date_range(
    start='1/1/2011', periods=4, freq='M'))
s = pd.Series([1, 2, 3, 4], index=dti)
a = dti.to_pydatetime() # numpy array
a = s.index.to_pydatetime() # numpy array

From Timestamps to Python dates or times
df['py_date'] = [x.date() for x in df['TS']]
df['py_time'] = [x.time() for x in df['TS']]
Note: converts to datetime.date or datetime.time, but
does not convert to datetime.datetime.

Periods
Periods represent a time-span.
p = pd.Period('2019', freq='Y')
p = pd.Period('2019-01', freq='M')
p = pd.Period('2019-01-01', freq='D')
p = pd.Period('2019-01-01 21:15:06', freq='S')

From Timestamps to Periods in a Series
l = ['2019-04-01', '2019-04-02']
ts = pd.to_datetime(pd.Series(l))
ps = ts.dt.to_period(freq='D')
Note: the .dt accessor does the conversion in the last line

From a DatetimeIndex to a PeriodIndex
l = ['2019-04-01', '2019-04-02']
dti = pd.to_datetime(l)
pi = dti.to_period(freq='D')
Hint: unless you are working in less than seconds,
prefer PeriodIndex over DatetimeIndex.

A range of Periods in a PeriodIndex
pi = pd.period_range('2015-01',
    periods=len(df), freq='M')
pi = pd.period_range('2019-01-01',
    periods=365, freq='D')

Working with a PeriodIndex
pi = pd.period_range('1960-01', '2015-12',
    freq='M')
a = pi.values # numpy array (of Periods)
p = pi.tolist() # python list of Periods
sp = pd.Series(pi) # pandas Series of Periods
s = pd.Series(pi).astype('str')
l = pd.Series(pi).astype('str').tolist()

From DatetimeIndex to PeriodIndex and back
df = pd.DataFrame(np.random.randn(20, 3))
df.index = pd.date_range('2015-01-01',
    periods=len(df), freq='M')
dfp = df.to_period(freq='M')
dft = dfp.to_timestamp()
Note: from period to timestamp defaults to the point in
time at the start of the period.

The tail of a time-series DataFrame
df = df.last("5M") # the last five months

Period frequency constants (not a complete list)
Name              Description
U                 Microsecond
L                 Millisecond
S                 Second
T                 Minute
H                 Hour
D                 Calendar day
B                 Business day
W-{MON, TUE, …}   Week ending on …
MS                Calendar start of month
M                 Calendar end of month
QS-{JAN, FEB, …}  Quarter start with year starting (QS – December)
Q-{JAN, FEB, …}   Quarter end with year ending (Q – December)
AS-{JAN, FEB, …}  Year start (AS – December)
A-{JAN, FEB, …}   Year end (A – December)

Deltas
When we subtract a Timestamp from another
Timestamp, we get a Timedelta object in pandas.
ts = pd.Series(pd.date_range('2019-01-01',
    periods=31, freq='D'))
delta_series = ts.diff(1)

Converting a Timedelta to a numeric
l = ['2019-04-01', '2019-09-03']
s = pd.to_datetime(pd.Series(l))
delta = s[1] - s[0]

day = pd.Timedelta(days=1)
delta_num = delta / day
minute = pd.Timedelta(minutes=1)
delta_num2 = delta / minute

Offsets
Subtracting a Period from a Period gives an offset.
offset = pd.DateOffset(days=4)
s = pd.Series(pd.period_range('2019-01-01',
    periods=365, freq='D'))
offset2 = s[4] - s[0]
s = s.diff(1) # s is now a series of offsets

Converting an Offset to a numeric
x = offset.n # an individual offset
t = s.apply(lambda z: np.nan if z is np.nan
    else z.n) # convert a Series

Upsampling
# fake up some quarterly count data
pi = pd.period_range('1960Q1',
    periods=220, freq='Q')
df = pd.DataFrame(np.random.randint(low=0,
    high=999, size=(len(pi), 5)), index=pi)

# which we can upsample to monthly count data
dfm = df.resample('M').asfreq() # with NAs!
dfm2 = (df.resample('M').asfreq().fillna(0)
    .rolling(window=3, min_periods=3).mean()
    .bfill(limit=2)) # assuming no NA data
Note: df.resample(arguments).aggregating_function().
There are lots of options here. See the manual.
Downsampling
# downsample from monthly to quarterly counts
dfq = dfm.resample('Q').sum()
Note: df.resample(arguments).aggregating_function().

Time zones
t = ['2015-06-30 00:00:00',
    '2015-12-31 00:00:00']
dti = pd.to_datetime(t
    ).tz_localize('Australia/Canberra')
dti = dti.tz_convert('UTC')
ts = pd.Timestamp('now',
    tz='Europe/London')
Note: by default, Timestamps are created without time
zone information.

Plotting from the DataFrame

Import matplotlib, choose a matplotlib style
import matplotlib.pyplot as plt
print(plt.style.available)
plt.style.use('ggplot')

Fake up some data (which we reuse repeatedly)
a = np.random.normal(0, 1, 999)
b = np.random.normal(1, 2, 999)
c = np.random.normal(2, 3, 999)
df = pd.DataFrame([a, b, c]).T
df.columns = ['A', 'B', 'C']

Line plot
df1 = df.cumsum()
ax = df1.plot()

# from here down – standard plot output
ax.set_title('Title')
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
fig = ax.figure
fig.set_size_inches(8, 3)
fig.tight_layout(pad=1)
fig.savefig('filename.png', dpi=125)
plt.close()

Row selection with a time-series index
# start with some play data
n = 48
df = pd.DataFrame(np.random.randint(low=0,
    high=999, size=(n, 5)),
    index=pd.period_range('2015-01',
        periods=n, freq='M'))

february_selector = (df.index.month == 2)
february_data = df[february_selector]

q1_data = df[(df.index.month >= 1) &
    (df.index.month <= 3)]
mayornov_data = df[(df.index.month == 5) |
    (df.index.month == 11)]

year_totals = df.groupby(df.index.year).sum()
Also: year, month, day [of month], hour, minute, second,
dayofweek [numbered from 0; week starts on Monday],
weekofmonth, weekofyear [numbered from 1],
dayofyear [numbered from 1], …
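Partial-string selection is another common idiom with
date-like indexes (a sketch, assuming the play data
above; label slices are inclusive):
rows_2015 = df.loc['2015'] # a whole year
h1 = df.loc['2015-01':'2015-06'] # six months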
Multiple histograms (overlapping or stacked)
ax = df.plot.hist(bins=25, alpha=0.5) # or...
ax = df.plot.hist(bins=25, stacked=True)
# followed by the standard plot code as above

Scatter plot
ax = df.plot.scatter(x='A', y='C')
# followed by the standard plot code as above

Pie chart
s = pd.Series(data=[10, 20, 30],
    index=['dogs', 'cats', 'birds'])
ax = s.plot.pie(autopct='%.1f')

Density plot
ax = df.plot.kde()
# followed by the standard plot code as above
A line and bar on the same chart
In matplotlib, bar charts visualise categorical or discrete
data, while line charts visualise continuous data. This
makes it hard to get bars and lines on the same chart.
Typically, combined charts either have too many labels,
and/or the lines and bars are misaligned or missing.
You need to trick matplotlib a bit … pandas makes this
tricking easier.

# start with fake percentage growth data
s = pd.Series(np.random.normal(
    1.02, 0.015, 40))
s = s.cumprod()
dfg = (pd.concat([s / s.shift(1),
    s / s.shift(4)], axis=1) * 100) - 100
dfg.columns = ['Quarter', 'Annual']
dfg.index = pd.period_range('2010-Q1',
    periods=len(dfg), freq='Q')

# reindex with integers from 0; keep old
old = dfg.index
dfg.index = range(len(dfg))

# plot the line from pandas
ax = dfg['Annual'].plot(color='blue',
    label='Year/Year Growth')
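# a sketch of one way to overlay the bars (an
# assumption, not the original code): draw bars at
# the same integer x positions, then restore
# readable period labels from the saved index
ax.bar(dfg.index, dfg['Quarter'],
    color='#cccccc', label='Q/Q Growth')
ticks = list(range(0, len(dfg), 8))
ax.set_xticks(ticks)
ax.set_xticklabels([str(old[i]) for i in ticks])
ax.legend()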
fig = ax.figure
fig.set_size_inches(8, 3)
fig.tight_layout(pad=1)
fig.savefig('filename.png', dpi=125)
plt.close()

Working with missing and non-finite data

Working with missing data
Pandas uses the not-a-number construct (np.nan and
float('nan')) to indicate missing data. The Python None
can arise in data as well. It is also treated as missing
data, as is the pandas not-a-time construct
(pandas.NaT).

Missing data in a Series
s = pd.Series([8, None, float('nan'), np.nan])
# [8, NaN, NaN, NaN]
s.isna() # [False, True, True, True]
s.notna() # [True, False, False, False]
s.fillna(0) # [8, 0, 0, 0]

Missing data in a DataFrame
df = df.dropna() # drop all rows with NaN
df = df.dropna(axis=1) # same for cols
df = df.dropna(how='all') # drop all-NaN rows
df = df.dropna(thresh=2) # keep rows with 2+
                         # non-NaN values
# only drop row if NaN in a specified col
df = df.dropna(subset=['col'])
Working with Categorical Data
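A minimal sketch of creating categorical data (standard
pandas usage; made-up values):
s = pd.Series([1, 7, 9, 7], dtype='category')
cats = s.cat.categories # the distinct categories
df['cat'] = df['cat'].astype('category')
# (a hypothetical column converted in place)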
Removing categories
s = s.cat.remove_categories([7, 9])
s = s.cat.remove_unused_categories()

Working with strings
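A minimal sketch of the vectorised string methods on
the .str accessor (standard pandas API; made-up data):
s = pd.Series(['a cat', 'a dog', None])
s2 = s.str.upper() # ['A CAT', 'A DOG', NaN]
b = s.str.contains('dog') # [False, True, NaN]
s3 = s.str.replace('a ', 'the ')
l = s.str.split(' ') # lists of tokens
Note: missing values propagate through .str methods.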
Basic Statistics

Summary statistics
s = df[col].describe()
df1 = df.describe()

DataFrame – key stats methods
df.corr() # pairwise correlation cols
df.cov() # pairwise covariance cols
df.kurt() # kurtosis over cols (def)
df.mad() # mean absolute deviation
df.sem() # standard error of mean
df.var() # variance over cols (def)

Value counts
s = df[col].value_counts()

Histogram binning
count, bins = np.histogram(df[col])
count, bins = np.histogram(df[col], bins=5)
count, bins = np.histogram(df[col],
    bins=[-3, -2, -1, 0, 1, 2, 3, 4])

Regression
import statsmodels.formula.api as sm
result = sm.ols(formula="col1 ~ col2 + col3",
    data=df).fit()
print(result.params)
print(result.summary())

Cautionary note
This cheat sheet was cobbled together by tireless bots
roaming the dark recesses of the Internet seeking ursine
and anguine myths from a fabled land of milk and honey
where it is rumoured pandas and pythons gambol
together. There is no guarantee the narratives were
captured and transcribed accurately. You use these
notes at your own risk. You have been warned. I will not
be held responsible for whatever happens to you and
those you love once your eyes begin to see what is
written here.

Errors: if you find any errors, please email me at
[email protected]; (but please do not correct
my use of Australian-English spelling conventions).
Version 14 December 2019 – Draft – Mark Graph – @Mark_Graph on twitter