.. currentmodule:: pandas
.. ipython:: python
   :suppress:

   import numpy as np
   import random
   import os
   np.random.seed(123456)
   from pandas import *
   options.display.max_rows=15
   import pandas as pd
   randn = np.random.randn
   randint = np.random.randint
   np.set_printoptions(precision=4, suppress=True)

This is a repository for short and sweet examples and links for useful pandas recipes. We encourage users to add to this documentation.
This is a great first pull request: add interesting links and/or put short code inline for existing links.
These are some neat pandas idioms
How to split a frame with a boolean criterion?
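A minimal sketch of the idea, assuming a small made-up frame (the column names ``AAA`` and ``BBB`` are illustrative and not taken from the linked answer): build the boolean criterion once, then index with it and with its complement.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40]})

# Build the boolean criterion once, then use it and its complement.
mask = df["AAA"] <= 5
low = df[mask]     # rows where the criterion holds
high = df[~mask]   # all remaining rows
```

Every row lands in exactly one of the two pieces, so the original frame can be rebuilt by concatenating them.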
How to select from a frame with complex criteria?
Select rows closest to a user defined number
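One possible way to do this, sketched under the assumption that "closest" means smallest absolute difference in a single column (the frame and the ``target`` value are made up for illustration):

```python
import pandas as pd

# Hypothetical example data and target value.
df = pd.DataFrame({"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40]})
target = 27

# Absolute distance of each BBB value from the target.
distance = (df["BBB"] - target).abs()

# Reorder the rows so that the closest ones come first.
closest_first = df.loc[distance.sort_values().index]
```

Taking ``closest_first.head(n)`` then gives the ``n`` nearest rows.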
The :ref:`indexing <indexing>` docs.
Indexing using both row labels and conditionals, see here
Use loc for label-oriented slicing and iloc for positional slicing, see here
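As a short illustration of the difference (the labels and values below are made up), note that label slices include both endpoints while positional slices exclude the stop position:

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"AAA": [4, 5, 6, 7]}, index=["foo", "bar", "boo", "kar"])

by_label = df.loc["bar":"kar"]   # label slice: both endpoints included
by_position = df.iloc[1:3]       # positional slice: stop is excluded
```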
Extend a panel frame by transposing, adding a new dimension, and transposing back to the original dimensions, see here
Mask a panel by using ``np.where`` and then reconstructing the panel with the new masked values, see here
Using ``~`` to take the complement of a boolean array, see here
Efficiently creating columns using applymap
The :ref:`multiindexing <indexing.hierarchical>` docs.
Creating a multi-index from a labeled frame
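A minimal sketch, assuming the column labels encode two levels separated by an underscore (the names ``One_X``, ``Two_Y`` and so on are made up for this example): split each label into a tuple and build the column index with ``MultiIndex.from_tuples``.

```python
import pandas as pd

# Hypothetical frame whose column labels encode two levels as "<outer>_<inner>".
df = pd.DataFrame({
    "row": [0, 1, 2],
    "One_X": [1.1, 1.1, 1.1],
    "One_Y": [1.2, 1.2, 1.2],
    "Two_X": [1.11, 1.11, 1.11],
    "Two_Y": [1.22, 1.22, 1.22],
})

# Move the row labels into the index, then split each column label in two.
df = df.set_index("row")
df.columns = pd.MultiIndex.from_tuples([tuple(c.split("_")) for c in df.columns])
```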
Performing arithmetic with a multi-index that needs broadcasting
Slicing a multi-index with xs #2
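For illustration, a small sketch of ``xs`` on a two-level index (the level names ``outer`` and ``inner`` are assumptions of this example); the level that is selected on is dropped from the result:

```python
import pandas as pd

# Hypothetical two-level index.
index = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]],
                                   names=["outer", "inner"])
df = pd.DataFrame({"val": [1, 2, 3, 4]}, index=index)

outer_a = df.xs("a", level="outer")   # all rows under outer == 'a'
inner_y = df.xs("y", level="inner")   # cross-section on the second level
```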
Setting portions of a multi-index with xs
Partial Selection, the need for sortedness
Prepending a level to a multiindex
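One way to prepend a level, sketched here with made-up labels, is to pass a single-element list to ``pd.concat`` together with ``keys``; this may differ from the approach in the linked answer.

```python
import pandas as pd

# Hypothetical frame with a single named index level.
df = pd.DataFrame({"val": [1, 2]}, index=pd.Index(["x", "y"], name="inner"))

# Concatenating a one-element list with `keys` adds a new outermost level.
prepended = pd.concat([df], keys=["foo"], names=["outer"])
```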
The :ref:`panelnd<dsintro.panelnd>` docs.
The :ref:`missing data<missing_data>` docs.
Fill forward a reversed timeseries

.. ipython:: python

   df = pd.DataFrame(np.random.randn(6,1), index=pd.date_range('2013-08-01', periods=6, freq='B'), columns=list('A'))
   # set the fourth row to missing; the row is chosen by position, the column by label
   df.loc[df.index[3], 'A'] = np.nan
   df
   df.reindex(df.index[::-1]).ffill()

The :ref:`grouping <groupby>` docs.
Apply to different items in a group
Replacing values with groupby means
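A minimal sketch of one common variant, assuming the values to replace are missing entries (the linked answer may target a different condition): ``transform`` returns the group mean aligned to the original rows, so it can be passed straight to ``fillna``.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in group 'a'.
df = pd.DataFrame({"key": ["a", "a", "b", "b"],
                   "val": [1.0, np.nan, 3.0, 5.0]})

# Group means broadcast back to the shape of the original column.
group_mean = df.groupby("key")["val"].transform("mean")

# Replace only the missing entries with their group's mean.
df["val"] = df["val"].fillna(group_mean)
```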
Sort by group with aggregation
Create multiple aggregated columns
Create a value counts column and reassign back to the DataFrame
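As a rough sketch with made-up data, the per-group count can be attached to every row by transforming the grouped column itself:

```python
import pandas as pd

# Hypothetical data; 'Color' is the column whose values are counted.
df = pd.DataFrame({"Color": ["Red", "Red", "Blue"], "Value": [100, 150, 50]})

# The count of each group is broadcast back to every row of that group.
df["Counts"] = df.groupby("Color")["Color"].transform("count")
```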
Rolling Computation window based on values instead of counts
The :ref:`Pivot <reshaping.pivot>` docs.
Frequency table like plyr in R
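One way to build such a table in pandas is ``pd.crosstab``; the sketch below uses made-up categories and may not match the approach of the linked answer.

```python
import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame({
    "grade": ["A", "B", "A", "C", "B", "A"],
    "sex": ["F", "M", "M", "F", "F", "M"],
})

# Count how often each (grade, sex) combination occurs.
table = pd.crosstab(df["grade"], df["sex"])
```

Combinations that never occur appear in the table with a count of zero.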
Turning embedded lists into a multi-index frame
Turn a matrix with hours in columns and days in rows into a continuous row sequence in the form of a time series. How to rearrange a python pandas dataframe?
The :ref:`Resample <timeseries.resampling>` docs.
TimeGrouping of values grouped across time
Using TimeGrouper and another grouping to create subgroups, then apply a custom function
Resampling with custom periods
Resample intraday frame without adding new days
The :ref:`Concat <merging.concatenation>` docs. The :ref:`Join <merging.join>` docs.
Join with a criteria based on the values
The :ref:`Plotting <visualization>` docs.
Setting x-axis major and minor labels
Plotting multiple charts in an ipython notebook
Annotate a time-series plot #2
Performance comparison of SQL vs HDF5
The :ref:`CSV <io.read_csv_table>` docs
Reading only certain rows of a csv chunk-by-chunk
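A minimal sketch of the chunked approach, using an in-memory CSV so that it is self-contained (the column names and the filter are assumptions of this example): each chunk is filtered as it is read, and only the matching rows are kept.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = "a,b\n1,x\n2,y\n3,x\n4,y\n5,x\n"

pieces = []
# Read two rows at a time and keep only the rows where b == 'x'.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    pieces.append(chunk[chunk["b"] == "x"])

selected = pd.concat(pieces)
```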
Reading the first few lines of a frame
Reading a file that is compressed but not by ``gzip/bz2`` (the native compressed formats which ``read_csv`` understands). This example shows a ``WinZipped`` file, but is a general application of opening the file within a context manager and using that handle to read. See here
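The pattern can be sketched with the standard library ``zipfile`` module. The archive here is built in memory and the member name ``data.csv`` is made up, so the example runs without any external file:

```python
import io
import zipfile
import pandas as pd

# Build a small zip archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n3,4\n")
buffer.seek(0)

# Open the archive and the member within context managers,
# then hand the open handle to read_csv.
with zipfile.ZipFile(buffer) as zf:
    with zf.open("data.csv") as handle:
        df = pd.read_csv(handle)
```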
Reading CSV with Unix timestamps and converting to local timezone
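A possible sketch, assuming the timestamps are whole seconds since the epoch in UTC and that ``America/New_York`` stands in for the local timezone (both are assumptions of this example): parse the integers as UTC datetimes, then convert.

```python
import io
import pandas as pd

# Hypothetical CSV with Unix timestamps in seconds.
csv_text = "ts,value\n1375315200,1\n1375318800,2\n"
df = pd.read_csv(io.StringIO(csv_text))

# Interpret the integers as seconds since the epoch in UTC,
# then convert to the assumed local timezone.
df["ts"] = pd.to_datetime(df["ts"], unit="s", utc=True)
df["ts"] = df["ts"].dt.tz_convert("America/New_York")
```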
Write a multi-row index CSV without writing duplicates
The :ref:`SQL <io.sql>` docs
Reading from databases with SQL
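For illustration, a small sketch against an in-memory SQLite database (the table and column names are made up); ``pd.read_sql_query`` runs the query and returns the result as a frame.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database with a tiny table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [(1, "a"), (2, "b")])

# Run a query and load the result set into a DataFrame.
df = pd.read_sql_query("SELECT id, name FROM data WHERE id > 1", conn)
conn.close()
```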
The :ref:`Excel <io.excel>` docs
Reading from a filelike handle
Reading HTML tables from a server that cannot handle the default request header
The :ref:`HDFStores <io.hdf5>` docs
Simple Queries with a Timestamp Index
Managing heterogeneous data using a linked multiple table hierarchy
Merging on-disk tables with millions of rows
Deduplicating a large store by chunks, essentially a recursive reduction operation. Shows a function for taking in data from a csv file and creating a store by chunks, with date parsing as well. See here
Appending to a store, while creating a unique index
Reading in a sequence of files, then providing a global unique index to a store while appending
Troubleshoot HDFStore exceptions
Setting min_itemsize with strings
Using ptrepack to create a completely-sorted-index on a store
Storing Attributes to a group node

.. ipython:: python

   df = DataFrame(np.random.randn(8,3))
   store = HDFStore('test.h5')
   store.put('df',df)

   # you can store an arbitrary python object via pickle
   store.get_storer('df').attrs.my_attribute = dict(A = 10)
   store.get_storer('df').attrs.my_attribute

.. ipython:: python
   :suppress:

   store.close()
   os.remove('test.h5')

Numerical integration (sample-based) of a time series
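A rough sketch of sample-based integration using the trapezoidal rule, written out by hand and measuring time in seconds; the series is made up, and the linked answer may use a different rule or time unit.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series that rises by one unit per day.
index = pd.date_range("2013-01-01", periods=5, freq="D")
series = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0], index=index)

# Seconds elapsed since the first sample.
elapsed = np.asarray((series.index - series.index[0]).total_seconds(), dtype=float)

# Trapezoidal rule: interval width times the mean of the two endpoint values.
widths = np.diff(elapsed)
heights = (series.values[1:] + series.values[:-1]) / 2.0
integral = float((widths * heights).sum())
```

Because the interval widths come from the index, the same code handles irregularly spaced samples.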
The :ref:`Timedeltas <timeseries.timedeltas>` docs.
Create timedeltas with date differences
Adding days to dates in a dataframe
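A minimal sketch covering both entries above, with made-up column names: subtracting two datetime columns yields timedeltas, and a column of day counts can be turned into timedeltas with ``pd.to_timedelta`` before adding it to dates.

```python
import pandas as pd

# Hypothetical frame with a start date and a number of days to add.
df = pd.DataFrame({
    "start": pd.date_range("2013-01-01", periods=3),
    "days": [1, 2, 3],
})

# Convert the integer day counts to timedeltas and add them to the dates.
df["end"] = df["start"] + pd.to_timedelta(df["days"], unit="D")

# Subtracting two datetime columns gives a timedelta column.
df["elapsed"] = df["end"] - df["start"]
```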
To globally provide aliases for axis names, one can define these two functions:

.. ipython:: python

   def set_axis_alias(cls, axis, alias):
       if axis not in cls._AXIS_NUMBERS:
           raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
       cls._AXIS_ALIASES[alias] = axis

   def clear_axis_alias(cls, axis, alias):
       if axis not in cls._AXIS_NUMBERS:
           raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
       cls._AXIS_ALIASES.pop(alias,None)

   set_axis_alias(DataFrame,'columns', 'myaxis2')
   df2 = DataFrame(randn(3,2),columns=['c1','c2'],index=['i1','i2','i3'])
   df2.sum(axis='myaxis2')
   clear_axis_alias(DataFrame,'columns', 'myaxis2')