.. _cookbook:

.. currentmodule:: pandas

.. ipython:: python
   :suppress:

   import numpy as np
   import random
   import os
   np.random.seed(123456)
   from pandas import *
   import pandas as pd
   randn = np.random.randn
   randint = np.random.randint
   np.set_printoptions(precision=4, suppress=True)

********
Cookbook
********

This is a repository for *short and sweet* examples and links for useful pandas recipes.
We encourage users to add to this documentation.

This is a great *First Pull Request* (to add interesting links and/or put short code inline
for existing links).

.. _cookbook.selection:

Selection
---------

The :ref:`indexing ` docs.

`Boolean Rows Indexing `__

`Using loc and iloc in selections `__

`Extending a panel along the minor axis `__

`Boolean masking in a panel `__

`Selecting via the complement `__

.. _cookbook.multi_index:

MultiIndexing
-------------

The :ref:`multi-indexing ` docs.

`Creating a multi-index from a labeled frame `__

Slicing
~~~~~~~

`Slicing a multi-index with xs `__

`Slicing a multi-index with xs #2 `__

Sorting
~~~~~~~

`Multi-index sorting `__

`Partial Selection, the need for sortedness `__

Levels
~~~~~~

`Prepending a level to a multiindex `__

`Flatten Hierarchical columns `__

.. _cookbook.grouping:

Grouping
--------

The :ref:`grouping ` docs.

`Basic grouping with apply `__

`Using get_group `__

`Apply to different items in a group `__

`Expanding Apply `__

`Replacing values with groupby means `__

`Sort by group with aggregation `__

`Create multiple aggregated columns `__

Expanding Data
~~~~~~~~~~~~~~

`Alignment and to-date `__

`Rolling Computation window based on values instead of counts `__

Splitting
~~~~~~~~~

`Splitting a frame `__

.. _cookbook.pivot:

Pivot
~~~~~

The :ref:`Pivot ` docs.

`Partial sums and subtotals `__

`Frequency table like plyr in R `__

Timeseries
----------

`Between times `__

`Vectorized Lookup `__

.. _cookbook.resample:

Resampling
~~~~~~~~~~

The :ref:`Resample ` docs.

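
For orientation, here is a minimal sketch of the basic operation these resampling recipes build on: downsampling minute-level data into fixed 15-minute bins. The data is made up for illustration, and the ``ts.resample('15min').sum()`` form assumes pandas 0.18 or later, where ``resample`` returns a resampler object (earlier releases passed the aggregation through a ``how=`` argument instead).

.. code-block:: python

   import numpy as np
   import pandas as pd

   # One hour of made-up minute-level observations (values 0..59)
   idx = pd.date_range('2013-01-01 09:00', periods=60, freq='min')
   ts = pd.Series(np.arange(60), index=idx)

   # Downsample into 15-minute bins, summing the values that fall in each bin
   binned = ts.resample('15min').sum()
   print(binned)

Each bin is labeled by its left edge (09:00, 09:15, 09:30, 09:45); ``sum`` can be swapped for ``mean``, ``ohlc`` or another aggregation.
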

`TimeGrouping of values grouped across time `__

`TimeGrouping #2 `__

`Resampling with custom periods `__

`Resample intraday frame without adding new days `__

`Resample minute data `__

.. _cookbook.merge:

Merge
-----

The :ref:`Concat ` docs.

The :ref:`Join ` docs.

`Emulate R rbind `__

`Self Join `__

`How to set the index and join `__

`KDB like asof join `__

`Join with criteria based on the values `__

.. _cookbook.plotting:

Plotting
--------

The :ref:`Plotting ` docs.

`Make Matplotlib look like R `__

`Setting x-axis major and minor labels `__

Data In/Out
-----------

.. _cookbook.csv:

CSV
~~~

The :ref:`CSV ` docs.

`read_csv in action `__

`Reading a csv chunk-by-chunk `__

`Reading the first few lines of a frame `__

`Inferring dtypes from a file `__

`Dealing with bad lines `__

.. _cookbook.sql:

SQL
~~~

The :ref:`SQL ` docs.

`Reading from databases with SQL `__

.. _cookbook.excel:

Excel
~~~~~

The :ref:`Excel ` docs.

`Reading from a filelike handle `__

.. _cookbook.hdf:

HDFStore
~~~~~~~~

The :ref:`HDFStores ` docs.

`Simple Queries with a Timestamp Index `__

`Managing heterogeneous data using a linked multiple table hierarchy `__

`Merging on-disk tables with millions of rows `__

`Large data workflows `__

`Troubleshoot HDFStore exceptions `__

Storing attributes to a group node:

.. ipython:: python

   df = DataFrame(randn(8, 3))
   store = HDFStore('test.h5')
   store.put('df', df)

   # you can store an arbitrary Python object via pickle
   store.get_storer('df').attrs.my_attribute = dict(A=10)
   store.get_storer('df').attrs.my_attribute

.. ipython:: python
   :suppress:

   store.close()
   os.remove('test.h5')

Miscellaneous
-------------

The :ref:`Timedeltas ` docs.

`Operating with timedeltas `__

`Create timedeltas with date differences `__

Aliasing Axis Names
-------------------

To globally provide aliases for axis names, one can define these two functions:

.. ipython:: python

   def set_axis_alias(cls, axis, alias):
       # Register alias as an extra name for axis (writes to private class attributes)
       if axis not in cls._AXIS_NUMBERS:
           raise ValueError("invalid axis [%s] for alias [%s]" % (axis, alias))
       cls._AXIS_ALIASES[alias] = axis

   def clear_axis_alias(cls, axis, alias):
       # Remove a previously registered alias; an unknown alias is silently ignored
       if axis not in cls._AXIS_NUMBERS:
           raise ValueError("invalid axis [%s] for alias [%s]" % (axis, alias))
       cls._AXIS_ALIASES.pop(alias, None)

   set_axis_alias(DataFrame, 'columns', 'myaxis2')
   df2 = DataFrame(randn(3, 2), columns=['c1', 'c2'], index=['i1', 'i2', 'i3'])
   df2.sum(axis='myaxis2')
   clear_axis_alias(DataFrame, 'columns', 'myaxis2')
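
The alias functions write to private class attributes (``_AXIS_NUMBERS`` and ``_AXIS_ALIASES``), which are pandas internals and are not guaranteed to stay the same across versions. If the alias is only needed in your own code, one alternative is to translate it before calling pandas. This is a different technique from the recipe above, not a pandas feature; ``AXIS_ALIASES`` and ``resolve_axis`` are hypothetical names chosen for illustration.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Hypothetical module-level alias table; pandas itself knows nothing about it
   AXIS_ALIASES = {'myaxis2': 'columns'}

   def resolve_axis(axis):
       """Translate a user-defined alias to a real axis name; pass other values through."""
       return AXIS_ALIASES.get(axis, axis)

   df2 = pd.DataFrame(np.arange(6).reshape(3, 2),
                      columns=['c1', 'c2'], index=['i1', 'i2', 'i3'])

   # Reducing along the 'columns' axis gives one total per row
   row_sums = df2.sum(axis=resolve_axis('myaxis2'))
   print(row_sums)

Because the translation happens outside pandas, nothing needs to be cleared afterwards and no internal attribute is modified.
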