.. _whatsnew_0700:

v.0.7.0 (February 9, 2012)
--------------------------

New features
~~~~~~~~~~~~

- New unified :ref:`merge function ` for efficiently performing the full
  gamut of database / relational-algebra operations. Refactored existing join
  methods to use the new infrastructure, resulting in substantial performance
  gains (GH220_, GH249_, GH267_)

- New :ref:`unified concatenation function ` for concatenating Series,
  DataFrame or Panel objects along an axis. Can form union or intersection of
  the other axes. Improves performance of ``Series.append`` and
  ``DataFrame.append`` (GH468_, GH479_, GH273_)

- :ref:`Can ` pass multiple DataFrames to ``DataFrame.append`` to concatenate
  (stack) and multiple Series to ``Series.append`` too

- :ref:`Can` pass list of dicts (e.g., a list of JSON objects) to DataFrame
  constructor (GH526_)

- You can now :ref:`set multiple columns ` in a DataFrame via
  ``__getitem__``, useful for transformation (GH342_)

- Handle differently-indexed output values in ``DataFrame.apply`` (GH498_)

.. ipython:: python

   df = DataFrame(randn(10, 4))
   df.apply(lambda x: x.describe())

- :ref:`Add` ``reorder_levels`` method to Series and DataFrame (PR534_)

- :ref:`Add` dict-like ``get`` function to DataFrame and Panel (PR521_)

- :ref:`Add` ``DataFrame.iterrows`` method for efficiently iterating through
  the rows of a DataFrame

- :ref:`Add` ``DataFrame.to_panel`` with code adapted from
  ``LongPanel.to_long``

- :ref:`Add ` ``reindex_axis`` method to DataFrame

- :ref:`Add ` ``level`` option to binary arithmetic functions on
  ``DataFrame`` and ``Series``

- :ref:`Add ` ``level`` option to the ``reindex`` and ``align`` methods on
  Series and DataFrame for broadcasting values across a level (GH542_, PR552_,
  others)

- :ref:`Add ` attribute-based item access to ``Panel`` and add IPython
  completion (PR563_)

- :ref:`Add ` ``logy`` option to ``Series.plot`` for log-scaling on the Y
  axis

- :ref:`Add ` ``index`` and ``header`` options to ``DataFrame.to_string``

- :ref:`Can ` pass multiple DataFrames to ``DataFrame.join`` to join on index
  (GH115_)

- :ref:`Can ` pass multiple Panels to ``Panel.join`` (GH115_)

- :ref:`Added ` ``justify`` argument to ``DataFrame.to_string`` to allow
  different alignment of column headers

- :ref:`Add ` ``sort`` option to GroupBy to allow disabling sorting of the
  group keys for potential speedups (GH595_)

- :ref:`Can ` pass MaskedArray to Series constructor (PR563_)

- :ref:`Add ` Panel item access via attributes and IPython completion
  (GH554_)

- Implement ``DataFrame.lookup``, fancy-indexing analogue for retrieving
  values given a sequence of row and column labels (GH338_)

- Can pass a :ref:`list of functions ` to aggregate with groupby on a
  DataFrame, yielding an aggregated result with hierarchical columns (GH166_)

- Can call ``cummin`` and ``cummax`` on Series and DataFrame to get cumulative
  minimum and maximum, respectively (GH647_)

- ``value_range`` added as utility function to get min and max of a dataframe
  (GH288_)

- Added ``encoding`` argument to ``read_csv``, ``read_table``, ``to_csv`` and
  ``from_csv`` for non-ascii text (GH717_)

- :ref:`Added ` ``abs`` method to pandas objects

- :ref:`Added ` ``crosstab`` function for easily computing frequency tables

- :ref:`Added ` ``isin`` method to index objects

- :ref:`Added ` ``level`` argument to ``xs`` method of DataFrame.

API Changes to integer indexing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the potentially riskiest API changes in 0.7.0, but also one of the most
important, was a complete review of how **integer indexes** are handled with
regard to label-based indexing. Here is an example:

.. ipython:: python

   s = Series(randn(10), index=range(0, 20, 2))
   s
   s[0]
   s[2]
   s[4]

This is all exactly identical to the behavior before. However, if you ask for
a key **not** contained in the Series, in versions 0.6.1 and prior, Series
would *fall back* on a location-based lookup. This now raises a ``KeyError``:

.. code-block:: ipython

   In [2]: s[1]
   KeyError: 1

This change also has the same impact on DataFrame:

.. code-block:: ipython

   In [3]: df = DataFrame(randn(8, 4), index=range(0, 16, 2))

   In [4]: df
              0       1       2        3
   0    0.88427  0.3363 -0.1787  0.03162
   2    0.14451 -0.1415  0.2504  0.58374
   4   -1.44779 -0.9186 -1.4996  0.27163
   6   -0.26598 -2.4184 -0.2658  0.11503
   8   -0.58776  0.3144 -0.8566  0.61941
   10   0.10940 -0.7175 -1.0108  0.47990
   12  -1.16919 -0.3087 -0.6049 -0.43544
   14  -0.07337  0.3410  0.0424 -0.16037

   In [5]: df.ix[3]
   KeyError: 3

In order to support purely integer-based indexing, the following methods have
been added:

.. csv-table::
    :header: "Method","Description"
    :widths: 40,60

    ``Series.iget_value(i)``, Retrieve value stored at location ``i``
    ``Series.iget(i)``, Alias for ``iget_value``
    ``DataFrame.irow(i)``, Retrieve the ``i``-th row
    ``DataFrame.icol(j)``, Retrieve the ``j``-th column
    "``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``

API tweaks regarding label-based slicing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Label-based slicing using ``ix`` now requires that the index be sorted
(monotonic) **unless** both the start and endpoint are contained in the index:

.. ipython:: python

   s = Series(randn(6), index=list('gmkaec'))
   s

Then this is OK:

.. ipython:: python

   s.ix['k':'e']

But this is not:

.. code-block:: ipython

   In [12]: s.ix['b':'h']
   KeyError 'b'

If the index had been sorted, the "range selection" would have been possible:

.. ipython:: python

   s2 = s.sort_index()
   s2
   s2.ix['b':'h']

Changes to Series ``[]`` operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As a notational convenience, you can pass a sequence of labels or a label
slice to a Series when getting and setting values via ``[]`` (i.e. the
``__getitem__`` and ``__setitem__`` methods). The behavior will be the same as
passing similar input to ``ix`` **except in the case of integer indexing**:

.. ipython:: python

   s = Series(randn(6), index=list('acegkm'))
   s
   s[['m', 'a', 'c', 'e']]
   s['b':'l']
   s['c':'k']

In the case of integer indexes, the behavior will be exactly as before
(shadowing ``ndarray``):

.. ipython:: python

   s = Series(randn(6), index=range(0, 12, 2))
   s[[4, 0, 2]]
   s[1:5]

If you wish to do indexing with sequences and slicing on an integer index with
label semantics, use ``ix``.

Other API Changes
~~~~~~~~~~~~~~~~~

- The deprecated ``LongPanel`` class has been completely removed

- If ``Series.sort`` is called on a column of a DataFrame, an exception will
  now be raised. Before it was possible to accidentally mutate a DataFrame's
  column by doing ``df[col].sort()`` instead of the side-effect free method
  ``df[col].order()`` (GH316_)

- Miscellaneous renames and deprecations which will (harmlessly) raise
  ``FutureWarning``

- ``drop`` added as an optional parameter to ``DataFrame.reset_index``
  (GH699_)

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- :ref:`Cythonized GroupBy aggregations ` no longer presort the data, thus
  achieving a significant speedup (GH93_). GroupBy aggregations with Python
  functions significantly sped up by clever manipulation of the ndarray data
  type in Cython (GH496_).
- Better error message in DataFrame constructor when passed column labels
  don't match data (GH497_)
- Substantially improve performance of multi-GroupBy aggregation when a
  Python function is passed, reuse ndarray object in Cython (GH496_)
- Can store objects indexed by tuples and floats in HDFStore (GH492_)
- Don't print length by default in ``Series.to_string``, add ``length``
  option (GH489_)
- Improve Cython code for multi-groupby to aggregate without having to sort
  the data (GH93_)
- Improve MultiIndex reindexing speed by storing tuples in the MultiIndex,
  test for backwards unpickling compatibility
- Improve column reindexing performance by using specialized Cython take
  function
- Further performance tweaking of ``Series.__getitem__`` for standard use
  cases
- Avoid Index dict creation in some cases (i.e. when getting slices, etc.),
  regression from prior versions
- Friendlier error message in setup.py if NumPy not installed
- Use common set of NA-handling operations (sum, mean, etc.) in Panel class
  also (GH536_)
- Default name assignment when calling ``reset_index`` on DataFrame with a
  regular (non-hierarchical) index (GH476_)
- Use Cythonized groupers when possible in Series/DataFrame stat ops with
  ``level`` parameter passed (GH545_)
- Ported skiplist data structure to C to speed up ``rolling_median`` by about
  5-10x in most typical use cases (GH374_)

.. _GH115: https://github.com/pydata/pandas/issues/115
.. _GH166: https://github.com/pydata/pandas/issues/166
.. _GH220: https://github.com/pydata/pandas/issues/220
.. _GH288: https://github.com/pydata/pandas/issues/288
.. _GH249: https://github.com/pydata/pandas/issues/249
.. _GH267: https://github.com/pydata/pandas/issues/267
.. _GH273: https://github.com/pydata/pandas/issues/273
.. _GH316: https://github.com/pydata/pandas/issues/316
.. _GH338: https://github.com/pydata/pandas/issues/338
.. _GH342: https://github.com/pydata/pandas/issues/342
.. _GH374: https://github.com/pydata/pandas/issues/374
.. _GH439: https://github.com/pydata/pandas/issues/439
.. _GH468: https://github.com/pydata/pandas/issues/468
.. _GH476: https://github.com/pydata/pandas/issues/476
.. _GH479: https://github.com/pydata/pandas/issues/479
.. _GH489: https://github.com/pydata/pandas/issues/489
.. _GH492: https://github.com/pydata/pandas/issues/492
.. _GH496: https://github.com/pydata/pandas/issues/496
.. _GH497: https://github.com/pydata/pandas/issues/497
.. _GH498: https://github.com/pydata/pandas/issues/498
.. _GH526: https://github.com/pydata/pandas/issues/526
.. _GH536: https://github.com/pydata/pandas/issues/536
.. _GH542: https://github.com/pydata/pandas/issues/542
.. _GH545: https://github.com/pydata/pandas/issues/545
.. _GH554: https://github.com/pydata/pandas/issues/554
.. _GH595: https://github.com/pydata/pandas/issues/595
.. _GH647: https://github.com/pydata/pandas/issues/647
.. _GH699: https://github.com/pydata/pandas/issues/699
.. _GH717: https://github.com/pydata/pandas/issues/717
.. _GH93: https://github.com/pydata/pandas/issues/93
.. _PR521: https://github.com/pydata/pandas/pull/521
.. _PR534: https://github.com/pydata/pandas/pull/534
.. _PR552: https://github.com/pydata/pandas/pull/552
.. _PR554: https://github.com/pydata/pandas/pull/554
.. _PR563: https://github.com/pydata/pandas/pull/563
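
The lookup rule described under "API Changes to integer indexing" can be
sketched in plain Python, without pandas. This is only an illustration of the
semantics, not the library's implementation: the class name ``LabelSeries``
and its ``iget`` method are hypothetical stand-ins, and the values are made up.

```python
class LabelSeries:
    """Hypothetical stand-in for a Series with an integer index (not pandas code)."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)
        # Map each label to its position for label-based lookup.
        self._pos = {label: i for i, label in enumerate(self.index)}

    def __getitem__(self, label):
        # 0.7.0-style semantics: a missing label raises KeyError
        # instead of silently falling back to a position-based lookup.
        return self.values[self._pos[label]]

    def iget(self, i):
        # Purely positional access, analogous in spirit to Series.iget(i).
        return self.values[i]


# Labels are 0, 2, 4, 6; positions are 0, 1, 2, 3.
s = LabelSeries([10.0, 11.0, 12.0, 13.0], index=range(0, 8, 2))

print(s[2])       # label 2 is at position 1 -> 11.0
print(s.iget(1))  # position 1 -> 11.0

try:
    s[1]          # 1 is a valid position but not a label
except KeyError as exc:
    print("KeyError:", exc)
```

Keeping label lookup and positional lookup in separate methods means that a
missing integer key fails loudly rather than returning whichever value
happens to sit at that position.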