.. _whatsnew_0901:

.. ipython:: python
   :suppress:

   from pandas.compat import StringIO

v0.9.1 (November 14, 2012)
--------------------------

This is a bugfix release from 0.9.0 and includes several new features and
enhancements along with a large number of bug fixes. The new features include
by-column sort order for DataFrame and Series, improved NA handling for the rank
method, masking functions for DataFrame, and intraday time-series filtering for
DataFrame.

New features
~~~~~~~~~~~~

  - `Series.sort`, `DataFrame.sort`, and `DataFrame.sort_index` now accept a
    per-column sort order, so that different columns can be sorted in different
    directions (:issue:`928`)

    .. ipython:: python

       df = DataFrame(np.random.randint(0, 2, (6, 3)), columns=['A', 'B', 'C'])

       df.sort(['A', 'B'], ascending=[1, 0])

  - `DataFrame.rank` now supports additional argument values for the
    `na_option` parameter so missing values can be assigned either the largest
    or the smallest rank (:issue:`1508`, :issue:`2159`)

    .. ipython:: python

       df = DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])

       df.ix[2:4] = np.nan

       df.rank()

       df.rank(na_option='top')

       df.rank(na_option='bottom')

  - DataFrame has new `where` and `mask` methods to select values according to a
    given boolean mask (:issue:`2109`, :issue:`2151`)

    DataFrame currently supports slicing via a boolean vector the same length
    as the DataFrame (inside the `[]`). The returned DataFrame has the same
    number of columns as the original, but is sliced on its index.

    .. ipython:: python

       df = DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])

       df

       df[df['A'] > 0]

    If a DataFrame is sliced with a DataFrame-based boolean condition (with the
    same size as the original DataFrame), then a DataFrame the same size (index
    and columns) as the original is returned, with elements that do not meet
    the boolean condition set to `NaN`. This is accomplished via the new method
    `DataFrame.where`. In addition, `where` takes an optional `other` argument
    for replacement.

    .. ipython:: python

       df[df > 0]

       df.where(df > 0)

       df.where(df > 0, -df)

    Furthermore, `where` now aligns the input boolean condition (ndarray or
    DataFrame), such that partial selection with setting is possible. This is
    analogous to partial setting via `.ix` (but on the contents rather than the
    axis labels)

    .. ipython:: python

       df2 = df.copy()
       df2[df2[1:4] > 0] = 3
       df2

    `DataFrame.mask` is the inverse boolean operation of `where`.

    .. ipython:: python

       df.mask(df <= 0)

  - Enable referencing of Excel columns by their column names (:issue:`1936`)

    .. ipython:: python

       xl = ExcelFile('data/test.xls')
       xl.parse('Sheet1', index_col=0, parse_dates=True,
                parse_cols='A:D')

  - Added option to disable pandas-style tick locators and formatters
    using `series.plot(x_compat=True)` or `pandas.plot_params['x_compat'] =
    True` (:issue:`2205`)
  - Existing TimeSeries methods `at_time` and `between_time` were added to
    DataFrame (:issue:`2149`)
  - DataFrame.dot can now accept ndarrays (:issue:`2042`)
  - DataFrame.drop now supports non-unique indexes (:issue:`2101`)
  - Panel.shift now supports negative periods (:issue:`2164`)
  - DataFrame now supports the unary ~ operator (:issue:`2110`)

API changes
~~~~~~~~~~~

  - Upsampling data with a PeriodIndex will result in a higher frequency
    TimeSeries that spans the original time window

    .. ipython:: python

       prng = period_range('2012Q1', periods=2, freq='Q')

       s = Series(np.random.randn(len(prng)), prng)

       s.resample('M')

  - Period.end_time now returns the last nanosecond in the time interval
    (:issue:`2124`, :issue:`2125`, :issue:`1764`)

    .. ipython:: python

       p = Period('2012')

       p.end_time

  - File parsers no longer coerce to float or bool for columns that have custom
    converters specified (:issue:`2184`)

    .. ipython:: python

       data = 'A,B,C\n00001,001,5\n00002,002,6'

       read_csv(StringIO(data), converters={'A' : lambda x: x.strip()})

See the :ref:`full release notes ` or issue tracker on GitHub for a complete
list.
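
The short notes on `at_time`, `between_time`, and the unary ~ operator for DataFrame are listed without an example. The following is a minimal sketch of how they can be exercised. It is written against a current pandas release rather than 0.9.1, and the `pd` namespace import, the sample index, and the variable names are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical intraday data: one row per hour from 09:00 to 14:00.
idx = pd.date_range("2012-11-14 09:00", periods=6, freq="60min")
df = pd.DataFrame({"A": np.arange(6)}, index=idx)

# Select the rows whose time of day is exactly 10:00.
at_ten = df.at_time("10:00")

# Select the rows whose time of day falls between 09:00 and 11:00
# (both endpoints are included by default).
morning = df.between_time("09:00", "11:00")

# The unary ~ operator inverts a boolean DataFrame element-wise.
flags = pd.DataFrame({"x": [True, False], "y": [False, True]})
inverted = ~flags
```

Both time-based selectors require a DatetimeIndex on the frame; they filter on the time of day only and ignore the date.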