.. _whatsnew_0901:

.. ipython:: python
   :suppress:

   from pandas.compat import StringIO

v0.9.1 (November 14, 2012)
--------------------------
This is a bugfix release from 0.9.0 and includes several new features and
enhancements along with a large number of bug fixes. The new features include
by-column sort order for DataFrame and Series, improved NA handling for the rank
method, masking functions for DataFrame, and intraday time-series filtering for
DataFrame.
New features
~~~~~~~~~~~~
- `Series.sort`, `DataFrame.sort`, and `DataFrame.sort_index` can now be
  specified in a per-column manner to support multiple sort orders (:issue:`928`)

  .. ipython:: python

     df = DataFrame(np.random.randint(0, 2, (6, 3)), columns=['A', 'B', 'C'])
     df.sort(['A', 'B'], ascending=[1, 0])
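As a small, deterministic illustration of per-column sort orders, the sketch below sorts by `A` ascending and then by `B` descending. It assumes a recent pandas release, where `DataFrame.sort` has been superseded by `DataFrame.sort_values`; the data values are made up for the example.

```python
import pandas as pd

# Hypothetical data: ties in 'A' make the ordering of 'B' visible.
df = pd.DataFrame({'A': [1, 0, 1, 0],
                   'B': [0, 1, 1, 0],
                   'C': [10, 20, 30, 40]})

# One flag per sort key: 'A' ascending, 'B' descending.
# Assumption: sort_values is the modern spelling of the 0.9.1-era df.sort call.
result = df.sort_values(['A', 'B'], ascending=[True, False])
print(result)
```

Within each group of equal `A` values, the rows appear with the larger `B` first.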
- `DataFrame.rank` now supports additional argument values for the
  `na_option` parameter so missing values can be assigned either the largest
  or the smallest rank (:issue:`1508`, :issue:`2159`)

  .. ipython:: python

     df = DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])
     df.ix[2:4] = np.nan
     df.rank()
     df.rank(na_option='top')
     df.rank(na_option='bottom')
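To make the effect of `na_option` concrete, here is a minimal sketch on a single hand-written column. The values are hypothetical and the code assumes a recent pandas release.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({'A': [3.0, np.nan, 1.0, 2.0]})

default = df.rank()                   # missing value keeps a NaN rank
top = df.rank(na_option='top')        # missing value gets the smallest rank
bottom = df.rank(na_option='bottom')  # missing value gets the largest rank

print(default['A'].tolist())
print(top['A'].tolist())
print(bottom['A'].tolist())
```

With `'top'` the missing entry is ranked first and every other rank shifts up by one; with `'bottom'` the other ranks are unchanged and the missing entry is ranked last.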
- DataFrame has new `where` and `mask` methods to select values according to a
  given boolean mask (:issue:`2109`, :issue:`2151`)

  DataFrame currently supports slicing via a boolean vector the same length as
  the DataFrame (inside the `[]`). The returned DataFrame has the same number of
  columns as the original, but is sliced on its index.

  .. ipython:: python

     df = DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
     df
     df[df['A'] > 0]
  If a DataFrame is sliced with a DataFrame-based boolean condition (with the
  same size as the original DataFrame), then a DataFrame of the same size (index
  and columns) as the original is returned, with elements that do not meet the
  boolean condition set to `NaN`. This is accomplished via the new method
  `DataFrame.where`. In addition, `where` takes an optional `other` argument
  for replacement.

  .. ipython:: python

     df[df > 0]
     df.where(df > 0)
     df.where(df > 0, -df)
  Furthermore, `where` now aligns the input boolean condition (ndarray or
  DataFrame), such that partial selection with setting is possible. This is
  analogous to partial setting via `.ix` (but on the contents rather than the
  axis labels).

  .. ipython:: python

     df2 = df.copy()
     df2[df2[1:4] > 0] = 3
     df2
  `DataFrame.mask` is the inverse boolean operation of `where`.

  .. ipython:: python

     df.mask(df <= 0)
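The relationship between `where` and `mask` can be checked on a tiny frame. The sketch below uses hypothetical values and assumes a recent pandas release.

```python
import pandas as pd

# Hypothetical 2x2 frame with mixed signs.
df = pd.DataFrame({'A': [1.0, -2.0], 'B': [-3.0, 4.0]})
cond = df > 0

kept = df.where(cond)          # entries failing the condition become NaN
flipped = df.where(cond, -df)  # entries failing the condition are taken from `other`
masked = df.mask(df <= 0)      # hides entries where the condition is True

# mask with the negated condition selects the same entries as where.
print(masked.equals(kept))
print(flipped)
```

Here `df <= 0` is the exact negation of `df > 0` because the frame contains no missing values.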
- Enable referencing of Excel columns by their column names (:issue:`1936`)

  .. ipython:: python

     xl = ExcelFile('data/test.xls')
     xl.parse('Sheet1', index_col=0, parse_dates=True,
              parse_cols='A:D')
- Added option to disable pandas-style tick locators and formatters
  using `series.plot(x_compat=True)` or `pandas.plot_params['x_compat'] =
  True` (:issue:`2205`)
- Existing TimeSeries methods `at_time` and `between_time` were added to
  DataFrame (:issue:`2149`)
- DataFrame.dot can now accept ndarrays (:issue:`2042`)
- DataFrame.drop now supports non-unique indexes (:issue:`2101`)
- Panel.shift now supports negative periods (:issue:`2164`)
- DataFrame now supports the unary ~ operator (:issue:`2110`)
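Several of the smaller DataFrame additions listed above have no accompanying example. The sketch below exercises intraday filtering, `dot` with an ndarray, `drop` on a non-unique index, and the unary `~` operator. All data and variable names are hypothetical, and the code assumes a recent pandas release rather than 0.9.1 itself.

```python
import numpy as np
import pandas as pd

# Intraday filtering on a DataFrame indexed by timestamps (09:00 to 10:30).
idx = pd.date_range('2012-11-14 09:00', periods=4, freq='30min')
ts = pd.DataFrame({'price': [1.0, 2.0, 3.0, 4.0]}, index=idx)
at_ten = ts.at_time('10:00')                 # rows stamped exactly 10:00
morning = ts.between_time('09:30', '10:00')  # both endpoints included by default

# dot with a plain NumPy vector: here it sums each row.
m = pd.DataFrame([[1, 2], [3, 4]], columns=['x', 'y'])
row_sums = m.dot(np.array([1, 1]))

# drop on a non-unique index removes every row carrying the label.
dup = pd.DataFrame({'v': [1, 2, 3]}, index=['a', 'b', 'a'])
dropped = dup.drop('a')

# Unary ~ inverts a boolean frame element-wise.
flags = pd.DataFrame({'p': [True, False]})
inverted = ~flags

print(at_ten, morning, row_sums, dropped, inverted, sep='\n\n')
```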
API changes
~~~~~~~~~~~
- Upsampling data with a PeriodIndex will result in a higher frequency
  TimeSeries that spans the original time window

  .. ipython:: python

     prng = period_range('2012Q1', periods=2, freq='Q')
     s = Series(np.random.randn(len(prng)), prng)
     s.resample('M')
- Period.end_time now returns the last nanosecond in the time interval
  (:issue:`2124`, :issue:`2125`, :issue:`1764`)

  .. ipython:: python

     p = Period('2012')
     p.end_time
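As a quick check of what "last nanosecond" means here, the sketch below compares the end of the 2012 annual period with the start of 2013. It assumes a recent pandas release.

```python
import pandas as pd

p = pd.Period('2012')
end = p.end_time
next_start = pd.Period('2013').start_time

# The end of 2012 sits exactly one nanosecond before the start of 2013.
gap = next_start - end
print(end)
print(gap)
```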
- File parsers no longer coerce to float or bool for columns that have custom
  converters specified (:issue:`2184`)

  .. ipython:: python

     data = 'A,B,C\n00001,001,5\n00002,002,6'
     read_csv(StringIO(data), converters={'A' : lambda x: x.strip()})
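The effect of a custom converter is easiest to see by comparing a converted column with an unconverted one. This sketch reuses the data string from the example above. It assumes a recent pandas release and substitutes the standard library's `io.StringIO` for `pandas.compat.StringIO`.

```python
import io

import pandas as pd

data = 'A,B,C\n00001,001,5\n00002,002,6'

# Column 'A' goes through the converter and keeps its leading zeros as text;
# column 'B' has no converter and is parsed as integers.
df = pd.read_csv(io.StringIO(data), converters={'A': lambda x: x.strip()})

print(df['A'].tolist())
print(df['B'].tolist())
```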
See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list.