forked from pandas-dev/pandas
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathv0.10.1.txt
236 lines (183 loc) · 8.79 KB
/
v0.10.1.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
.. _whatsnew_0101:
v0.10.1 (January 22, 2013)
---------------------------
This is a minor release from 0.10.0 and includes new features, enhancements,
and bug fixes. In particular, there is substantial new HDFStore functionality
contributed by Jeff Reback.
An undesired API breakage with functions taking the ``inplace`` option has been
reverted and deprecation warnings added.
API changes
~~~~~~~~~~~
- Functions taking an ``inplace`` option return the calling object as before. A
deprecation message has been added
- Groupby aggregations Max/Min no longer exclude non-numeric data (GH2700_)
- Resampling an empty DataFrame now returns an empty DataFrame instead of
raising an exception (GH2640_)
- The file reader will now raise an exception when NA values are found in an
explicitly specified integer column instead of converting the column to float
(GH2631_)
- DatetimeIndex.unique now returns a DatetimeIndex with the same name and
- timezone instead of an array (GH2563_)
New features
~~~~~~~~~~~~
- MySQL support for database (contribution from Dan Allan)
HDFStore
~~~~~~~~
You may need to upgrade your existing data files. Please visit the
**compatibility** section in the main docs.
.. ipython:: python
:suppress:
:okexcept:
os.remove('store.h5')
You can designate (and index) certain columns that you want to be able to
perform queries on a table, by passing a list to ``data_columns``
.. ipython:: python
store = HDFStore('store.h5')
df = DataFrame(randn(8, 3), index=date_range('1/1/2000', periods=8),
columns=['A', 'B', 'C'])
df['string'] = 'foo'
df.ix[4:6,'string'] = np.nan
df.ix[7:9,'string'] = 'bar'
df['string2'] = 'cool'
df
# on-disk operations
store.append('df', df, data_columns = ['B','C','string','string2'])
store.select('df',[ 'B > 0', 'string == foo' ])
# this is in-memory version of this type of selection
df[(df.B > 0) & (df.string == 'foo')]
Retrieving unique values in an indexable or data column.
.. ipython:: python
store.unique('df','index')
store.unique('df','string')
You can now store ``datetime64`` in data columns
.. ipython:: python
df_mixed = df.copy()
df_mixed['datetime64'] = Timestamp('20010102')
df_mixed.ix[3:4,['A','B']] = np.nan
store.append('df_mixed', df_mixed)
df_mixed1 = store.select('df_mixed')
df_mixed1
df_mixed1.get_dtype_counts()
You can pass ``columns`` keyword to select to filter a list of the return
columns, this is equivalent to passing a
``Term('columns',list_of_columns_to_filter)``
.. ipython:: python
store.select('df',columns = ['A','B'])
``HDFStore`` now serializes multi-index dataframes when appending tables.
.. ipython:: python
index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
['one', 'two', 'three']],
labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
[0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
names=['foo', 'bar'])
df = DataFrame(np.random.randn(10, 3), index=index,
columns=['A', 'B', 'C'])
df
store.append('mi',df)
store.select('mi')
# the levels are automatically included as data columns
store.select('mi', Term('foo=bar'))
Multi-table creation via ``append_to_multiple`` and selection via
``select_as_multiple`` can create/select from multiple tables and return a
combined result, by using ``where`` on a selector table.
.. ipython:: python
df_mt = DataFrame(randn(8, 6), index=date_range('1/1/2000', periods=8),
columns=['A', 'B', 'C', 'D', 'E', 'F'])
df_mt['foo'] = 'bar'
# you can also create the tables individually
store.append_to_multiple({ 'df1_mt' : ['A','B'], 'df2_mt' : None }, df_mt, selector = 'df1_mt')
store
# indiviual tables were created
store.select('df1_mt')
store.select('df2_mt')
# as a multiple
store.select_as_multiple(['df1_mt','df2_mt'], where = [ 'A>0','B>0' ], selector = 'df1_mt')
.. ipython:: python
:suppress:
store.close()
import os
os.remove('store.h5')
**Enhancements**
- ``HDFStore`` now can read native PyTables table format tables
- You can pass ``nan_rep = 'my_nan_rep'`` to append, to change the default nan
representation on disk (which converts to/from `np.nan`), this defaults to
`nan`.
- You can pass ``index`` to ``append``. This defaults to ``True``. This will
automagically create indicies on the *indexables* and *data columns* of the
table
- You can pass ``chunksize=an integer`` to ``append``, to change the writing
chunksize (default is 50000). This will signficantly lower your memory usage
on writing.
- You can pass ``expectedrows=an integer`` to the first ``append``, to set the
TOTAL number of expectedrows that ``PyTables`` will expected. This will
optimize read/write performance.
- ``Select`` now supports passing ``start`` and ``stop`` to provide selection
space limiting in selection.
- Greatly improved ISO8601 (e.g., yyyy-mm-dd) date parsing for file parsers (GH2698_)
- Allow ``DataFrame.merge`` to handle combinatorial sizes too large for 64-bit
integer (GH2690_)
- Series now has unary negation (-series) and inversion (~series) operators (GH2686_)
- DataFrame.plot now includes a ``logx`` parameter to change the x-axis to log scale (GH2327_)
- Series arithmetic operators can now handle constant and ndarray input (GH2574_)
- ExcelFile now takes a ``kind`` argument to specify the file type (GH2613_)
- A faster implementation for Series.str methods (GH2602_)
**Bug Fixes**
- ``HDFStore`` tables can now store ``float32`` types correctly (cannot be
mixed with ``float64`` however)
- Fixed Google Analytics prefix when specifying request segment (GH2713_).
- Function to reset Google Analytics token store so users can recover from
improperly setup client secrets (GH2687_).
- Fixed groupby bug resulting in segfault when passing in MultiIndex (GH2706_)
- Fixed bug where passing a Series with datetime64 values into `to_datetime`
results in bogus output values (GH2699_)
- Fixed bug in ``pattern in HDFStore`` expressions when pattern is not a valid
regex (GH2694_)
- Fixed performance issues while aggregating boolean data (GH2692_)
- When given a boolean mask key and a Series of new values, Series __setitem__
will now align the incoming values with the original Series (GH2686_)
- Fixed MemoryError caused by performing counting sort on sorting MultiIndex
levels with a very large number of combinatorial values (GH2684_)
- Fixed bug that causes plotting to fail when the index is a DatetimeIndex with
a fixed-offset timezone (GH2683_)
- Corrected businessday subtraction logic when the offset is more than 5 bdays
and the starting date is on a weekend (GH2680_)
- Fixed C file parser behavior when the file has more columns than data
(GH2668_)
- Fixed file reader bug that misaligned columns with data in the presence of an
implicit column and a specified `usecols` value
- DataFrames with numerical or datetime indices are now sorted prior to
plotting (GH2609_)
- Fixed DataFrame.from_records error when passed columns, index, but empty
records (GH2633_)
- Several bug fixed for Series operations when dtype is datetime64 (GH2689_,
GH2629_, GH2626_)
See the `full release notes
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
on GitHub for a complete list.
.. _GH2706: https://github.com/pydata/pandas/issues/2706
.. _GH2700: https://github.com/pydata/pandas/issues/2700
.. _GH2699: https://github.com/pydata/pandas/issues/2699
.. _GH2698: https://github.com/pydata/pandas/issues/2698
.. _GH2694: https://github.com/pydata/pandas/issues/2694
.. _GH2692: https://github.com/pydata/pandas/issues/2692
.. _GH2690: https://github.com/pydata/pandas/issues/2690
.. _GH2713: https://github.com/pydata/pandas/issues/2713
.. _GH2689: https://github.com/pydata/pandas/issues/2689
.. _GH2686: https://github.com/pydata/pandas/issues/2686
.. _GH2684: https://github.com/pydata/pandas/issues/2684
.. _GH2683: https://github.com/pydata/pandas/issues/2683
.. _GH2680: https://github.com/pydata/pandas/issues/2680
.. _GH2668: https://github.com/pydata/pandas/issues/2668
.. _GH2640: https://github.com/pydata/pandas/issues/2640
.. _GH2609: https://github.com/pydata/pandas/issues/2609
.. _GH2327: https://github.com/pydata/pandas/issues/2327
.. _GH2574: https://github.com/pydata/pandas/issues/2574
.. _GH2609: https://github.com/pydata/pandas/issues/2609
.. _GH2631: https://github.com/pydata/pandas/issues/2631
.. _GH2633: https://github.com/pydata/pandas/issues/2633
.. _GH2629: https://github.com/pydata/pandas/issues/2629
.. _GH2626: https://github.com/pydata/pandas/issues/2626
.. _GH2613: https://github.com/pydata/pandas/issues/2613
.. _GH2602: https://github.com/pydata/pandas/issues/2602
.. _GH2687: https://github.com/pydata/pandas/issues/2687
.. _GH2563: https://github.com/pydata/pandas/issues/2563