.. _whatsnew_0130:

v0.13.0 (October ??, 2013)
--------------------------

This is a major release from 0.12.0 and includes a number of API changes, several new features and
enhancements along with a large number of bug fixes.

Highlights include support for a new index type ``Float64Index``, support for new methods of
interpolation, updated ``timedelta`` operations, and a new string manipulation method
``extract``. Several experimental features are added, including new ``eval/query`` methods for
expression evaluation, support for ``msgpack`` serialization, and an io interface to Google's
``BigQuery``.

.. warning::

   In 0.13.0 ``Series`` has internally been refactored to no longer sub-class ``ndarray``
   but instead subclass ``NDFrame``, similar to the rest of the pandas containers. This should be
   a transparent change with only very limited API implications. See :ref:`Internal Refactoring`

API changes
~~~~~~~~~~~

- ``read_excel`` now supports an integer in its ``sheetname`` argument giving
  the index of the sheet to read in (:issue:`4301`).
- Text parser now treats anything that reads like inf ("inf", "Inf", "-Inf",
  "iNf", etc.) as infinity (:issue:`4220`, :issue:`4219`), affecting
  ``read_table``, ``read_csv``, etc.
- ``pandas`` is now Python 2/3 compatible without the need for 2to3, thanks to
  @jtratner. As a result, pandas now uses iterators more extensively. This
  also led to the introduction of substantive parts of Benjamin Peterson's
  ``six`` library into compat (:issue:`4384`, :issue:`4375`, :issue:`4372`).
- ``pandas.util.compat`` and ``pandas.util.py3compat`` have been merged into
  ``pandas.compat``. ``pandas.compat`` now includes many functions allowing
  2/3 compatibility. It contains both list and iterator versions of range,
  filter, map and zip, plus other necessary elements for Python 3
  compatibility. ``lmap``, ``lzip``, ``lrange`` and ``lfilter`` all produce
  lists instead of iterators, for compatibility with ``numpy``, subscripting
  and ``pandas`` constructors (:issue:`4384`, :issue:`4375`, :issue:`4372`).
- ``Series.get`` with negative indexers now returns the same as ``[]`` (:issue:`4390`)
- Changes to how ``Index`` and ``MultiIndex`` handle metadata (``levels``,
  ``labels``, and ``names``) (:issue:`4039`):

  .. code-block:: python

     # previously, you would have set levels or labels directly
     index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]

     # now, you use the set_levels or set_labels methods
     index = index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])

     # similarly, for names, you can rename the object
     # but setting names is not deprecated.
     index = index.set_names(["bob", "cranberry"])

     # and all methods take an inplace kwarg - but return None
     index.set_names(["bob", "cranberry"], inplace=True)

- **All** division with ``NDFrame``-likes is now true division, regardless of
  the future import. You can use ``//`` and ``floordiv`` to do integer
  division.

  .. code-block:: python

     In [3]: arr = np.array([1, 2, 3, 4])

     In [4]: arr2 = np.array([5, 3, 2, 1])

     In [5]: arr / arr2
     Out[5]: array([0, 0, 1, 4])

     In [6]: pd.Series(arr) / pd.Series(arr2)  # no future import required
     Out[6]:
     0    0.200000
     1    0.666667
     2    1.500000
     3    4.000000
     dtype: float64

- Infer and downcast dtype if ``downcast='infer'`` is passed to
  ``fillna/ffill/bfill`` (:issue:`4604`)
- ``__nonzero__`` for all NDFrame objects will now raise a ``ValueError``;
  this reverts back to the (:issue:`1073`, :issue:`4633`) behavior. See
  :ref:`gotchas` for a more detailed discussion.

  This prevents doing boolean comparison on *entire* pandas objects, which is
  inherently ambiguous. These all will raise a ``ValueError``.

  .. code-block:: python

     if df:
         ....

     df1 and df2
     s1 and s2

  Added the ``.bool()`` method to ``NDFrame`` objects to facilitate
  evaluating single-element boolean Series:

  .. ipython:: python

     Series([True]).bool()
     Series([False]).bool()
     DataFrame([[True]]).bool()
     DataFrame([[False]]).bool()

- All non-Index NDFrames (``Series``, ``DataFrame``, ``Panel``, ``Panel4D``,
  ``SparsePanel``, etc.), now support the entire set of arithmetic operators
  and arithmetic flex methods (add, sub, mul, etc.). ``SparsePanel`` does not
  support ``pow`` or ``mod`` with non-scalars. (:issue:`3765`)
- ``Series`` and ``DataFrame`` now have a ``mode()`` method to calculate the
  statistical mode(s) by axis/Series. (:issue:`5367`)
- Chained assignment will now by default warn if the user is assigning to a
  copy. This can be changed with the option ``mode.chained_assignment``;
  allowed options are ``raise/warn/None``. See :ref:`the docs`.

  .. ipython:: python

     dfc = DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})
     pd.set_option('chained_assignment', 'warn')

  The following warning / exception will show if this is attempted.

  .. ipython:: python

     dfc.loc[0]['A'] = 1111

  ::

     Traceback (most recent call last)
        ...
     SettingWithCopyWarning:
        A value is trying to be set on a copy of a slice from a DataFrame.
        Try using .loc[row_index,col_indexer] = value instead

  Here is the correct method of assignment.

  .. ipython:: python

     dfc.loc[0, 'A'] = 11
     dfc
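  As a minimal usage sketch (grounded only in the ``raise/warn/None`` options
  described above; the exact exception class is not pinned down here),
  flipping the option to ``'raise'`` makes chained assignment fail loudly,
  which can be convenient in test suites:

  .. code-block:: python

     pd.set_option('mode.chained_assignment', 'raise')
     try:
         dfc.loc[0]['A'] = 1111   # chained assignment -> now raises
     except Exception as exc:     # catch broadly; class name may vary
         print(type(exc).__name__)

     # restore the default warning behavior
     pd.set_option('mode.chained_assignment', 'warn')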
Prior Version Deprecations/Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These were announced changes in 0.12 or prior that are taking effect as of 0.13.0

- Remove deprecated ``Factor`` (:issue:`3650`)
- Remove deprecated ``set_printoptions/reset_printoptions`` (:issue:`3046`)
- Remove deprecated ``_verbose_info`` (:issue:`3215`)
- Remove deprecated ``read_clipboard/to_clipboard/ExcelFile/ExcelWriter``
  from ``pandas.io.parsers`` (:issue:`3717`)
- default for ``tupleize_cols`` is now ``False`` for both ``to_csv`` and
  ``read_csv``. Fair warning in 0.12 (:issue:`3604`)

Deprecations
~~~~~~~~~~~~

Deprecated in 0.13.0

- deprecated ``iterkv``, which will be removed in a future release (this was
  an alias of iteritems used to bypass ``2to3``'s changes).
  (:issue:`4384`, :issue:`4375`, :issue:`4372`)
- deprecated the string method ``match``, whose role is now performed more
  idiomatically by ``extract``. In a future release, the default behavior of
  ``match`` will change to become analogous to ``contains``, which returns a
  boolean indexer. (Their distinction is strictness: ``match`` relies on
  ``re.match`` while ``contains`` relies on ``re.search``.) In this release,
  the deprecated behavior is the default, but the new behavior is available
  through the keyword argument ``as_indexer=True``; a sketch of the
  transition follows below.
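  A minimal sketch of the transition (``as_indexer=True`` opts in to the new
  ``contains``-like behavior described above):

  .. code-block:: python

     s = Series(['a1', 'b2', 'c3'])

     # deprecated default: match returns the captured groups
     s.str.match('([ab])(\d)')

     # new behavior: a boolean indexer, analogous to contains
     s.str.match('[ab]\d', as_indexer=True)

     # group extraction is now spelled with extract
     s.str.extract('([ab])(\d)')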
Indexing API Changes
~~~~~~~~~~~~~~~~~~~~

Prior to 0.13, it was impossible to use a label indexer (``.loc/.ix``) to set
a value that was not contained in the index of a particular axis
(:issue:`2578`). See more at :ref:`the docs`.

In the ``Series`` case this is effectively an appending operation

.. ipython:: python

   s = Series([1, 2, 3])
   s
   s[5] = 5.
   s

.. ipython:: python

   dfi = DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])
   dfi

This would previously ``KeyError``

.. ipython:: python

   dfi.loc[:, 'C'] = dfi.loc[:, 'A']
   dfi

This is like an ``append`` operation.

.. ipython:: python

   dfi.loc[3] = 5
   dfi

A Panel setting operation on an arbitrary axis aligns the input to the Panel

.. ipython:: python

   p = pd.Panel(np.arange(16).reshape(2, 4, 2),
                items=['Item1', 'Item2'],
                major_axis=pd.date_range('2001/1/12', periods=4),
                minor_axis=['A', 'B'], dtype='float64')
   p
   p.loc[:, :, 'C'] = Series([30, 32], index=p.items)
   p
   p.loc[:, :, 'C']

Float64Index API Change
~~~~~~~~~~~~~~~~~~~~~~~

- Added a new index type, ``Float64Index``. This will be automatically
  created when passing floating values in index creation. This enables a pure
  label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing
  and slicing work exactly the same. See :ref:`the docs` (:issue:`263`).

  Construction is by default for floating type values.

  .. ipython:: python

     index = Index([1.5, 2, 3, 4.5, 5])
     index
     s = Series(range(5), index=index)
     s

  Scalar selection for ``[],.ix,.loc`` will always be label based. An integer
  will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)

  .. ipython:: python

     s[3]
     s.ix[3]
     s.loc[3]

  The only positional indexing is via ``iloc``

  .. ipython:: python

     s.iloc[3]

  A scalar index that is not found will raise ``KeyError``.

  Slicing is ALWAYS on the values of the index, for ``[],ix,loc``, and ALWAYS
  positional with ``iloc``

  .. ipython:: python

     s[2:4]
     s.ix[2:4]
     s.loc[2:4]
     s.iloc[2:4]

  In a float index, slicing with floats is allowed

  .. ipython:: python

     s[2.1:4.6]
     s.loc[2.1:4.6]

- Indexing on other index types is preserved (and positional fallback for
  ``[],ix``), with the exception that floating point slicing on indexes other
  than ``Float64Index`` will now raise a ``TypeError``.

  .. code-block:: python

     In [1]: Series(range(5))[3.5]
     TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)

     In [2]: Series(range(5))[3.5:4.5]
     TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)

  Using a scalar float indexer will be deprecated in a future version, but is
  allowed for now.

  .. code-block:: python

     In [3]: Series(range(5))[3.0]
     Out[3]: 3
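  If positional access is what you actually intend on an integer index,
  ``iloc`` remains the explicit spelling; a minimal sketch:

  .. code-block:: python

     s = Series(range(5))

     s.iloc[3]      # positional, independent of the index type
     s.iloc[3:5]    # positional slicing
     s[3]           # label-based on this Int64Index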
HDFStore API Changes
~~~~~~~~~~~~~~~~~~~~

- Query Format Changes. A much more string-like query format is now
  supported. See :ref:`the docs`.

  .. ipython:: python

     path = 'test.h5'
     dfq = DataFrame(randn(10, 4),
                     columns=list('ABCD'),
                     index=date_range('20130101', periods=10))
     dfq.to_hdf(path, 'dfq', format='table', data_columns=True)

  Use boolean expressions, with in-line function evaluation.

  .. ipython:: python

     read_hdf(path, 'dfq',
              where="index>Timestamp('20130104') & columns=['A', 'B']")

  Use an inline column reference

  .. ipython:: python

     read_hdf(path, 'dfq',
              where="A>0 or C>0")

  .. ipython:: python
     :suppress:

     import os
     os.remove(path)

- the ``format`` keyword now replaces the ``table`` keyword; allowed values
  are ``fixed(f)`` or ``table(t)``. The same defaults as prior to 0.13.0
  remain, e.g. ``put`` implies ``fixed`` format and ``append`` implies
  ``table`` format. This default format can be set as an option by setting
  ``io.hdf.default_format``.

  .. ipython:: python

     path = 'test.h5'
     df = DataFrame(randn(10, 2))
     df.to_hdf(path, 'df_table', format='table')
     df.to_hdf(path, 'df_table2', append=True)
     df.to_hdf(path, 'df_fixed')
     with get_store(path) as store:
         print(store)

  .. ipython:: python
     :suppress:

     import os
     os.remove(path)

- Significant table writing performance improvements
- handle a passed ``Series`` in table format (:issue:`4330`)
- can now serialize a ``timedelta64[ns]`` dtype in a table (:issue:`3577`),
  See :ref:`the docs`.
- added an ``is_open`` property to indicate whether the underlying file
  handle is open; a closed store will now report 'CLOSED' when viewing the
  store (rather than raising an error) (:issue:`4409`)
- a close of a ``HDFStore`` now will close that instance of the ``HDFStore``
  but will only close the actual file if the ref count (by ``PyTables``)
  w.r.t. all of the open handles is 0. Essentially you have a local instance
  of ``HDFStore`` referenced by a variable. Once you close it, it will report
  closed. Other references (to the same file) will continue to operate until
  they themselves are closed. Performing an action on a closed file will
  raise ``ClosedFileError``

  .. ipython:: python

     path = 'test.h5'
     df = DataFrame(randn(10, 2))
     store1 = HDFStore(path)
     store2 = HDFStore(path)
     store1.append('df', df)
     store2.append('df2', df)

     store1
     store2
     store1.close()
     store2
     store2.close()
     store2

  .. ipython:: python
     :suppress:

     import os
     os.remove(path)

- removed the ``_quiet`` attribute, replaced by a ``DuplicateWarning`` if
  retrieving duplicate rows from a table (:issue:`4367`)
- removed the ``warn`` argument from ``open``. Instead a
  ``PossibleDataLossError`` exception will be raised if you try to use
  ``mode='w'`` with an OPEN file handle (:issue:`4367`)
- allow a passed locations array or mask as a ``where`` condition
  (:issue:`4467`). See :ref:`the docs` for an example.
- add the keyword ``dropna=True`` to ``append`` to control whether ALL-nan
  rows are written to the store (default is ``True``; ALL-nan rows are NOT
  written), also settable via the option ``io.hdf.dropna_table``
  (:issue:`4625`); a sketch follows below.
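  A minimal sketch of the keyword (the frame here is illustrative; its middle
  row is entirely nan):

  .. code-block:: python

     df_with_missing = DataFrame({'col1': [0, np.nan, 2],
                                  'col2': [1, np.nan, np.nan]})

     store = HDFStore('file.h5', mode='w')
     store.append('dfn', df_with_missing)                  # all-nan row dropped
     store.append('dfn2', df_with_missing, dropna=False)   # all-nan row kept
     store.close()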
Enhancements
~~~~~~~~~~~~

- ``read_html`` now raises a ``URLError`` instead of catching and raising a
  ``ValueError`` (:issue:`4303`, :issue:`4305`)
- Added a test for ``read_clipboard()`` and ``to_clipboard()``
  (:issue:`4282`)
- Clipboard functionality now works with PySide (:issue:`4282`)
- Added a more informative error message when plot arguments contain
  overlapping color and style arguments (:issue:`4402`)
- ``to_dict`` now takes ``records`` as a possible ``outtype``. Returns an
  array of column-keyed dictionaries. (:issue:`4936`)
- ``NaN`` handling in ``get_dummies`` (:issue:`4446`) with ``dummy_na``

  .. ipython:: python

     # previously, nan was erroneously counted as 2 here
     # now it is not counted at all
     get_dummies([1, 2, np.nan])

     # unless requested
     get_dummies([1, 2, np.nan], dummy_na=True)

- ``timedelta64[ns]`` operations. See :ref:`the docs`.

  .. warning::

     Most of these operations require ``numpy >= 1.7``

  Using the new top-level ``to_timedelta``, you can convert a scalar or array
  from the standard timedelta format (produced by ``to_csv``) into a
  timedelta type (``np.timedelta64`` in ``nanoseconds``).

  .. ipython:: python

     to_timedelta('1 days 06:05:01.00003')
     to_timedelta('15.5us')
     to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
     to_timedelta(np.arange(5), unit='s')
     to_timedelta(np.arange(5), unit='d')

  A Series of dtype ``timedelta64[ns]`` can now be divided by another
  ``timedelta64[ns]`` object, or astyped to yield a ``float64`` dtyped
  Series. This is frequency conversion. See :ref:`the docs`.

  .. ipython:: python

     from datetime import timedelta
     td = Series(date_range('20130101', periods=4)) - Series(date_range('20121201', periods=4))
     td[2] += np.timedelta64(timedelta(minutes=5, seconds=3))
     td[3] = np.nan
     td

     # to days
     td / np.timedelta64(1, 'D')
     td.astype('timedelta64[D]')

     # to seconds
     td / np.timedelta64(1, 's')
     td.astype('timedelta64[s]')

  Dividing or multiplying a ``timedelta64[ns]`` Series by an integer or
  integer Series

  .. ipython:: python

     td * -1
     td * Series([1, 2, 3, 4])

  Absolute ``DateOffset`` objects can act equivalently to ``timedeltas``

  .. ipython:: python

     from pandas import offsets
     td + offsets.Minute(5) + offsets.Milli(5)

  Fillna is now supported for timedeltas

  .. ipython:: python

     td.fillna(0)
     td.fillna(timedelta(days=1, seconds=5))

  You can do numeric reduction operations on timedeltas. Note that these will
  return a single-element Series.

  .. ipython:: python

     td.mean()
     td.quantile(.1)

- ``plot(kind='kde')`` now accepts the optional parameters ``bw_method`` and
  ``ind``, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set
  the bandwidth, and to gkde.evaluate() to specify the indices at which it is
  evaluated, respectively. See scipy docs. (:issue:`4298`)
- DataFrame constructor now accepts a numpy masked record array
  (:issue:`3478`)
- The new vectorized string method ``extract`` returns regular expression
  matches more conveniently.

  .. ipython:: python

     Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)')

  Elements that do not match return ``NaN``. Extracting a regular expression
  with more than one group returns a DataFrame with one column per group.

  .. ipython:: python

     Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)')

  Elements that do not match return a row of ``NaN``. Thus, a Series of messy
  strings can be *converted* into a like-indexed Series or DataFrame of
  cleaned-up or more useful strings, without necessitating ``get()`` to
  access tuples or ``re.match`` objects.

  Named groups like

  .. ipython:: python

     Series(['a1', 'b2', 'c3']).str.extract(
         '(?P<letter>[ab])(?P<digit>\d)')

  and optional groups can also be used.

  .. ipython:: python

     Series(['a1', 'b2', '3']).str.extract(
         '(?P<letter>[ab])?(?P<digit>\d)')

- ``read_stata`` now accepts Stata 13 format (:issue:`4291`)
- ``read_fwf`` now infers the column specifications from the first 100 rows
  of the file if the data has correctly separated and properly aligned
  columns using the delimiter provided to the function (:issue:`4488`); a
  sketch follows below.
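  A minimal sketch of the inference (the fixed-width text below is
  illustrative, and no explicit column specifications are passed):

  .. code-block:: python

     from pandas.compat import StringIO

     text = ("  col1 col2\n"
             "   1.0    a\n"
             "  12.5    b\n"
             " 123.0    c\n")

     # column edges are inferred from the aligned columns
     # of the first rows
     read_fwf(StringIO(text))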
- support for nanosecond times as an offset

  .. warning::

     These operations require ``numpy >= 1.7``

  Period conversions in the range of seconds and below were reworked and
  extended up to nanoseconds. Periods in the nanosecond range are now
  available.

  .. ipython:: python

     date_range('2013-01-01', periods=5, freq='5N')

  or with frequency as offset

  .. ipython:: python

     date_range('2013-01-01', periods=5, freq=pd.offsets.Nano(5))

  Timestamps can be modified in the nanosecond range

  .. ipython:: python

     t = Timestamp('20130101 09:01:02')
     t + pd.datetools.Nano(123)

- A new method, ``isin`` for DataFrames, which plays nicely with boolean
  indexing. The argument to ``isin``, what we're comparing the DataFrame to,
  can be a DataFrame, Series, dict, or array of values. See :ref:`the docs`
  for more.

  To get the rows where any of the conditions are met:

  .. ipython:: python

     dfi = DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']})
     dfi
     other = DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']})
     mask = dfi.isin(other)
     mask
     dfi[mask.any(1)]

- ``Series`` now supports a ``to_frame`` method to convert it to a
  single-column DataFrame (:issue:`5164`)
- All R datasets listed here
  http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html can
  now be loaded into pandas objects

  .. code-block:: python

     import pandas.rpy.common as com
     com.load_data('Titanic')

- ``tz_localize`` can infer a fall daylight savings transition based on the
  structure of the unlocalized data (:issue:`4230`), see :ref:`the docs`
- DatetimeIndex is now in the API documentation, see :ref:`the docs`
- :meth:`~pandas.io.json.json_normalize` is a new method to allow you to
  create a flat table from semi-structured JSON data. See :ref:`the docs`
  (:issue:`1067`)
- Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
- Python csv parser now supports usecols (:issue:`4335`)
- DataFrame has a new ``interpolate`` method, similar to Series
  (:issue:`4434`, :issue:`1892`)

  .. ipython:: python

     df = DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                     'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
     df.interpolate()

  Additionally, the ``method`` argument to ``interpolate`` has been expanded
  to include ``'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
  'barycentric', 'krogh', 'piecewise_polynomial', 'pchip', 'polynomial',
  'spline'``. The new methods require scipy_. Consult the Scipy reference
  guide_ and documentation_ for more information about when the various
  methods are appropriate. See :ref:`the docs`.

  Interpolate now also accepts a ``limit`` keyword argument. This works
  similar to ``fillna``'s limit:

  .. ipython:: python

     ser = Series([1, 3, np.nan, np.nan, np.nan, 11])
     ser.interpolate(limit=2)

  .. _scipy: http://www.scipy.org
  .. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
  .. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html

- Added ``LastWeekOfMonth`` DateOffset (:issue:`4637`)
- Added ``FY5253`` and ``FY5253Quarter`` DateOffsets (:issue:`4511`)
- ``to_csv`` now takes a ``date_format`` keyword argument that specifies how
  output datetime objects should be formatted. Datetimes encountered in the
  index, columns, and values will all have this formatting applied.
  (:issue:`4313`)
- ``DataFrame.plot`` will scatter plot x versus y by passing
  ``kind='scatter'`` (:issue:`2215`); a sketch follows below.
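  A minimal sketch (``x`` and ``y`` name the columns to plot against each
  other; the data here is illustrative):

  .. code-block:: python

     df = DataFrame(randn(50, 2), columns=['B', 'C'])
     df.plot(x='B', y='C', kind='scatter')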
.. _whatsnew_0130.experimental:

Experimental
~~~~~~~~~~~~

- The new :func:`~pandas.eval` function implements expression evaluation
  using ``numexpr`` behind the scenes. This results in large speedups for
  complicated expressions involving large DataFrames/Series. For example,

  .. ipython:: python

     nrows, ncols = 20000, 100
     df1, df2, df3, df4 = [DataFrame(randn(nrows, ncols))
                           for _ in range(4)]

  .. ipython:: python

     # eval with NumExpr backend
     %timeit pd.eval('df1 + df2 + df3 + df4')

  .. ipython:: python

     # pure Python evaluation
     %timeit df1 + df2 + df3 + df4

  For more details, see :ref:`the docs`.

- Similar to ``pandas.eval``, :class:`~pandas.DataFrame` has a new
  ``DataFrame.eval`` method that evaluates an expression in the context of
  the ``DataFrame``. For example,

  .. ipython:: python
     :suppress:

     try:
         del a
     except NameError:
         pass

     try:
         del b
     except NameError:
         pass

  .. ipython:: python

     df = DataFrame(randn(10, 2), columns=['a', 'b'])
     df.eval('a + b')

- :meth:`~pandas.DataFrame.query` method has been added that allows you to
  select elements of a ``DataFrame`` using a natural query syntax nearly
  identical to Python syntax. For example,

  .. ipython:: python
     :suppress:

     try:
         del a
     except NameError:
         pass

     try:
         del b
     except NameError:
         pass

     try:
         del c
     except NameError:
         pass

  .. ipython:: python

     n = 20
     df = DataFrame(np.random.randint(n, size=(n, 3)), columns=['a', 'b', 'c'])
     df.query('a < b < c')

  selects all the rows of ``df`` where ``a < b < c`` evaluates to ``True``.
  For more details see :ref:`the docs`.
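  For comparison, the same selection written with ordinary boolean indexing
  (the two spellings are equivalent here):

  .. code-block:: python

     df[(df.a < df.b) & (df.b < df.c)]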
- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now a supported method of
  serialization of arbitrary pandas (and Python) objects in a lightweight,
  portable binary format. See :ref:`the docs`

  .. warning::

     Since this is an EXPERIMENTAL LIBRARY, the storage format may not be
     stable until a future release.

  .. ipython:: python

     df = DataFrame(np.random.rand(5, 2), columns=list('AB'))
     df.to_msgpack('foo.msg')
     pd.read_msgpack('foo.msg')

     s = Series(np.random.rand(5), index=date_range('20130101', periods=5))
     pd.to_msgpack('foo.msg', df, s)
     pd.read_msgpack('foo.msg')

  You can pass ``iterator=True`` to iterate over the unpacked results

  .. ipython:: python

     for o in pd.read_msgpack('foo.msg', iterator=True):
         print(o)

  .. ipython:: python
     :suppress:
     :okexcept:

     os.remove('foo.msg')

- ``pandas.io.gbq`` provides a simple way to extract from, and load data
  into, Google's BigQuery Data Sets by way of pandas DataFrames. BigQuery is
  a high performance SQL-like database service, useful for performing ad-hoc
  queries against extremely large datasets. See :ref:`the docs`

  .. code-block:: python

     import pandas
     from pandas.io import gbq

     # A query to select the average monthly temperatures in the
     # year 2000 across the USA. The dataset,
     # publicdata:samples.gsod, is available on all BigQuery accounts,
     # and is based on NOAA gsod data.

     query = """SELECT station_number as STATION,
     month as MONTH, AVG(mean_temp) as MEAN_TEMP
     FROM publicdata:samples.gsod
     WHERE YEAR = 2000
     GROUP BY STATION, MONTH
     ORDER BY STATION, MONTH ASC"""

     # Fetch the result set for this query

     # Your Google BigQuery Project ID
     # To find this, see your dashboard:
     # https://code.google.com/apis/console/b/0/?noredirect
     projectid = "xxxxxxxxx"

     df = gbq.read_gbq(query, project_id=projectid)

     # Use pandas to process and reshape the dataset
     df2 = df.pivot(index='STATION', columns='MONTH', values='MEAN_TEMP')
     df3 = pandas.concat([df2.min(), df2.mean(), df2.max()],
                         axis=1, keys=["Min Temp", "Mean Temp", "Max Temp"])

  The resulting DataFrame is::

     > df3
             Min Temp   Mean Temp    Max Temp
      MONTH
      1    -53.336667   39.827892   89.770968
      2    -49.837500   43.685219   93.437932
      3    -77.926087   48.708355   96.099998
      4    -82.892858   55.070087   97.317240
      5    -92.378261   61.428117  102.042856
      6    -77.703334   65.858888  102.900000
      7    -87.821428   68.169663  106.510714
      8    -89.431999   68.614215  105.500000
      9    -86.611112   63.436935  107.142856
      10   -78.209677   56.880838   92.103333
      11   -50.125000   48.861228   94.996428
      12   -50.332258   42.286879   94.396774

  .. warning::

     To use this module, you will need a BigQuery account. See Google's
     BigQuery documentation for details.

     As of 10/10/13, there is a bug in Google's API preventing result sets
     from being larger than 100,000 rows. A patch is scheduled for the week
     of 10/14/13.

.. _whatsnew_0130.refactoring:

Internal Refactoring
~~~~~~~~~~~~~~~~~~~~

In 0.13.0 there is a major refactor primarily to subclass ``Series`` from
``NDFrame``, which is the base class currently for ``DataFrame`` and
``Panel``, to unify methods and behaviors. Series formerly subclassed
directly from ``ndarray``. (:issue:`4080`, :issue:`3862`, :issue:`816`)

.. warning::

   There are two potential incompatibilities from < 0.13.0

   - Using certain numpy functions would previously return a ``Series`` if
     passed a ``Series`` as an argument. This seems only to affect
     ``np.ones_like``, ``np.empty_like``, ``np.diff`` and ``np.where``. These
     now return ``ndarrays``.

     .. ipython:: python

        s = Series([1, 2, 3, 4])

     Numpy Usage

     .. ipython:: python

        np.ones_like(s)
        np.diff(s)
        np.where(s > 1, s, np.nan)

     Pandonic Usage

     .. ipython:: python

        Series(1, index=s.index)
        s.diff()
        s.where(s > 1)

   - Passing a ``Series`` directly to a cython function expecting an
     ``ndarray`` type will no longer work directly, you must pass
     ``Series.values``, See :ref:`Enhancing Performance`
   - ``Series(0.5)`` would previously return the scalar ``0.5``, instead this
     will return a 1-element ``Series``

- Refactor of series.py/frame.py/panel.py to move common code to generic.py

  - added ``_setup_axes`` to create generic NDFrame structures
  - moved methods

    - ``from_axes, _wrap_array, axes, ix, loc, iloc, shape, empty, swapaxes,
      transpose, pop``
    - ``__iter__, keys, __contains__, __len__, __neg__, __invert__``
    - ``convert_objects, as_blocks, as_matrix, values``
    - ``__getstate__, __setstate__`` (compat remains in frame/panel)
    - ``__getattr__, __setattr__``
    - ``_indexed_same, reindex_like, align, where, mask``
    - ``fillna, replace`` (``Series`` replace is now consistent with
      ``DataFrame``)
    - ``filter`` (also added axis argument to selectively filter on a
      different axis)
    - ``reindex, reindex_axis, take``
    - ``truncate`` (moved to become part of ``NDFrame``)

- These are API changes which make ``Panel`` more consistent with
  ``DataFrame``

  - ``swapaxes`` on a ``Panel`` with the same axes specified now returns a
    copy
  - support attribute access for setting
  - filter supports the same API as the original ``DataFrame`` filter
  - Reindex called with no arguments will now return a copy of the input
    object

- Series now inherits from ``NDFrame`` rather than directly from
  ``ndarray``. There are several minor changes that affect the API.

  - numpy functions that do not support the array interface will now return
    ``ndarrays`` rather than series, e.g. ``np.diff`` and ``np.ones_like``
  - ``Series(0.5)`` would previously return the scalar ``0.5``, this is no
    longer supported
  - ``TimeSeries`` is now an alias for ``Series``. The property
    ``is_time_series`` can be used to distinguish (if desired)

- Refactor of Sparse objects to use BlockManager

  - Created a new block type in internals, ``SparseBlock``, which can hold
    multi-dtypes and is non-consolidatable. ``SparseSeries`` and
    ``SparseDataFrame`` now inherit more methods from their hierarchy
    (Series/DataFrame), and no longer inherit from ``SparseArray`` (which
    instead is the object of the ``SparseBlock``)
  - Sparse suite now supports integration with non-sparse data. Non-float
    sparse data is supportable (partially implemented)
  - Operations on sparse structures within DataFrames should preserve
    sparseness, merging type operations will convert to dense (and back to
    sparse), so might be somewhat inefficient
  - enable setitem on ``SparseSeries`` for boolean/integer/slices
  - ``SparsePanels`` implementation is unchanged (e.g. not using
    BlockManager, needs work)

- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but
  indicates if the underlying is sparse/dense (as well as the dtype); a
  sketch follows below.
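  A minimal sketch of the distinction (the reported strings take the form
  ``dtype:dense`` or ``dtype:sparse``):

  .. code-block:: python

     df = DataFrame({'A': [1., 2., 3.]})

     df.dtypes   # A    float64
     df.ftypes   # A    float64:dense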
- All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used
  to indicate various values to propagate to a new object from an existing
  one (e.g. ``name`` in ``Series`` will follow more automatically now)
- Internal type checking is now done via a suite of generated classes,
  allowing ``isinstance(value, klass)`` without having to directly import the
  klass, courtesy of @jtratner
- Bug in Series update where the parent frame is not updating its cache based
  on changes (:issue:`4080`) or types (:issue:`3217`), fillna
  (:issue:`3386`)
- Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)
- Refactor ``Series.reindex`` to core/generic.py (:issue:`4604`,
  :issue:`4618`), allow ``method=`` in reindexing on a Series to work
- ``Series.copy`` no longer accepts the ``order`` parameter and is now
  consistent with ``NDFrame`` copy
- Refactor ``rename`` methods to core/generic.py; fixes ``Series.rename`` for
  (:issue:`4605`), and adds ``rename`` with the same signature for ``Panel``
- Refactor ``clip`` methods to core/generic.py (:issue:`4798`)
- Refactor of ``_get_numeric_data/_get_bool_data`` to core/generic.py,
  allowing Series/Panel functionality
- ``Series`` (for index) / ``Panel`` (for items) now allow attribute access
  to its elements (:issue:`1903`)

  .. ipython:: python

     s = Series([1, 2, 3], index=list('abc'))
     s.b
     s.a = 5
     s

Bug Fixes
~~~~~~~~~

See :ref:`V0.13.0 Bug Fixes` for an extensive list of bugs that have been
fixed in 0.13.0.

See the :ref:`full release notes` or issue tracker on GitHub for a complete
list of all API changes, Enhancements and Bug Fixes.