============================================= pandas: powerful Python data analysis toolkit ============================================= .. image:: https://travis-ci.org/pydata/pandas.png :target: https://travis-ci.org/pydata/pandas What is it ========== **pandas** is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real world** data analysis in Python. Additionally, it has the broader goal of becoming **the most powerful and flexible open source data analysis / manipulation tool available in any language**. It is already well on its way toward this goal. Main Features ============= Here are just a few of the things that pandas does well: - Easy handling of **missing data** (represented as NaN) in floating point as well as non-floating point data - Size mutability: columns can be **inserted and deleted** from DataFrame and higher dimensional objects - Automatic and explicit **data alignment**: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let `Series`, `DataFrame`, etc. automatically align the data for you in computations - Powerful, flexible **group by** functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data - Make it **easy to convert** ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting** of large data sets - Intuitive **merging** and **joining** data sets - Flexible **reshaping** and pivoting of data sets - **Hierarchical** labeling of axes (possible to have multiple labels per tick) - Robust IO tools for loading data from **flat files** (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast **HDF5 format** - **Time series**-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc. Where to get it =============== The source code is currently hosted on GitHub at: http://github.com/pydata/pandas Binary installers for the latest released version are available at the Python package index:: http://pypi.python.org/pypi/pandas/ And via ``easy_install`` or ``pip``:: easy_install pandas pip install pandas Dependencies ============ * `NumPy `__: 1.6.1 or higher * `python-dateutil `__ 1.5 or higher * `pytz `__ * Needed for time zone support with ``date_range`` Optional dependencies ~~~~~~~~~~~~~~~~~~~~~ * `Cython `__: Only necessary to build development version. Version 0.17.1 or higher. * `SciPy `__: miscellaneous statistical functions * `PyTables `__: necessary for HDF5-based storage * `matplotlib `__: for plotting * `statsmodels `__ * Needed for parts of :mod:`pandas.stats` * `openpyxl `__, `xlrd/xlwt `__ * openpyxl version 1.6.1 or higher * Needed for Excel I/O Installation from sources ========================= In the ``pandas`` directory (same one where you found this file), execute:: python setup.py install On Windows, you will need to install MinGW and execute:: python setup.py build --compiler=mingw32 python setup.py install See http://pandas.pydata.org/ for more information. License ======= BSD Documentation ============= The official documentation is hosted on PyData.org: http://pandas.pydata.org/ The Sphinx documentation should provide a good starting point for learning how to use the library. Expect the docs to continue to expand as time goes on. Background ========== Work on ``pandas`` started at AQR (a quantitative hedge fund) in 2008 and has been under active development since then. Discussion and Development ========================== Since ``pandas`` development is related to a number of other scientific Python projects, questions are welcome on the scipy-user mailing list. Specialized discussions or design issues should take place on the pystatsmodels mailing list / Google group, where ``scikits.statsmodels`` and other libraries will also be discussed: http://groups.google.com/group/pystatsmodels .. _NumPy: http://numpy.scipy.org/