The document provides an overview of key Python libraries for data analysis, categorizing them into scientific computing libraries and visualization tools. It highlights Pandas for data manipulation, NumPy for array processing, and Matplotlib and Seaborn for data visualization. Additionally, it mentions Scikit-learn and Statsmodels for machine learning and statistical modeling.
03 Python Packages for Data Science.en
In order to do data analysis in Python, we should first tell you a little bit about the main packages relevant to analysis in Python. A Python library is a collection of functions and methods that allow you to perform lots of actions without writing the code yourself. The libraries usually contain built-in modules providing different functionalities which you can use directly. And there are extensive libraries offering a broad range of facilities. We have divided the Python data analysis libraries into three groups.

The first group is called scientific computing libraries. Pandas offers data structures and tools for effective data manipulation and analysis. It provides fast access to structured data. The primary instrument of Pandas is a two-dimensional table with column and row labels, which is called a data frame. It is designed to provide easy indexing functionality. The NumPy library uses arrays for its inputs and outputs. It can be extended to objects for matrices, and with minor coding changes, developers can perform fast array processing. SciPy includes functions for some advanced math problems, as listed on this slide, as well as data visualization.

Using data visualization methods is the best way to communicate with others, showing them meaningful results of analysis. These libraries enable you to create graphs, charts and maps. The Matplotlib package is the most well-known library for data visualization. It is great for making graphs and plots. The graphs are also highly customizable. Another high-level visualization library is Seaborn. It is based on Matplotlib. It's very easy to generate various plots such as heat maps, time series and violin plots.

With machine learning algorithms, we're able to develop a model using our data set and obtain predictions. The algorithmic libraries tackle machine learning tasks from basic to complex. Here we introduce two packages. The Scikit-learn library contains tools for statistical modeling, including regression, classification, clustering, and so on. This library is built on NumPy, SciPy and Matplotlib.
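As a rough illustration of developing a model from a data set and obtaining predictions, here is a minimal sketch using Scikit-learn's LinearRegression. The toy data (points on the line y = 2x + 1) and all variable names are assumptions made up for this example, not something shown in the lecture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data set: four points lying exactly on y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # one feature per row
y = 2.0 * X.ravel() + 1.0

# Develop (fit) a regression model from the data set.
model = LinearRegression()
model.fit(X, y)

# Obtain a prediction for an input the model has not seen.
prediction = model.predict(np.array([[4.0]]))

slope = float(model.coef_[0])
intercept = float(model.intercept_)
print(slope, intercept, float(prediction[0]))
```

Because the toy data is perfectly linear, the fitted slope and intercept should come out very close to 2 and 1. Real data sets are noisier, and the estimates would only approximate the underlying relationship.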
Statsmodels is also a Python module that allows users to explore data, estimate statistical models and perform statistical tests.
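To make the data frame and array ideas from the scientific computing group more concrete, here is a small sketch with Pandas and NumPy. The table contents (fruit names, prices, quantities) are hypothetical values chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A data frame: a two-dimensional table with column labels and row labels.
# The values below are made up for this example.
df = pd.DataFrame(
    {"price": [10.0, 12.5, 9.0], "quantity": [3, 1, 4]},
    index=["apple", "banana", "cherry"],
)

# Label-based indexing: look up one cell by row label and column label.
apple_price = df.loc["apple", "price"]

# NumPy arrays as inputs and outputs: multiply two columns element by
# element without writing an explicit loop.
quantities = df["quantity"].to_numpy()
revenue = df["price"].to_numpy() * quantities
total = float(revenue.sum())

print(apple_price, revenue, total)
```

The elementwise multiplication is the kind of fast array processing the NumPy description refers to: the operation runs over whole arrays at once rather than one value at a time.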