Pandas

Uploaded by

nandiniasadi01

What Is Pandas?

• pandas is a Python package for manipulating tabular data, that is, data
arranged in rows and columns. In pandas, such a table is called a DataFrame.
• pandas functionality ranges from data transformations, such as sorting rows
and taking subsets, to calculating summary statistics such as the mean,
reshaping DataFrames, and joining DataFrames together.
• pandas works well with other popular Python data science packages, often
called the PyData ecosystem, including:
  NumPy for numerical computing
  Matplotlib, Seaborn, Plotly, and other data visualization packages
  scikit-learn for machine learning
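
The operations listed above (sorting rows, taking subsets, calculating a summary statistic, and joining) can be sketched in a few lines. The sales and regions tables and their column names are made up for illustration and do not come from the original material.

```python
import pandas as pd

# Hypothetical example data, invented for this sketch.
sales = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "units": [30, 10, 50, 20],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "region": ["north", "south", "north", "south"],
})

# Sort rows by the values in one column (ascending by default).
sorted_sales = sales.sort_values("units")

# Take a subset of rows using a boolean condition.
big_sales = sales[sales["units"] > 15]

# Calculate a summary statistic for a column.
mean_units = sales["units"].mean()

# Join two DataFrames on the column they share.
joined = sales.merge(regions, on="store")

print(sorted_sales)
print(big_sales)
print(mean_units)
print(joined)
```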
What is Pandas used for?

pandas is used throughout the data analysis workflow. With pandas, you can:
• Import datasets from databases, spreadsheets, comma-separated values (CSV)
files, and more.
• Clean datasets, for example, by dealing with missing values.
• Tidy datasets by reshaping their structure into a suitable format for analysis.
• Aggregate data by calculating summary statistics such as the mean of
columns, correlation between them, and more.
• Visualize datasets and uncover insights.
• Analyze time series and text data, for which pandas also contains dedicated
functionality.
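
As a minimal sketch of the import, clean, and aggregate steps above: the CSV content, the column names (city, temp_c), and the choice to fill missing values with the column mean are all assumptions made for this example. A real workflow would usually pass a file path to read_csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
# The second Oslo row has a missing temperature.
csv_text = """city,temp_c
Oslo,4.0
Oslo,
Rome,18.0
Rome,22.0
"""

# Import: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Clean: fill the missing value with the mean of the column
# (one possible strategy; dropping the row is another).
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# Aggregate: mean temperature per city.
per_city = df.groupby("city")["temp_c"].mean()

print(df)
print(per_city)
```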
Benefits Of The Pandas Package
pandas is a powerful data manipulation tool with several benefits, including:
• Made for Python: Python is one of the most widely used languages for machine learning and data
science.
• Concise code: common operations take fewer lines in pandas than the equivalent hand-written
loops would.
• Intuitive view of data: pandas presents data as labeled tables, which makes it easier to
understand and analyze.
• Extensive feature set: It supports a wide range of operations, including exploratory data
analysis, dealing with missing values, calculating statistics, visualizing univariate and
bivariate data, and much more.
• Works with large data: pandas works efficiently with datasets on the order of millions of
records and hundreds of columns, as long as the data fits in the memory of the machine.
How To Install Pandas?

• Use the pip install command in your terminal:
pip install pandas
Importing pandas
• To begin working with pandas, import the pandas Python package as shown
below. The most common alias for pandas is pd.
import pandas as pd
Introduction to Pandas Data Structures
• To get started with pandas, you will need to get comfortable with its
two workhorse data structures: Series and DataFrame.
Series
• A Series is a one-dimensional array-like object containing an array of data (of any
NumPy data type) and an associated array of data labels, called its index. The
simplest Series is formed from only an array of data, in which case pandas assigns
a default index of the integers 0 through N - 1.
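
A short sketch of both cases, a Series built from data alone and one built with explicit labels. The values and labels are arbitrary and chosen only for illustration.

```python
import pandas as pd

# A Series formed from only an array of data gets a default integer index.
obj = pd.Series([4, 7, -5, 3])
print(obj.index)   # default integer index
print(obj.values)  # the underlying array of data

# Supplying an index attaches a label to each data point.
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj2)
```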
DataFrame

• A DataFrame represents a tabular, spreadsheet-like data structure containing an
ordered collection of columns, each of which can be a different value type
(numeric, string, Boolean, etc.).
• The DataFrame has both a row and column index; it can be thought of as a dict of
Series that all share the same index.
• Compared with other DataFrame-like structures you may have used before
(like R’s data.frame), row-oriented and column-oriented operations in DataFrame
are treated roughly symmetrically.
• Under the hood, the data is stored as one or more two-dimensional blocks rather
than a list, dict, or some other collection of one-dimensional arrays.
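
One common way to build a DataFrame is from a dict of equal-length lists, which mirrors the "dict of Series" view described above. The column names, values, and row labels below are hypothetical.

```python
import pandas as pd

# Each dict key becomes a column; each column can hold a different type.
data = {
    "name": ["Ann", "Bob", "Cid"],
    "age": [31, 25, 47],
    "member": [True, False, True],
}
frame = pd.DataFrame(data, index=["r1", "r2", "r3"])

print(frame.columns)  # the column index
print(frame.index)    # the row index
print(frame.dtypes)   # one dtype per column

# Selecting a single column returns a Series that shares the row index.
ages = frame["age"]
print(ages)
```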
Index
• pandas’s Index objects are responsible for holding the axis labels and other
metadata (like the axis name or names). Any array or other sequence of labels used
when constructing a Series or DataFrame is internally converted to an Index.
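
The conversion described above can be observed directly. In this sketch the labels and the axis name "letter" are made up for illustration.

```python
import pandas as pd

labels = ["a", "b", "c"]  # a plain Python list of labels
s = pd.Series([1, 2, 3], index=labels)

# The list has been converted to an Index object internally.
print(type(s.index))
print(s.index)

# The axis name is part of the metadata an Index holds.
s.index.name = "letter"
print(s.index.name)
```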
Reindexing
• A critical method on pandas objects is reindex, which means to create
a new object with the data conformed to a new index.
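
A minimal sketch of reindex on a Series with arbitrary example values. Labels in the new index that were absent from the original get missing values (NaN), and the original object is left unchanged.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# reindex returns a new object whose data is rearranged to match the new index.
# "e" does not exist in obj, so its value in the result is NaN.
obj2 = obj.reindex(["a", "b", "c", "d", "e"])

print(obj2)
```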
Indexing, Selection, And Filtering

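• Series indexing (obj[...]) works analogously to NumPy array indexing, except you
can use the Series's index values instead of only integers.
• DataFrame provides the special indexing fields loc (label-based) and iloc
(integer position-based). They enable you to select a subset of the rows and
columns from a DataFrame with NumPy-like notation plus axis labels. Older
material describes a field named ix for this purpose; ix was deprecated and then
removed in pandas 1.0.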
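
The following sketch shows label-based selection on a Series and on a DataFrame. Because ix is no longer available in current pandas releases, the DataFrame part uses loc and iloc instead. All data and labels are invented for the example.

```python
import pandas as pd

obj = pd.Series([0.0, 1.0, 2.0, 3.0], index=["a", "b", "c", "d"])

by_label = obj["b"]        # select one value by its index label
several = obj[["b", "d"]]  # select several values by label
filtered = obj[obj < 2]    # filter with a boolean condition

frame = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6]},
    index=["r1", "r2", "r3"],
)

# loc selects rows and columns by label.
by_loc = frame.loc[["r1", "r3"], "y"]

# iloc selects rows and columns by integer position.
by_iloc = frame.iloc[0:2, 0]

print(by_label, several, filtered, by_loc, by_iloc, sep="\n")
```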
Arithmetic and data alignment
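• Data alignment: arithmetic between differently indexed objects aligns them on
their labels, and any label found in only one object produces a missing value (NaN)
in the result.
• Arithmetic methods with fill values: in such operations you might want to fill
with a special value, like 0, when an axis label is found in one object but not
the other.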
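
A small sketch contrasting the + operator with the add method and its fill_value argument. The two Series and their labels are arbitrary examples.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0], index=["b", "d"])

# Plain addition aligns on labels; labels present in only one
# object ("a", "c", "d") give NaN in the result.
total = s1 + s2

# The add method with fill_value=0 treats a label missing from
# one object as 0 before adding.
filled = s1.add(s2, fill_value=0)

print(total)
print(filled)
```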
