1 Pandas Basics
Pandas
Pandas is a library built on top of NumPy and designed specifically for data analysis. You'll be using Pandas heavily
for data manipulation, visualisation, building machine learning models, etc.
There are two main data structures in Pandas: Series and dataframes. Data is most commonly stored in
dataframes, so manipulating dataframes quickly is probably the most important skill
set for data analysis.
source: https://pandas.pydata.org/pandas-docs/stable/overview.html
Series are one-dimensional array-like structures. Unlike NumPy arrays, they often contain
non-numeric data (characters, dates, times, booleans, etc.).
You can create a pandas Series from array-like objects, such as lists or NumPy arrays, using pd.Series().
http://localhost:8888/notebooks/Desktop/Biplab/My%20Python/Pandas/Introduction_to_Pandas/1_Pandas_Basics.ipynb 1/13
1/4/2019 1_Pandas_Basics
0 2
1 4
2 5
3 6
4 9
dtype: int64
<class 'pandas.core.series.Series'>
Note that each element in the Series has an index, and the index starts at 0 as usual.
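The code cell that produced the output above did not survive the export. The sketch below produces the same Series, assuming the conventional "import pandas as pd" alias. The values come from the printed output, and the name s matches the indexing cells that follow. It also shows that a NumPy array works as input.

```python
import numpy as np
import pandas as pd

# A Series built from a Python list gets a default integer index 0..n-1
s = pd.Series([2, 4, 5, 6, 9])
print(s)
print(type(s))  # <class 'pandas.core.series.Series'>

# Any array-like object works, e.g. a NumPy array
s_from_array = pd.Series(np.array([2, 4, 5, 6, 9]))
print(s_from_array.tolist() == s.tolist())  # True: same values
```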
Out[2]: 0 a
1 b
2 af
dtype: object
Out[3]: pandas.core.indexes.datetimes.DatetimeIndex
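The cells behind Out[2] and Out[3] are also missing. The following is a hedged reconstruction of the idea rather than the original code. It builds a Series of strings and a date range with pd.date_range(). The start date and number of periods are arbitrary assumptions.

```python
import pandas as pd

# Non-numeric data: a Series of strings
chars = pd.Series(['a', 'b', 'af'])
print(chars)
print(chars.dtype)  # object in pandas 1.x/2.x; newer versions may use a dedicated string dtype

# Dates: pd.date_range() returns a DatetimeIndex (start date chosen arbitrarily)
dates = pd.date_range(start='2019-01-01', periods=3, freq='D')
print(type(dates))  # <class 'pandas.core.indexes.datetimes.DatetimeIndex'>

# A DatetimeIndex can be wrapped in a Series like any other array-like
date_series = pd.Series(dates)
print(date_series)
```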
Indexing Series
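In [4]: # Indexing a pandas Series works the same way as indexing 1-d NumPy arrays or lists
# accessing the fourth element (s[3] returns 6)
s[3]
# slicing from the third element onwards; only this last expression is displayed
s[2:]
Out[4]: 2 5
3 6
4 9
dtype: int64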
Out[5]: 1 4
3 6
dtype: int64
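The following is a self-contained sketch of the common ways to index a Series. It assumes the same Series as above. The Out[5] output is consistent with selecting labels 1 and 3, for example with s[[1, 3]], although the original cell is not shown.

```python
import pandas as pd

s = pd.Series([2, 4, 5, 6, 9])

# Single element: with the default integer index, label 3 is the fourth element
fourth = s[3]
print(fourth)  # 6

# Slicing returns a Series that keeps the original index labels
rest = s[2:]
print(rest.index.tolist())  # [2, 3, 4]

# A list of labels selects several elements at once
picked = s[[1, 3]]
print(picked.tolist())  # [4, 6]

# .iloc makes positional access explicit, which matters once the index is not 0..n-1
print(s.iloc[3])  # 6
```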
Usually, you will work with Series only as part of dataframes. Let's study the basics of dataframes.
There are various ways of creating dataframes, such as from dictionaries or JSON
objects, or by reading text or CSV files.
0 22 Vinay engineer
1 25 Kushal doctor
3 28 Saif teacher
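The dataframe printed above appears to have been created from a dictionary, but its code and column header were lost. The following is a minimal sketch of creating a dataframe from a dictionary of columns. The column names age, name and occupation are hypothetical.

```python
import pandas as pd

# Each key becomes a column; each list holds that column's values (column names are hypothetical)
people = {
    'age': [22, 25, 28],
    'name': ['Vinay', 'Kushal', 'Saif'],
    'occupation': ['engineer', 'doctor', 'teacher'],
}
df = pd.DataFrame(people)
print(df)
print(df.shape)  # (3, 3): 3 rows, 3 columns
```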
For the upcoming exercises, we will use a dataset from a retail store containing details about the orders
placed, customers, products, sales, profits, etc.
In [1]: # reading a CSV file as a dataframe
# pandas must be imported first; otherwise pd is undefined and Python raises a NameError
import pandas as pd
market_df = pd.read_csv("Downloads/FoodRatings_all_same_5.csv.csv")
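The retail CSV file is not included here. The sketch below uses a small, made-up CSV held in memory with io.StringIO so that pd.read_csv() runs on its own. The column names follow the retail dataset, but all values are hypothetical. In practice you would pass a file path instead of the buffer.

```python
import io
import pandas as pd

# Hypothetical CSV content; pd.read_csv accepts a path or any file-like object
csv_text = """Ord_id,Prod_id,Sales,Profit
Ord_1,Prod_1,100.5,12.0
Ord_2,Prod_2,250.0,-5.25
Ord_3,Prod_1,75.25,3.5
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # numeric columns are parsed as float64 automatically
```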
Usually, dataframes are imported from CSV files, but sometimes it is more convenient to convert
dictionaries into dataframes. For example, when the raw data is in JSON format (which is not
uncommon), you can easily convert it into a dictionary, and then into a dataframe.
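The JSON route can be sketched with the standard json module. This assumes the raw JSON is a list of records, one object per order. The records themselves are hypothetical.

```python
import json
import pandas as pd

# Hypothetical raw JSON: a list of records, one per order
raw = '[{"Ord_id": "Ord_1", "Sales": 100.5}, {"Ord_id": "Ord_2", "Sales": 250.0}]'

records = json.loads(raw)   # JSON text -> list of Python dicts
df = pd.DataFrame(records)  # each dict becomes one row, keys become columns
print(df)

# Equivalent dataframe built from a dictionary of columns
df2 = pd.DataFrame({'Ord_id': ['Ord_1', 'Ord_2'], 'Sales': [100.5, 250.0]})
print(df.equals(df2))  # True
```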
After you import a dataframe, you'll want to quickly understand its structure, its shape, the meaning of its
rows and columns, etc. You may also want to look at summary statistics, such as the mean and
percentiles.
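The usual first-look methods can be tried on a small stand-in for market_df. The columns mirror a few of the retail dataset's columns, but the values are hypothetical. One Product_Base_Margin value is left missing so that info() reports fewer non-null entries, as it does for the real data.

```python
import pandas as pd

# Hypothetical stand-in for market_df with a few of its columns
df = pd.DataFrame({
    'Ord_id': ['Ord_1', 'Ord_2', 'Ord_3', 'Ord_4'],
    'Sales': [100.5, 250.0, 75.25, 20.0],
    'Order_Quantity': [2, 5, 1, 1],
    'Product_Base_Margin': [0.35, None, 0.5, 0.4],  # None becomes NaN
})

print(df.shape)    # (rows, columns)
print(df.head(2))  # first 2 rows (default is 5)
print(df.tail(2))  # last 2 rows (default is 5)
df.info()          # dtypes and non-null counts per column
print(len(df))     # number of rows = last default index label + 1
```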
Out[8]: Ord_id Prod_id Ship_id Cust_id Sales Discount Order_Quantity Profit Shipping_
In [9]: market_df.tail()
Out[9]: Ord_id Prod_id Ship_id Cust_id Sales Discount Order_Quantity Profit Ship
Here, each row represents an order placed at a retail store. Notice the index associated with each
row: it starts at 0 and ends at 8398, implying that 8399 orders were placed.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8399 entries, 0 to 8398
Data columns (total 10 columns):
Ord_id 8399 non-null object
Prod_id 8399 non-null object
Ship_id 8399 non-null object
Cust_id 8399 non-null object
Sales 8399 non-null float64
Discount 8399 non-null float64
Order_Quantity 8399 non-null int64
Profit 8399 non-null float64
Shipping_Cost 8399 non-null float64
Product_Base_Margin 8336 non-null float64
dtypes: float64(5), int64(1), object(4)
memory usage: 656.2+ KB
In [11]: # Describe gives you a summary of all the numeric columns in the dataset
market_df.describe()
In [14]: # You can extract the values of a dataframe as a numpy array using df.values
market_df.values
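Here is a small, self-contained illustration of describe() and of extracting values as a NumPy array. The data is hypothetical. df.to_numpy() is the newer, equivalent alternative to df.values.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    'Ord_id': ['Ord_1', 'Ord_2', 'Ord_3', 'Ord_4'],
    'Sales': [100.0, 250.0, 75.0, 25.0],
    'Order_Quantity': [2, 5, 1, 1],
})

# By default describe() summarises only the numeric columns
summary = df.describe()
print(summary)
print(summary.loc['mean', 'Sales'])  # 112.5

# .values (or .to_numpy()) returns the data as a NumPy array
arr = df.to_numpy()
print(type(arr), arr.shape)  # <class 'numpy.ndarray'> (4, 3)
```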
Indices
An important concept in pandas dataframes is that of row indices. By default, each row is assigned an
integer index starting from 0, and these indices are displayed on the left side of the dataframe.
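You can inspect the default index directly. This sketch uses a hypothetical three-row dataframe. When no index is given, pandas uses a RangeIndex.

```python
import pandas as pd

df = pd.DataFrame({'Sales': [100.0, 250.0, 75.0]})  # hypothetical data

print(df.index)           # RangeIndex(start=0, stop=3, step=1)
print(df.index.tolist())  # [0, 1, 2]
```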
In [15]: market_df.head()
Out[15]: Ord_id Prod_id Ship_id Cust_id Sales Discount Order_Quantity Profit Shipping_
Arbitrary numeric indices are difficult to read and work with, so you may want to change the
index of the df to something more meaningful.
Let's change the index to Ord_id (unique id of each order), so that you can select rows using the
order ids directly.
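The cell that does this is not shown, but set_index() is the standard pandas method for it. The following is a minimal sketch on hypothetical order data. Note that set_index() returns a new dataframe unless inplace=True is passed.

```python
import pandas as pd

# Hypothetical order ids and sales, standing in for market_df
df = pd.DataFrame({
    'Ord_id': ['Ord_1', 'Ord_2', 'Ord_3'],
    'Sales': [136.81, 42.27, 4701.69],
})

# Use the Ord_id column as the row index; df itself is left unchanged
df_by_order = df.set_index('Ord_id')
print(df_by_order)

# Rows can now be selected by order id
print(df_by_order.loc['Ord_2', 'Sales'])  # 42.27
```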
Having meaningful row labels as indices helps you to select (subset) dataframes easily. You will
study selecting dataframes in the next section.
Sorting dataframes
You can sort dataframes in two ways - 1) by the indices and 2) by the values.
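The two kinds of sorting map to sort_index() and sort_values(). This sketch uses hypothetical orders indexed by order id. Both methods return a new, sorted dataframe.

```python
import pandas as pd

# Hypothetical orders indexed by order id
df = pd.DataFrame(
    {'Sales': [250.0, 100.0, 75.0], 'Profit': [-5.0, 12.0, 3.5]},
    index=['Ord_3', 'Ord_1', 'Ord_2'],
)

# 1) Sort by the row labels (the index)
by_index = df.sort_index()
print(by_index.index.tolist())  # ['Ord_1', 'Ord_2', 'Ord_3']

# 2) Sort by the values in a column, largest first
by_sales = df.sort_values(by='Sales', ascending=False)
print(by_sales.index.tolist())  # ['Ord_3', 'Ord_1', 'Ord_2']

# Sort by several columns: Profit first, Sales breaks ties
by_two = df.sort_values(by=['Profit', 'Sales'])
print(by_two.index.tolist())  # ['Ord_3', 'Ord_2', 'Ord_1']
```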