
Pandas - Jupyter Notebook

Pandas is a Python library built on NumPy for data manipulation and analysis. It has two main data structures: Series for one-dimensional data and DataFrame for two-dimensional tabular data. The DataFrame is the most widely used of the two for data analysis and manipulation; it stores tabular data efficiently and provides aggregation functions such as sum, mean, and count.

Uploaded by Maximus Aranha
Copyright © All Rights Reserved

Python Libraries - Pandas - Pandas Basics

Pandas is a library built on NumPy specifically for data analysis. You will be using Pandas heavily
for data manipulation, visualization, building machine learning models, etc.

There are two main data structures in pandas:

• Series

• DataFrames

DataFrames are the default way to store data, so manipulating DataFrames quickly is probably the most important skill set for data analysis.
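The two structures are closely related: each column of a DataFrame is itself a Series that shares the DataFrame's row index. A minimal sketch, assuming a small made-up table (the column names `a` and `b` are illustrative only):

```python
import pandas as pd

# A DataFrame built from two made-up columns (illustrative data only)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

col = df["a"]                       # selecting one column gives back a Series
print(type(col))                    # <class 'pandas.core.series.Series'>
print(col.ndim, df.ndim)            # 1 2  (Series is 1-D, DataFrame is 2-D)
print(col.index.equals(df.index))   # True: the column shares the row index
```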

In [1]:

%pip install pandas

Requirement already satisfied: pandas in c:\users\student\anaconda3\lib\site-packages (1.4.4)
Requirement already satisfied: pytz>=2020.1 in c:\users\student\anaconda3\lib\site-packages (from pandas) (2022.1)
Requirement already satisfied: numpy>=1.18.5 in c:\users\student\anaconda3\lib\site-packages (from pandas) (1.21.5)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\student\anaconda3\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: six>=1.5 in c:\users\student\anaconda3\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
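Because pandas behaviour changes between releases (the log above reports version 1.4.4), and a newly installed version is only picked up after a kernel restart, it can help to print the versions the kernel has actually imported. A minimal sketch:

```python
import numpy as np
import pandas as pd

# Print the library versions currently imported by the kernel
print("pandas", pd.__version__)
print("numpy", np.__version__)
```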

In [3]:

import pandas as pd

In [4]:

# The Pandas series
# creating a numeric pandas series
s = pd.Series([2,4,5,6,9])
print(s)
print(type(s))
0 2
1 4
2 5
3 6
4 9
dtype: int64
<class 'pandas.core.series.Series'>
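Besides the default integer index (0 to 4) shown above, a Series can carry its own index labels, and values can then be looked up by label or by position. A minimal sketch; the labels `a` to `e` are arbitrary and chosen only for illustration:

```python
import pandas as pd

# Same values as above, but with explicit (hypothetical) string labels
s = pd.Series([2, 4, 5, 6, 9], index=["a", "b", "c", "d", "e"])

print(s["c"])               # label-based lookup -> 5
print(s.iloc[0])            # position-based lookup -> 2
print(s.values.tolist())    # [2, 4, 5, 6, 9]
print(list(s.index))        # ['a', 'b', 'c', 'd', 'e']
```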

In [5]:

# creating a range of dates (pd.date_range returns a DatetimeIndex, not a Series)
data_series = pd.date_range(start = '11-09-2017', end= '12-12-2017')
data_series
# type(data_series)
Out[5]:

DatetimeIndex(['2017-11-09', '2017-11-10', '2017-11-11', '2017-11-12',
'2017-11-13', '2017-11-14', '2017-11-15', '2017-11-16',
'2017-11-17', '2017-11-18', '2017-11-19', '2017-11-20',
'2017-11-21', '2017-11-22', '2017-11-23', '2017-11-24',
'2017-11-25', '2017-11-26', '2017-11-27', '2017-11-28',
'2017-11-29', '2017-11-30', '2017-12-01', '2017-12-02',
'2017-12-03', '2017-12-04', '2017-12-05', '2017-12-06',
'2017-12-07', '2017-12-08', '2017-12-09', '2017-12-10',
'2017-12-11', '2017-12-12'],
dtype='datetime64[ns]', freq='D')
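Because `pd.date_range` returns a `DatetimeIndex`, wrapping it in `pd.Series` gives an actual datetime Series. A minimal sketch, which also writes the dates in ISO `YYYY-MM-DD` form so they cannot be misread (a string such as `'11-09-2017'` is parsed month first, as the output above shows):

```python
import pandas as pd

# ISO-format dates avoid the month/day ambiguity of '11-09-2017'
dates = pd.date_range(start="2017-11-09", end="2017-12-12", freq="D")
print(type(dates).__name__)   # DatetimeIndex
print(len(dates))             # 34 days, both endpoints included

# Wrap the index in a Series to get an actual datetime Series
date_series = pd.Series(dates)
print(date_series.dtype)      # a datetime64 dtype
print(date_series.iloc[-1])   # 2017-12-12 00:00:00
```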

The DataFrame
The DataFrame is the most widely used data structure in data analysis. It is a table with rows and columns: each row has an index label, and each column has a name and holds meaningful
data.
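The row index and the column names are separate attributes of a DataFrame, and `shape` reports the table's size. A minimal sketch with a small made-up table (the column names and values are illustrative only):

```python
import pandas as pd

# Illustrative table: two named columns, three rows
df = pd.DataFrame({"name": ["x", "y", "z"], "score": [1, 2, 3]})

print(df.index)          # RangeIndex(start=0, stop=3, step=1)
print(list(df.columns))  # ['name', 'score']
print(df.shape)          # (3, 2) -> 3 rows, 2 columns
```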

Creating DataFrames from dictionaries

EXAMPLE - 1
In [8]:

country = ['United States','Australia','India','Russia','Morrocco']
symbol = ['US','AU','IND','RUS','MOR']
dic_world = {"country":country,"symbol":symbol}

In [9]:

print(dic_world)
{'country': ['United States', 'Australia', 'India', 'Russia', 'Morrocco'], 'symbol': ['US', 'AU', 'IND', 'RUS', 'MOR']}

In [10]:

dic_world["country"]
Out[10]:

['United States', 'Australia', 'India', 'Russia', 'Morrocco']

In [11]:

dic_world["symbol"]
Out[11]:

['US', 'AU', 'IND', 'RUS', 'MOR']

In [12]:

data = pd.DataFrame(dic_world)
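When a DataFrame is built from a dictionary, each list becomes one column, so all the lists must have the same length. A minimal sketch of what happens otherwise; the shortened `symbol` list is deliberately wrong:

```python
import pandas as pd

country = ['United States', 'Australia', 'India']
symbol = ['US', 'AU']   # one entry short on purpose

raised = False
try:
    pd.DataFrame({"country": country, "symbol": symbol})
except ValueError as err:
    # pandas rejects dictionary columns of unequal length
    raised = True
    print("ValueError:", err)
```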

In [13]:

print(type(data))

<class 'pandas.core.frame.DataFrame'>

In [14]:

print(data)

         country symbol
0  United States     US
1      Australia     AU
2          India    IND
3         Russia    RUS
4       Morrocco    MOR

In [15]:

print(data["country"])

0 United States
1 Australia
2 India
3 Russia
4 Morrocco
Name: country, dtype: object

In [16]:

print(data["symbol"])
0 US
1 AU
2 IND
3 RUS
4 MOR
Name: symbol, dtype: object
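Single brackets with one column name return a Series, as above; a list of names inside the brackets returns a DataFrame, even when the list holds a single column. Columns whose names are valid Python identifiers can also be read as attributes. A minimal sketch on a shortened version of the `data` table:

```python
import pandas as pd

data = pd.DataFrame({"country": ["United States", "Australia", "India"],
                     "symbol": ["US", "AU", "IND"]})

one_col = data["country"]                 # Series
sub_frame = data[["country", "symbol"]]   # DataFrame with both columns
single_frame = data[["country"]]          # still a DataFrame, one column
print(type(one_col).__name__, type(sub_frame).__name__, type(single_frame).__name__)
# Series DataFrame DataFrame

print(data.symbol.equals(data["symbol"]))  # True: attribute access, same column
```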

EXAMPLE - 2
In [18]:

# defining the lists that will become the dictionary's values
cars_per_cap = [809,731,588,18,200,70,45]
country = ['United states','Australia','Japan','India','Russia','Morroco','Egypt']
drives_right = [False,True,True,True,False,False,False]

In [19]:

# creating a dictionary that stores each list under its column name (key: value pairs)
cars_dict = {"cars_per_cap":cars_per_cap,"country":country,"drives_right":drives_right}

In [20]:

print(cars_dict)

{'cars_per_cap': [809, 731, 588, 18, 200, 70, 45], 'country': ['United states', 'Australia', 'Japan', 'India', 'Russia', 'Morroco', 'Egypt'], 'drives_right': [False, True, True, True, False, False, False]}

In [21]:

print(cars_dict['cars_per_cap'])
[809, 731, 588, 18, 200, 70, 45]

In [22]:

cars = pd.DataFrame(cars_dict)

AGGREGATION FUNCTIONS
In [24]:

cars
Out[24]:

   cars_per_cap        country  drives_right
0           809  United states         False
1           731      Australia          True
2           588          Japan          True
3            18          India          True
4           200         Russia         False
5            70        Morroco         False
6            45          Egypt         False

In [25]:

cars.cars_per_cap

Out[25]:

0 809
1 731
2 588
3 18
4 200
5 70
6 45
Name: cars_per_cap, dtype: int64
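The same aggregation functions also work on a boolean column such as `drives_right`, where `True` counts as 1 and `False` as 0. A minimal sketch using the `drives_right` values from the lists above:

```python
import pandas as pd

# drives_right values from the cars example above
drives_right = pd.Series([False, True, True, True, False, False, False], name="drives_right")

print(drives_right.sum())    # 3: number of True values
print(drives_right.mean())   # 3/7, the share of True values
print(drives_right.count())  # 7: booleans are ordinary non-missing values
```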

In [26]:

print(cars.cars_per_cap.max())
809

In [27]:

print(cars.cars_per_cap.min())

18

In [28]:

print(cars.cars_per_cap.mean())
351.57142857142856

In [29]:

print(cars.cars_per_cap.std())
345.59555222005633
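The value above is the sample standard deviation: pandas' `std()` divides by n - 1 (`ddof=1`) by default, whereas NumPy's `np.std` divides by n (`ddof=0`). A minimal sketch with the `cars_per_cap` values shows the two conventions side by side:

```python
import numpy as np
import pandas as pd

values = [809, 731, 588, 18, 200, 70, 45]
s = pd.Series(values)

sample_std = s.std()              # pandas default: ddof=1
population_std = np.std(values)   # NumPy default: ddof=0
print(sample_std)
print(population_std)

# Passing ddof=0 to pandas reproduces the NumPy result
print(np.isclose(s.std(ddof=0), population_std))  # True
```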

In [30]:

print(cars.cars_per_cap.count())

7
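Instead of calling each aggregation separately, `agg` accepts a list of function names and returns one result per function, while `describe` reports a standard summary. A minimal sketch on the same `cars_per_cap` values:

```python
import pandas as pd

cars_per_cap = pd.Series([809, 731, 588, 18, 200, 70, 45], name="cars_per_cap")

# One call, one result per function name
summary = cars_per_cap.agg(["max", "min", "mean", "std", "count"])
print(summary)

# count, mean, std, min, quartiles and max in a single summary
print(cars_per_cap.describe())
```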
In [39]:

country = ['United states','Australia','Japan','India','Russia','Morroco','Egypt']
cars_per_cap = [809,731,588,18,200,70,45]

In [41]:

lst = [['tom','reacher',25],['krish','pete',30],['nick','wilson',26],['julie', 'jonny', 28]]
# dtype=float in the constructor would apply to every column and fails on the
# name columns, so build the frame first and cast only the Age column to float
df = pd.DataFrame(lst, columns = ['FName','LName','Age'])
df['Age'] = df['Age'].astype(float)
df

Out[41]:

   FName    LName   Age
0    tom  reacher  25.0
1  krish     pete  30.0
2   nick   wilson  26.0
3  julie    jonny  28.0

In [42]:

df.Age.max()
Out[42]:

30.0

In [43]:

df.Age.min()
Out[43]:

25.0

In [44]:

df.Age.mean()
Out[44]:

27.25

In [45]:

df.Age.std()

Out[45]:

2.217355782608345

In [46]:

df.Age.count()
Out[46]:

4
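`count()` returns the number of non-missing values, which is not always the number of rows. A minimal sketch with a hypothetical missing age (NaN):

```python
import numpy as np
import pandas as pd

ages = pd.Series([25.0, np.nan, 26.0, 28.0])  # one value missing (NaN)

print(ages.count())  # 3: NaN is not counted
print(len(ages))     # 4: total number of entries
print(ages.mean())   # about 26.33: NaN is skipped by default
```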

