
Introduction to Pandas & Data Structures

March 9, 2022

1 Introduction to Pandas
Pandas is an open source library providing high-performance, easy-to-use data structures and data
analysis tools for the Python programming language. Today, pandas is actively supported by a
community of like-minded individuals around the world who contribute their valuable time and
energy to help make open source pandas possible. We will learn to use pandas for data analysis. If
you have never used this library, you can think of pandas as an extremely powerful version of Excel with many more features.

1.1 pandas Data Structures


Series and DataFrame are the two workhorse data structures in pandas. Let's talk about Series first:

1.2 Series
A Series is a one-dimensional array-like object that contains values and an associated array of labels,
called its index. A Series can be indexed using these labels. (A Series is similar to a NumPy array; in fact,
it is built on top of the NumPy array object.) A Series can hold any arbitrary Python object. Let's
get hands-on and learn the concepts of Series with examples:
[1]: # first things first, we need to import NumPy and pandas
# np and pd are alias for NumPy and pandas

import numpy as np
import pandas as pd

# just to check the versions we are using


print('numpy version:', np.__version__)
print('pandas version:', pd.__version__)

numpy version: 1.20.3


pandas version: 1.3.4
We can create a Series from a list, a NumPy array, or a dictionary. Let's create these objects and convert
them into pandas Series!

Series using lists Let's create two Python lists, one containing labels and another with data:
[2]: my_labels = ['x', 'y', 'z']
my_data = [100, 200, 300]

So, we have two Python list objects:
• my_labels - a list of strings, and
• my_data - a list of numbers
We can use pd.Series (note the capital S) to convert a Python list into a pandas Series.

[3]: # Converting my_data (Python list) to Series (pandas series)


pd.Series(data=my_data)

[3]: 0 100
1 200
2 300
dtype: int64

The column “0 1 2” is the automatically generated index for the series elements “100 200 300”. We can
specify our own index values and grab the respective data/values using these indexes. Let's pass
my_labels to the Series as the index.
[4]: pd.Series(data=my_data, index=my_labels)

[4]: x 100
y 200
z 300
dtype: int64

1.3 Series using NumPy arrays


[5]: # Let's create NumPy array from my_data and then Series from that array
my_array = np.array(my_data) # creating numpy's array from list
pd.Series(data=my_array) # creating series from numpy's array

[5]: 0 100
1 200
2 300
dtype: int32

Notice we got the default index column “0 1 2” again; let's pass our own index values!
[6]: pd.Series(data=my_data, index=my_labels)
# pd.Series(my_array, my_labels) # data and index are in order

[6]: x 100
y 200
z 300
dtype: int64

1.4 Series using dictionary
[7]: # Let's create a dictionary my_dict
my_dict = {'x': 100, 'y': 200, 'z': 300} # creating a dictionary my_dict
pd.Series(data=my_dict) # creating series from dictionary

[7]: x 100
y 200
z 300
dtype: int64

Notice the difference here: if we pass a dictionary to Series, pandas will take the keys as the index/labels
and the values as the data.
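
As a side note (a minimal sketch, not from the original lecture): if we also pass an index when creating a
Series from a dictionary, pandas aligns the dictionary values on those labels, and any label that is not a
key in the dictionary gets NaN.

pd.Series(data=my_dict, index=['x', 'y', 'w'])  # 'w' is not a key in my_dict, so its value becomes NaN
# x    100.0
# y    200.0
# w      NaN
# dtype: float64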

1.5 Grabbing data from Series


Indexes are the key thing to understand in a Series. pandas uses these indexes (numbers or names)
for fast information retrieval (the index works just like a hash table or a dictionary). To understand
the concepts, let's create three Series, ser1, ser2, ser3, from dictionaries with some random data:
[8]: # Creating three dictionaries dict_1, dict_2, dict_3
dict_1 = {'Toronto': 500, 'Calgary': 200, 'Vancouver': 300, 'Montreal': 700}
dict_2 = {'Calgary': 200, 'Vancouver': 300, 'Montreal': 700}
dict_3 = {'Calgary': 200, 'Vancouver': 300, 'Montreal': 700, 'Jasper': 1000}

[9]: # Creating pandas series from the dictionaries


ser1 = pd.Series(dict_1)
ser2 = pd.Series(dict_2)
ser3 = pd.Series(dict_3)

[10]: print(ser1)

Toronto 500
Calgary 200
Vancouver 300
Montreal 700
dtype: int64

[11]: # Grabbing information from a series is very similar to a dictionary. Simply
# pass the index and it will return the value!

ser1['Calgary'] # it's case sensitive: "calgary" is not the same as "Calgary"

[11]: 200

[12]: ser4 = ser1 + ser2 # adding series and assigning the result to a new variable ser4

ser4

[12]: Calgary 400.0
Montreal 1400.0
Toronto NaN
Vancouver 600.0
dtype: float64
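
Notice that Toronto is NaN: it appears only in ser1, and when two Series are added, any label missing from
either side produces a missing value. A minimal sketch (not part of the original lecture): the add() method
with fill_value treats the missing side as 0 instead.

ser1.add(ser2, fill_value=0)  # Toronto becomes 500.0 instead of NaN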

1.6 Built-in Functions


Below are some commonly used built-in functions and attributes for Series during data processing.

isnull() * detect missing data

[13]: # pd.isnull(ser4) is the same as ser4.isnull()


ser4.isnull()
# shift+tab shows its type is a method

[13]: Calgary False


Montreal False
Toronto True
Vancouver False
dtype: bool

[14]: # notnull() * Detect existing (non-missing) values.


# pd.notnull(ser4) is the same as ser4.notnull()
ser4.notnull()

[14]: Calgary True


Montreal True
Toronto False
Vancouver True
dtype: bool
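
These boolean results are typically used to filter or clean a series; a quick sketch (not from the original
lecture):

ser4[ser4.notnull()]  # keep only the non-missing entries
ser4.dropna()         # same result: drop missing entries
ser4.fillna(0)        # or replace missing entries with 0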

head(), tail() To view a small sample of a Series or DataFrame object (we will cover DataFrames shortly),
use the head() and tail() methods. The default number of elements to display is five, but you may pass a
custom number.
[15]: ser1.head(1) # head(1) will return the first row only

[15]: Toronto 500


dtype: int64

[16]: ser1.tail(1) # tail(1) will return the last row only

[16]: Montreal 700


dtype: int64

[17]: # axes * Returns list of the row axis labels


# the list of row axis labels (the index) can be obtained with .axes
ser1.axes

[17]: [Index(['Toronto', 'Calgary', 'Vancouver', 'Montreal'], dtype='object')]

values * returns list of values/data

[18]: # returns the values/data


ser1.values

[18]: array([500, 200, 300, 700], dtype=int64)

size * returns the number of elements in the series
empty * True if the series is empty
[19]: # True for empty series
ser1.empty

[19]: False

[20]: ser1.size

[20]: 4

1.7 DataFrame
A very simple way to think about a DataFrame is “a bunch of Series put together so that they share the
same index”. * A DataFrame is a rectangular table of data that contains an ordered collection of columns,
each of which can be a different value type (numeric, string, boolean, etc.). A DataFrame has both a row
and a column index; it can be thought of as a dictionary of Series all sharing the same index (see the
short sketch below). Let's learn DataFrame with examples:
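
As a quick illustration of the “dictionary of Series” idea (a minimal sketch with made-up numbers, not from
the original lecture), we can build a DataFrame directly from Series that share an index; each Series
becomes a column, aligned on the common labels:

pop = pd.Series({'Toronto': 500, 'Calgary': 200, 'Vancouver': 300})
area = pd.Series({'Toronto': 630, 'Calgary': 825, 'Vancouver': 115})  # illustrative numbers only
pd.DataFrame({'population': pop, 'area': area})  # columns aligned on the shared city index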

[21]: # Let's create two lists of labels: index for rows 'r1' to 'r10', and columns for
# columns 'c1' to 'c10'

# Using split() for revision!

import pandas as pd
import numpy as np

index = 'r1 r2 r3 r4 r5 r6 r7 r8 r9 r10'.split()


columns = 'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10'.split()

print(index)
print(columns)

['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'r10']
['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10']

[22]: # Let's start with a simple example, using arange() and reshape() together to
# create a 2D array (matrix).

array_2d = np.arange(0, 100).reshape(10, 10) # creating a 2D array "array_2d"

print(array_2d)

[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]]

[23]: # Now, let's create our first DataFrame using index, columns and array_2d!
df = pd.DataFrame(data=array_2d, index=index, columns=columns)

print(df)

c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0 1 2 3 4 5 6 7 8 9
r2 10 11 12 13 14 15 16 17 18 19
r3 20 21 22 23 24 25 26 27 28 29
r4 30 31 32 33 34 35 36 37 38 39
r5 40 41 42 43 44 45 46 47 48 49
r6 50 51 52 53 54 55 56 57 58 59
r7 60 61 62 63 64 65 66 67 68 69
r8 70 71 72 73 74 75 76 77 78 79
r9 80 81 82 83 84 85 86 87 88 89
r10 90 91 92 93 94 95 96 97 98 99
df is our first dataframe. We have columns c1 to c10 and their corresponding rows r1 to r10. Each column
is actually a pandas Series, and they all share a common index: the row labels. Now we can play with this
dataframe df to learn how to grab the data we need, which is the most important concept we need in order
to move on in this course!

Grabbing Columns from dataframe Just pass the name of the required column in square
brackets!
[24]: # Grabbing a single column
df['c1']

[24]: r1 0
r2 10
r3 20
r4 30
r5 40
r6 50

r7 60
r8 70
r9 80
r10 90
Name: c1, dtype: int32

[25]: # We can grab more than one column, simply pass the list of columns you need!
df[['c1', 'c10']]

[25]: c1 c10
r1 0 9
r2 10 19
r3 20 29
r4 30 39
r5 40 49
r6 50 59
r7 60 69
r8 70 79
r9 80 89
r10 90 99
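
A small aside (not from the original lecture): selecting a single column with df['c1'] returns a Series,
while passing a list, even with a single name, returns a DataFrame.

type(df['c1'])    # pandas.core.series.Series
type(df[['c1']])  # pandas.core.frame.DataFrame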

1.8 Adding new column to dataframe


pandas dataframes are very handy, Let’s add a column ’new into our dataframe df by adding any
two existing columns using simple “+” operator!
[26]: df['new'] = df['c1'] + df['c2'] # adding a column "new" which is the sum of "c1" and "c2"

print(df)

c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 new
r1 0 1 2 3 4 5 6 7 8 9 1
r2 10 11 12 13 14 15 16 17 18 19 21
r3 20 21 22 23 24 25 26 27 28 29 41
r4 30 31 32 33 34 35 36 37 38 39 61
r5 40 41 42 43 44 45 46 47 48 49 81
r6 50 51 52 53 54 55 56 57 58 59 101
r7 60 61 62 63 64 65 66 67 68 69 121
r8 70 71 72 73 74 75 76 77 78 79 141
r9 80 81 82 83 84 85 86 87 88 89 161
r10 90 91 92 93 94 95 96 97 98 99 181

1.9 Deleting column from dataframe


drop() We can delete any column from a dataframe using the drop() method. A few important parameters
to consider:
* labels: the column name(s) we need to pass; to drop more than one column, it must be a list of column names.
* axis: the default value is 0, which refers to rows; to drop a column, we need to pass axis=1.
* inplace: the default is False; we need to pass True for a permanent delete. inplace makes sure that we
don't delete a column by mistake: if we don't pass inplace=True, the column will not actually be dropped
from the dataframe.
[27]: # So, we have 10 rows and 11 columns in our dataframe df; "new" is the 11th one
# that we have added.

# Let's delete this column.

df.drop(['new'], axis=1, inplace=True) # If we don't pass inplace=True, the change will not be permanent

print(df)

c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0 1 2 3 4 5 6 7 8 9
r2 10 11 12 13 14 15 16 17 18 19
r3 20 21 22 23 24 25 26 27 28 29
r4 30 31 32 33 34 35 36 37 38 39
r5 40 41 42 43 44 45 46 47 48 49
r6 50 51 52 53 54 55 56 57 58 59
r7 60 61 62 63 64 65 66 67 68 69
r8 70 71 72 73 74 75 76 77 78 79
r9 80 81 82 83 84 85 86 87 88 89
r10 90 91 92 93 94 95 96 97 98 99
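
Since axis defaults to 0 (rows), the same method drops a row when given a row label; a minimal sketch (not
from the original lecture):

df.drop(['r10'])  # returns a copy with row r10 removed; df itself is unchanged without inplace=True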

1.10 Grabbing Rows from dataframe


We can retrieve a row by its name or position with loc and iloc.
* loc: access rows by label(s).
* iloc: access rows by integer position (index location).
[28]: # using loc, this will return rows r2 and r3; notice the list ['r2', 'r3'] in
# square brackets

df.loc[['r2', 'r3']]

[28]: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r2 10 11 12 13 14 15 16 17 18 19
r3 20 21 22 23 24 25 26 27 28 29

[29]: # Using iloc, this will again return rows r2 and r3, but here our selection is
# index based!

df.iloc[[1, 2]] # remember, the index starts at 0

[29]: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r2 10 11 12 13 14 15 16 17 18 19
r3 20 21 22 23 24 25 26 27 28 29
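
A small extension (not from the original lecture): loc and iloc also accept slices, so a block of rows and
columns can be grabbed in one go.

df.loc['r2':'r4', 'c1':'c3']  # label slices include both endpoints
df.iloc[1:4, 0:3]             # integer slices exclude the stop position; same rows and columns here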

1.11 Grabbing a single element from a dataframe
[30]: # We need to tell the location of the element, [row, col]
# df.loc[req_row, req_col] -- pass row, col for the element!
df.loc['r2', 'c1']

[30]: 10

[31]: # another element, say 19, which is at [r2, c10]


df.loc['r2', 'c10']

[31]: 19
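
As a side note (not from the original lecture), for a single scalar value the at and iat accessors do the
same job and are a little faster:

df.at['r2', 'c10']  # label-based scalar access, returns 19
df.iat[1, 9]        # position-based scalar access, returns 19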

Grabbing a sub-set of a dataframe We can grab a sub-set by passing a list of required rows and a list of
required columns.
[32]: # for a sub-set, pass the list
df.loc[['r1', 'r2'], ['c1', 'c2']]

[32]: c1 c2
r1 0 1
r2 10 11

[33]: # another example - random columns and rows in the list


df.loc[['r2', 'r5'], ['c3', 'c4']]

[33]: c3 c4
r2 12 13
r5 42 43

1.12 Conditional Selection or masking


pandas has excellent features; we can do conditional selection. For example, select all the values that
are greater than some value, e.g. greater than 5 in the case below!
[34]: # We can do a conditional selection as well
df > 5
# df != 0 # try this yourself
# df == 0 # try this yourself

[34]: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 False False False False False False True True True True
r2 True True True True True True True True True True
r3 True True True True True True True True True True
r4 True True True True True True True True True True
r5 True True True True True True True True True True
r6 True True True True True True True True True True
r7 True True True True True True True True True True

r8 True True True True True True True True True True
r9 True True True True True True True True True True
r10 True True True True True True True True True True

[35]: # Return Divisible by 2 or even


bool_mask = df % 2 == 0 # creating mask for the required condition
df[bool_mask] # passing mask to get the required results

# df[df % 2 == 0] # Similar to the above 2 lines of code

[35]: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0 NaN 2 NaN 4 NaN 6 NaN 8 NaN
r2 10 NaN 12 NaN 14 NaN 16 NaN 18 NaN
r3 20 NaN 22 NaN 24 NaN 26 NaN 28 NaN
r4 30 NaN 32 NaN 34 NaN 36 NaN 38 NaN
r5 40 NaN 42 NaN 44 NaN 46 NaN 48 NaN
r6 50 NaN 52 NaN 54 NaN 56 NaN 58 NaN
r7 60 NaN 62 NaN 64 NaN 66 NaN 68 NaN
r8 70 NaN 72 NaN 74 NaN 76 NaN 78 NaN
r9 80 NaN 82 NaN 84 NaN 86 NaN 88 NaN
r10 90 NaN 92 NaN 94 NaN 96 NaN 98 NaN
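
In practice, conditional selection is most often used to filter whole rows based on one column; a minimal
sketch (not from the original lecture):

df[df['c1'] > 30]                       # keeps only the rows where column c1 is greater than 30 (r5 to r10)
df[(df['c1'] > 30) & (df['c10'] < 80)]  # combine conditions with & and |, each wrapped in parentheses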

1.12.1 info()
Provides a concise summary of the DataFrame. This is a very useful method.
[36]: df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, r1 to r10
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 c1 10 non-null int32
1 c2 10 non-null int32
2 c3 10 non-null int32
3 c4 10 non-null int32
4 c5 10 non-null int32
5 c6 10 non-null int32
6 c7 10 non-null int32
7 c8 10 non-null int32
8 c9 10 non-null int32
9 c10 10 non-null int32
dtypes: int32(10)
memory usage: 780.0+ bytes
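
A couple of related quick checks (a small aside, not from the original lecture):

df.shape    # (10, 10) - number of rows and columns
df.columns  # the column labels
df.dtypes   # the dtype of each column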

1.12.2 describe()
Generates descriptive statistics that summarize the central tendency, dispersion and shape of a
dataset’s distribution, excluding NaN values.
[37]: df.describe()

[37]: c1 c2 c3 c4 c5 c6 \
count 10.000000 10.000000 10.000000 10.000000 10.000000 10.000000
mean 45.000000 46.000000 47.000000 48.000000 49.000000 50.000000
std 30.276504 30.276504 30.276504 30.276504 30.276504 30.276504
min 0.000000 1.000000 2.000000 3.000000 4.000000 5.000000
25% 22.500000 23.500000 24.500000 25.500000 26.500000 27.500000
50% 45.000000 46.000000 47.000000 48.000000 49.000000 50.000000
75% 67.500000 68.500000 69.500000 70.500000 71.500000 72.500000
max 90.000000 91.000000 92.000000 93.000000 94.000000 95.000000

c7 c8 c9 c10
count 10.000000 10.000000 10.000000 10.000000
mean 51.000000 52.000000 53.000000 54.000000
std 30.276504 30.276504 30.276504 30.276504
min 6.000000 7.000000 8.000000 9.000000
25% 28.500000 29.500000 30.500000 31.500000
50% 51.000000 52.000000 53.000000 54.000000
75% 73.500000 74.500000 75.500000 76.500000
max 96.000000 97.000000 98.000000 99.000000

