3/18/24, 1:20 PM Python Pandas Dataframe Tutorial for Beginners
Python Pandas Dataframe Tutorial for Beginners
What is a pandas dataframe?
Pandas is a Python library used for data analysis. It provides data structures and tools for understanding
and analysing data.
The simplest way to understand a dataframe is to think of it as an MS Excel sheet inside Python. Just as
Excel stores data in rows and columns and lets you perform operations on that data, a dataframe lets you
do the same in code.
There are many ways to deal with data in Python, including series, lists and dictionaries, but the dataframe
is the structure of choice for data scientists. Dataframes can deal with large amounts of data and support
powerful functions to manipulate the data.
Creating dataframes from a csv file, dictionary or list, adding rows and columns, using dataframe indexes and
working with missing data are all part of the EDA (exploratory data analysis) stage of a data science project.
A dataframe is conventionally represented in Python code as 'df'. Dataframe operations are written as
'df.[operation]'.
Where can dataframes be created from?
Dataframes can be created from the following data sources: dictionaries, lists, arrays, series, csv files, a MySQL
connection to a database, etc.
What is a pandas series vs dataframe?
A series is a 1-dimensional representation of data and hence has only one column, while a dataframe is a
2-dimensional table.
Numpy versus Pandas
Numpy is another popular library used for data manipulation, but it is largely used for numerical data.
Dataframes, however, provide powerful functions to work across tables containing multiple data types.
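The difference between the three structures can be seen in a small sketch. The variable names and values below (ages, people, age_array) are made up for illustration and do not come from the article.

```python
import numpy as np
import pandas as pd

# A Series is 1-dimensional: one column of values plus an index.
ages = pd.Series([24, 26, 32], name="Age")

# A DataFrame is 2-dimensional and can mix data types across columns.
people = pd.DataFrame({"Name": ["Alan", "Vivian", "Alister"],
                       "Age": [24, 26, 32]})

# A NumPy array holds a single data type; here every value is an integer.
age_array = np.array([24, 26, 32])

print(ages.ndim, people.ndim)  # number of dimensions of each structure
print(people.dtypes)           # one dtype per column
print(age_array.dtype)
```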
Table of Contents
* Python Pandas Dataframe Basics
* 1. How to create a Dataframe
* 2. How to sort rows within a pandas dataframe
* 3. How to find the largest value in a pandas dataframe
* 4. How to list unique value in a pandas dataframe
* 5. How to delete duplicates from a pandas dataframe
* 6. Rename column header in a pandas dataframe
* 7. Search pandas dataframe for a value
* 8. Drop row and column in a pandas dataframe
* 9. Replace multiple values in a pandas dataframe
* 10. Save pandas dataframe as a .csv file
* 11. Randomly sample a pandas dataframe
* 12. How to filter in a pandas dataframe
* 13. How to calculate moving average in a pandas dataframe
* 14. How to normalise a column in a pandas dataframe
* 15. How to assign new columns in a pandas dataframe
* 16. How to rank a pandas dataframe in ascending and descending order
Python Pandas Dataframe Basics
1. How to create a Dataframe
Every dataframe usage will have the following line at the beginning of your code:
import pandas as pd
Once you have identified where your data is coming from and have stored it in an object, for example 'data',
you can create your dataframe with the following command. This converts all the data stored in the 'data'
object into a 2-dimensional dataframe.
df = pd.DataFrame(data)
Example Tutorial
Check out the first few lines of this pandas dataframe example to see how a dataframe is created.
Here are some of the ways to create a DataFrame in Python Pandas:
Creating a DataFrame from a list of lists:
# First import the pandas library
import pandas as pd
# create a list of lists
list_of_lists = [['January', 31], ['February', 28], ['March', 31]]
# creating the Pandas DataFrame
df = pd.DataFrame(list_of_lists, columns=['Month', 'Days'])
# to display the DataFrame
df
The above code snippet generates the following DataFrame:
Month Days
0 January 31
1 February 28
2 March 31
Creating a DataFrame from a dict of lists:
When creating a DataFrame in Pandas from a dictionary of lists, all the lists within the dictionary have to be
of the same length. If an index is also passed while creating the DataFrame, the length of the index
must equal the length of the lists in the dictionary. If no index is passed, the index of the
DataFrame will be range(n) by default, where n is the length of each list in the dictionary.
The keys of the dictionary become the column names of the DataFrame, and their values, which are lists, form
the data in those columns.
import pandas as pd
# create dict of lists
dict_of_lists = {'Students': ['Alan', 'Vivian', 'Alister', 'Jade'],
                 'Age': [24, 26, 32, 29]}
df = pd.DataFrame(dict_of_lists)
df contains the following data:
  Students Age
0 Alan 24
1 Vivian 26
2 Alister 32
3 Jade 29
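The same-length requirement for a dict of lists can be checked with a short sketch. The mismatched data below is hypothetical: 'Age' deliberately has one entry fewer than 'Students'.

```python
import pandas as pd

# Hypothetical data: 'Age' has one entry fewer than 'Students'.
mismatched = {"Students": ["Alan", "Vivian", "Alister", "Jade"],
              "Age": [24, 26, 32]}

try:
    pd.DataFrame(mismatched)
    error_message = None
except ValueError as exc:
    # pandas refuses to build a DataFrame from lists of unequal length.
    error_message = str(exc)

print(error_message)
```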
Creating an indexed DataFrame from a dict of lists:
The index of a DataFrame is not restricted to numbers; custom labels can be specified as follows:
import pandas as pd
# create dict of lists
dict_of_lists = {'Students': ['Alan', 'Vivian', 'Alister', 'Jade'],
                 'Age': [24, 26, 32, 29]}
# creating the DataFrame
df = pd.DataFrame(dict_of_lists, index=['Student1', 'Student2', 'Student3', 'Student4'])
In such a case, the DataFrame looks like:
         Students Age
Student1 Alan 24
Student2 Vivian 26
Student3 Alister 32
Student4 Jade 29
Creating a DataFrame from a list of dicts:
DataFrames in Pandas can be created from a list of dictionaries. The keys of the dictionaries are taken as the
column names by default.
import pandas as pd
# create a list of dictionaries
list_of_dicts = [{'column_a': 1, 'column_b': 2, 'column_c': 3},
                 {'column_a': 10, 'column_b': 20, 'column_c': 30}]
# creating the DataFrame
df = pd.DataFrame(list_of_dicts)
The above snippet generates the following DataFrame:
  column_a column_b column_c
0 1 2 3
1 10 20 30
If some of the values are missing in a dictionary, like in the code snippet below:
import pandas as pd
# create a list of dictionaries
list_of_dicts = [{'column_a': 1, 'column_c': 3},
                 {'column_a': 10, 'column_b': 20, 'column_c': 30}]
# creating the DataFrame
df = pd.DataFrame(list_of_dicts)
Then df will contain the following DataFrame:
  column_a column_b column_c
0 1 NaN 3
1 10 20 30
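Once a DataFrame contains NaN values, they usually need to be counted and then filled or dropped. The following is a minimal sketch using the same two dictionaries; the fill value 0 is only an illustrative choice, not a recommendation from the article.

```python
import pandas as pd

list_of_dicts = [{"column_a": 1, "column_c": 3},
                 {"column_a": 10, "column_b": 20, "column_c": 30}]
df = pd.DataFrame(list_of_dicts)

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# Replace missing values with 0 (the choice of 0 is only an example).
df_filled = df.fillna(0)

# Or drop every row that contains a missing value.
df_complete = df.dropna()

print(missing_per_column)
```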
Creating a DataFrame from a list of dicts and specifying the row indices:
import pandas as pd
# create a list of dictionaries
list_of_dicts = [{'column_a': 1, 'column_c': 3},
                 {'column_a': 10, 'column_b': 20, 'column_c': 30}]
# creating the DataFrame
df = pd.DataFrame(list_of_dicts, index=['row1', 'row2'])
Then df will contain the following DataFrame:
     column_a column_b column_c
row1 1 NaN 3
row2 10 20 30
Creating a DataFrame from a list of dicts and specifying both the row indices and the column indices:
The names specified in the columns list have to match the keys of the dictionaries. If a name has no match,
that column will contain NaN in every row.
import pandas as pd
# create a list of dictionaries
list_of_dicts = [{'column_a': 1, 'column_c': 3},
                 {'column_a': 10, 'column_b': 20, 'column_c': 30}]
# creating the DataFrame
df = pd.DataFrame(list_of_dicts, index=['row1', 'row2'], columns=['column_a', 'column_c'])
Then df will contain the following DataFrame:
     column_a column_c
row1 1 3
row2 10 30
'column_b' does not get added to the DataFrame here since it is not mentioned in the columns list while
creating the DataFrame.
Consider the following code:
import pandas as pd
# create a list of dictionaries
list_of_dicts = [{'column_a': 1, 'column_c': 3},
                 {'column_a': 10, 'column_b': 20, 'column_c': 30}]
# creating the DataFrame
df = pd.DataFrame(list_of_dicts, index=['row1', 'row2'], columns=['column_a', 'column_d'])
Since column_d is not a key in either of the dictionaries, the DataFrame generated looks like:
     column_a column_d
row1 1 NaN
row2 10 NaN
Creating a DataFrame from a list of tuples:
import pandas as pd
# create a list of tuples
list_of_tuples = [(8, 'August', 1998), (2, 'January', 1987), (17, 'July', 2021), (24, 'June', 1932)]
# creating the DataFrame
df = pd.DataFrame(list_of_tuples, columns=['Date', 'Month', 'Year'])
This will generate the DataFrame df:
  Date Month Year
0 8 August 1998
1 2 January 1987
2 17 July 2021
3 24 June 1932
Creating a DataFrame using the zip() function:
In Python, the zip() function can be used to merge two lists. The zip() function generates a zip object, which
is an iterator of tuples in which the items of the iterables passed to zip() are paired together: the first
item of the first iterable is paired with the first item of the second iterable, the second item of the first
iterable is paired with the second item of the second iterable, and so on. If the iterables passed to the zip()
function vary in length, the length of the zip object is determined by the shortest iterable.
import pandas as pd
# List 1
students = ['Alan', 'Vivian', 'Alister', 'Jade']
# List 2
age = [24, 26, 32, 29]
# using zip to merge the two lists
list_of_tuples = list(zip(students, age))
# list(zip(students, age)) will return
# [('Alan', 24), ('Vivian', 26), ('Alister', 32), ('Jade', 29)]
# Converting the list of tuples into a pandas DataFrame:
df = pd.DataFrame(list_of_tuples, columns=['Students', 'Age'])
Here, df will contain:
  Students Age
0 Alan 24
1 Vivian 26
2 Alister 32
3 Jade 29
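The length rule for zip() matters in practice because no error is raised when the inputs differ in length. A small sketch, with an 'age' list deliberately shortened for illustration:

```python
import pandas as pd

students = ["Alan", "Vivian", "Alister", "Jade"]
age = [24, 26, 32]  # one value short, for illustration only

# zip() stops at the shortest input, so 'Jade' is silently dropped.
list_of_tuples = list(zip(students, age))
df = pd.DataFrame(list_of_tuples, columns=["Students", "Age"])

print(df)
```

Comparing len(students) and len(age) before zipping is a cheap way to catch this kind of silent data loss.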
Creating an empty DataFrame:
import pandas as pd
df = pd.DataFrame()
The above code will create an empty DataFrame in Python Pandas.
To create an empty DataFrame with the column headers:
import pandas as pd
df = pd.DataFrame(columns=['column1', 'column2', 'column3'])
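Whether a DataFrame is empty, and which columns it already has, can be checked as in this brief sketch:

```python
import pandas as pd

df = pd.DataFrame(columns=["column1", "column2", "column3"])

print(df.empty)          # True when the DataFrame has no rows
print(list(df.columns))  # the column headers exist even without data
print(df.shape)          # (number of rows, number of columns)
```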
2. How to sort rows within a pandas dataframe
Many times in data analysis you will need to get a sense of the data and its magnitude. Sorting rows helps
with this. The df.sort_values() function sorts the dataframe by the columns that are passed to it as
parameters.
For example, the following command sorts the dataframe by the "reports" column in descending order:
df.sort_values(by='reports', ascending=False)
The following command sorts the dataframe by the "reports" column in ascending order:
df.sort_values(by='reports', ascending=True)
The following command sorts the dataframe first by the "coverage" column and then by the "reports" column:
df.sort_values(by=['coverage', 'reports'])
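A runnable sketch of these sort commands follows. The column names match the commands above, but the data values are made up for illustration.

```python
import pandas as pd

# Made-up data using the column names from the commands above.
df = pd.DataFrame({"name": ["Jason", "Molly", "Tina", "Jake"],
                   "reports": [4, 24, 31, 2],
                   "coverage": [25, 94, 57, 62]})

by_reports_desc = df.sort_values(by="reports", ascending=False)
by_coverage_then_reports = df.sort_values(by=["coverage", "reports"])

# sort_values() returns a new DataFrame; df itself is left unchanged.
print(by_reports_desc)
print(by_coverage_then_reports)
```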
Example Tutorial
Check out this pandas dataframe example to see various ways to sort rows inside a dataframe.
3. How to find the largest value in a pandas dataframe
In the exploratory stage of analytics, you will occasionally want to get a sense of the largest values in
your dataset. This gives you a rough idea of the shape of your data, what operations to perform on it and
what a visualisation might look like.
The idxmax() function returns the index of the row that holds the highest value in a column. The idxmin()
function returns the index of the row that holds the lowest value.
When used like this, df['preTestScore'].idxmax(), the command returns the index of the row
that contains the maximum value of the column "preTestScore" in your dataframe (df).
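As a minimal sketch with made-up scores, the index returned by idxmax() can be passed to .loc to retrieve the whole row, while max() returns the value itself:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jason", "Molly", "Tina"],
                   "preTestScore": [4, 24, 31]})

top_index = df["preTestScore"].idxmax()   # index label of the largest value
low_index = df["preTestScore"].idxmin()   # index label of the smallest value
top_row = df.loc[top_index]               # the full row at that label
largest_value = df["preTestScore"].max()  # the largest value itself

print(top_index, largest_value)
```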
Example Tutorial
Check out this pandas dataframe example to see how to find the largest value in a dataframe.
4. How to list unique value in a pandas dataframe
Finding unique values in a dataset is useful in many scenarios: to count the number of rows belonging
to a specific entity, to find the most popular and least popular entities, etc.
The following command lists the unique values in the "name" column of the dataframe:
df.name.unique()
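A short sketch with made-up names shows unique() next to two related calls, nunique() and value_counts(), which cover the counting and popularity use cases mentioned above:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Tina", "Jake", "Tina", "Amy", "Jake", "Tina"]})

unique_names = df["name"].unique()         # values in order of first appearance
number_of_names = df["name"].nunique()     # how many distinct values there are
rows_per_name = df["name"].value_counts()  # rows per value, most frequent first

print(unique_names)
print(rows_per_name)
```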
Example Tutorial
Check out this pandas dataframe example to see how to find unique values in a dataframe.
5. How to delete duplicates from a pandas dataframe
Deleting duplicate values largely serves the purpose of reducing the memory usage of your dataset. It can also
be used if you don't want a specific value to be over-represented in your dataset.
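No command survives in this section of the text, so the following is only a sketch that assumes the standard pandas method DataFrame.drop_duplicates() is what the section refers to. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Tina", "Jake", "Tina", "Tina"],
                   "reports": [31, 2, 31, 5]})

# Drop rows that are identical in every column (keeps the first occurrence).
no_duplicate_rows = df.drop_duplicates()

# Drop rows that repeat a value in one column only.
one_row_per_name = df.drop_duplicates(subset="name", keep="first")

print(no_duplicate_rows)
print(one_row_per_name)
```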
7. Search pandas dataframe for a value
The following code finds all values of Age where Salary > 50,000. The where() function helps to search a
pandas dataframe for a value.
print(df['Age'].where(df['Salary'] > 50000))
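One detail worth knowing is that where() keeps every row and puts NaN where the condition fails, whereas a boolean mask returns only the matching rows. A sketch with made-up ages and salaries:

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 41, 37], "Salary": [42000, 58000, 61000]})

# where() keeps the original shape: rows that fail the condition become NaN.
ages_where = df["Age"].where(df["Salary"] > 50000)

# A boolean mask returns only the matching rows.
ages_mask = df.loc[df["Salary"] > 50000, "Age"]

print(ages_where)
print(ages_mask)
```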
Example Tutorial
Check out this data science tutorial to see an example of how to search for a value in a pandas dataframe.
8. Drop row and column in a pandas dataframe
Many times in data analysis you will have to delete rows and columns that don't fit your modeling needs. The
df.drop() function helps achieve this.
df.drop('reports', axis=1)
will drop a column named "reports". axis=1 indicates that we are referring to a column and not a row.
You can also drop rows based on conditions:
df.drop(df[df['name'] != 'Tina'].index)
will drop every row where the value of 'name' is not 'Tina'.
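The following sketch runs the three common forms of drop() on made-up data. Note that drop() returns a new DataFrame by default and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jason", "Tina", "Jake"],
                   "reports": [4, 31, 2],
                   "coverage": [25, 57, 62]})

# Drop a column (axis=1).
without_reports = df.drop("reports", axis=1)

# Drop a row by its index label (axis=0 is the default).
without_first_row = df.drop(0)

# Drop the rows that match a condition, by passing their index labels.
only_tina = df.drop(df[df["name"] != "Tina"].index)

print(without_reports)
print(only_tina)
```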
Example Tutorial
Check out this code recipe to see an example of how to drop row and columns in a pandas dataframe.
9. Replace multiple values in a pandas dataframe
While data munging, you might inherit a dataset with lots of null values, junk values, duplicate values, etc. In
such instances you will need to replace these values in bulk.
The df.replace() function helps to replace values in a pandas dataframe. This function can be used to
replace a string, regex, list, dictionary, series, number, etc. in a dataframe.
df.replace(-999, np.nan)
will replace all occurrences of -999 with NaN (null) values. This requires numpy to be imported as np.
df.replace(to_replace=["Tennis", "Cricket"], value="Sports")
will replace the values 'Tennis' and 'Cricket' with the value 'Sports'.
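Both replace() calls can be tried on a small, made-up dataframe. Treating -999 as a placeholder for a missing score is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [12, -999, 7],
                   "activity": ["Tennis", "Cricket", "Chess"]})

# -999 is treated here as a placeholder for "missing".
cleaned = df.replace(-999, np.nan)

# Several values can be mapped to a single replacement in one call.
grouped = df.replace(to_replace=["Tennis", "Cricket"], value="Sports")

print(cleaned)
print(grouped)
```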
Example Tutorial
Check out this code recipe to see an example of how to replace multiple values in a pandas dataframe.
10. Save pandas dataframe as a .csv file
As you must have noticed from the above functions, pandas is a very powerful library for data cleaning and
preparation.
Once you are done with the various data manipulations using the above commands, you will need to convert
your dataframe into a csv file. This is needed to split your data into training and test data for model building
and accuracy checking.
The df.to_csv() function converts a pandas dataframe into a .csv file format.
df.to_csv(r'C:\Users\Admin\Desktop\file3.csv', index=False)
will store the .csv file in the specified location.
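A quick way to confirm that a file was written correctly is to read it back. In this sketch a temporary directory stands in for a real destination folder, and the data is made up.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Tina", "Jake"], "reports": [31, 2]})

# A temporary directory stands in for a real destination folder.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file3.csv")
    df.to_csv(path, index=False)  # index=False leaves out the row labels
    restored = pd.read_csv(path)  # read the file back to check it

print(restored)
```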
Example Tutorial
Check out this code recipe to see an example of how to save a pandas dataframe as a csv file
11. Randomly sample a pandas dataframe
Trying to understand a dataset involves getting a quick insight into what type and range of data it contains.
Pandas provides functions to pick random rows from the dataset.
df.take(np.random.permutation(len(df))[:2])
this code snippet picks 2 rows at random
df.take(np.random.permutation(len(df))[:4])
this code snippet picks 4 rows at random
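The take/permutation pattern can be sketched as follows on made-up data. DataFrame.sample() is shown alongside as a shorter alternative; it is not mentioned in the article and is included here only for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
                   "reports": [4, 24, 31, 2, 3]})

# Shuffle the row positions, keep the first two, and take those rows.
two_rows = df.take(np.random.permutation(len(df))[:2])

# DataFrame.sample() does the same in one call; random_state makes it repeatable.
two_rows_again = df.sample(n=2, random_state=0)

print(two_rows)
print(two_rows_again)
```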
Example Tutorial
Check out this Pandas tutorial on how to randomly sample a pandas dataframe.
12. How to filter in a pandas dataframe
Filtering a dataframe enables you to view specific rows and columns, either based on order or by matching
specific conditions.
print(df[:2])
will print the first 2 rows in the dataframe.
print(df[(df['coverage'] > 50) & (df['reports'] < 4)])
will print rows where the column 'coverage' is greater than 50 and the column 'reports' is less than 4.
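The two filters can be run on a small, made-up dataframe. The .loc line at the end is an addition to show rows and columns being filtered together.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jason", "Molly", "Tina", "Jake"],
                   "reports": [4, 24, 31, 2],
                   "coverage": [25, 94, 57, 62]})

# Rows selected by position.
first_two = df[:2]

# Each condition needs its own parentheses; & means "and", | means "or".
high_coverage_few_reports = df[(df["coverage"] > 50) & (df["reports"] < 4)]

# Rows and columns can be filtered together with .loc.
names_only = df.loc[df["coverage"] > 50, ["name"]]

print(high_coverage_few_reports)
```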
Example Tutorial
Check out this data science tutorial on how to filter in a pandas dataframe
13. How to calculate moving average in a pandas dataframe
As part of data munging, you have to try to understand the trends in your dataset. But when your data values
are very spiky, it is tough to spot trends.
Calculating a moving average, like a 7-day average, helps to smooth out the data variability and gives you a
directional trend.
The dataframe.rolling() function provides the rolling window calculation, and by chaining mean() onto its
result, the average can be calculated.
df = df[['preTestScore', 'postTestScore']].rolling(window=2).mean()
this calculates a moving average with a window of 2 on the columns 'preTestScore' and 'postTestScore'. A
window of 2 means that each value is averaged with the one before it, and this happens for the entire
dataframe. The first row has no preceding value, so its result is NaN.
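A minimal sketch of the rolling mean on made-up scores; the result is stored under a new, hypothetical name (moving_average) so that the original dataframe is kept.

```python
import pandas as pd

df = pd.DataFrame({"preTestScore": [4, 24, 31, 2],
                   "postTestScore": [25, 94, 57, 62]})

# Each output row averages the current row and the one before it.
moving_average = df[["preTestScore", "postTestScore"]].rolling(window=2).mean()

print(moving_average)
```

The first row of the result is NaN because a window of 2 needs two values; rolling() has a min_periods parameter for cases where a partial window is acceptable.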
Example Tutorial
Check out this data science tutorial on how to calculate moving average in a pandas dataframe
14. How to normalise a column in a pandas dataframe
In the data munging step of your data science project, you will often get data with wide variability
across positive and negative values. Normalisation is done to reduce the data range when data of different
scales are involved.
Normalising the dataset (234, 24, 14) against its largest value would result in approximately (1, 0.10, 0.06):
using 234 as the anchor value, all other values are represented relative to 234.
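Two common ways of normalising a column can be sketched as follows: scaling by the maximum, which matches the "anchor value" idea above, and min-max scaling, which is an addition not described in the article. The column name 'score' is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"score": [234, 24, 14]})

# Scale by the largest value: every entry becomes a fraction of the maximum.
df["score_by_max"] = df["score"] / df["score"].max()

# Min-max scaling: the smallest value maps to 0 and the largest to 1.
value_range = df["score"].max() - df["score"].min()
df["score_min_max"] = (df["score"] - df["score"].min()) / value_range

print(df.round(2))
```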
Example Tutorial
Check out this data science tutorial on how to normalise a column in a pandas dataframe
15. How to assign new columns in a pandas dataframe
There are a couple of reasons why you might want to add new columns during data processing. You might
have data in 2 different dataframes that you want to bring into a single dataframe, or you might want to add
a new column that is the result of a function on 2 or more other columns.
There are multiple ways to add new columns in a pandas dataframe: by declaring a new list as a column, by
using dataframe.insert(), by using dataframe.assign(), or by using a dictionary.
The dataframe.assign() function adds a new column at the end of the dataframe. You cannot
specify the position at which to add this column; for that you will need to use dataframe.insert().
df = df.assign(Marks=[71, 82, 89])
will add a new column "Marks" with the values 71, 82, 89 as the last column in the dataframe.
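The difference between assign() and insert() is easiest to see side by side. In this sketch the 'Grade' and 'MarksPlusBonus' columns and their values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Student": ["Alan", "Vivian", "Alister"],
                   "Age": [24, 26, 32]})

# assign() returns a new DataFrame with the column appended at the end.
with_marks = df.assign(Marks=[71, 82, 89])

# insert() modifies the DataFrame in place and takes the target position first.
with_marks.insert(1, "Grade", ["B", "A", "A"])

# assign() also accepts a function of the DataFrame for derived columns.
with_bonus = with_marks.assign(MarksPlusBonus=lambda d: d["Marks"] + 5)

print(with_bonus)
```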
Example Tutorial
Check out this data science recipe on how to assign new columns in a pandas dataframe
16. How to rank a pandas dataframe in ascending and descending order
By now you must have realised that Python is an excellent language for data analysis. This is primarily
because of the powerful data analysis packages, like pandas, that are available for Python.
Ranking a pandas dataframe returns a rank for every index (row) in the series passed to the function. Both
numeric and string values can be ranked by df.rank().
df['coverageRanked'] = df['coverage'].rank(ascending=True)
this command will create a new column 'coverageRanked' and assign to it the ascending ranks of the values in
the 'coverage' column.
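A sketch of ascending and descending ranks on made-up data. The column name 'coverageRankedDesc' is hypothetical, and two equal values are included to show how ties are handled.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jason", "Molly", "Tina", "Jake"],
                   "coverage": [25, 94, 57, 57]})

df["coverageRanked"] = df["coverage"].rank(ascending=True)
df["coverageRankedDesc"] = df["coverage"].rank(ascending=False)

# Tied values share the average of their ranks by default (method="average").
print(df)
```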
Example Tutorial
Check out this data science tutorial on how to rank a pandas dataframe
17. Add row to a DataFrame
There are several ways to add a row or rows to an existing DataFrame in Python Pandas.
Adding a single row using the DataFrame.loc[] indexer:
To add the row at the end of the DataFrame, the length of the DataFrame has to be found to determine the
position at which the new row is to be added.
import pandas as pd
data = {'Student': ['Peter', 'James', 'Ella', 'Charlotte'],
        'Age': [28, 24, 35, 27],
        'Major': ['Chemistry', 'Biology', 'Physics', 'Chemistry']}
# creating a DataFrame from the dict of lists
df = pd.DataFrame(data)
Here, df would look like this:
  Student Age Major
0 Peter 28 Chemistry
1 James 24 Biology
2 Ella 35 Physics
3 Charlotte 27 Chemistry
# adding a new row
df.loc[len(df.index)] = ['Mike', 33, 'Physics']
Now, df would look like:
  Student Age Major
0 Peter 28 Chemistry
1 James 24 Biology
2 Ella 35 Physics
3 Charlotte 27 Chemistry
4 Mike 33 Physics
Using the DataFrame.append() function
The DataFrame.append() function in Python Pandas may be used to append a single row, or to append
multiple rows belonging to another DataFrame, to the end of a particular DataFrame. It returns a new
DataFrame object in the process. Any columns which are not present in the original DataFrame are added as
new columns, and the new cells created for the original rows are populated with NaN.
Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples in
this section apply to older versions only.
The syntax for the append() function is as follows:
DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=None)
where:
other: the list of rows to be appended, or a DataFrame object or dictionary object of the rows to be
appended.
ignore_index: takes True or False; default is False. If set to True, the index labels are not used.
verify_integrity: takes True or False; default is False. If set to True, a ValueError is raised on creating
an index with duplicates.
sort: sorts the columns if the columns of the original DataFrame and the new rows are not aligned. sort=True
is used to silence the warning and sort. sort=False results in silencing the warning and not sorting.
Returns: DataFrame object with appended rows.
Using append() to add a single row:
import pandas as pd
data = {'Student': ['Peter', 'James', 'Ella', 'Charlotte'],
        'Age': [28, 24, 35, 27],
        'Major': ['Chemistry', 'Biology', 'Physics', 'Chemistry']}
df = pd.DataFrame(data)
new_row = {'Student': 'Mike', 'Age': 29, 'Major': 'Biology'}
df = df.append(new_row, ignore_index=True)
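On pandas 2.0 and later, where DataFrame.append() is no longer available, the same row can be added with pd.concat(). This is a minimal sketch of that substitution on a shortened version of the data, not the article's own method.

```python
import pandas as pd

df = pd.DataFrame({"Student": ["Peter", "James"],
                   "Age": [28, 24],
                   "Major": ["Chemistry", "Biology"]})
new_row = {"Student": "Mike", "Age": 29, "Major": "Biology"}

# Wrap the dict in a one-row DataFrame, then concatenate.
# ignore_index=True renumbers the rows from 0, as append() did.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

print(df)
```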
Using append() to add the rows from a new DataFrame to an existing DataFrame:
import pandas as pd
# First DataFrame
df1 = pd.DataFrame({"foo": [1, 2, 3, 4],
                    "bar": [5, 6, 7, 8]})
# Second DataFrame
df2 = pd.DataFrame({"foo": [9, 8, 7],
                    "bar": [5, 4, 3]})
an
foo bar