How to select the last row and access a PySpark dataframe by index?
Last Updated: 22 Jun, 2021
In this article, we will discuss how to select the last row and access a pyspark dataframe by index.
Creating dataframe for demonstration:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# show dataframe
dataframe.show()
Output:
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+
Select last row from dataframe
Using tail() function
The tail() function returns the last n rows of the dataframe as a list of Row objects.
Syntax: dataframe.tail(n)
where,
- n is the number of rows to be selected from the end.
- dataframe is the input dataframe.
We can use n = 1 to select only the last row.
Example 1: Selecting the last row.
Python3
# access the last row of the dataframe
dataframe.tail(1)
Output:
[Row(student ID='5', student NAME='gnanesh', college='iit')]
Example 2: Python program to access the last N rows.
Python3
# access the last 5 rows of the dataframe
dataframe.tail(5)
Output:
[Row(student ID='2', student NAME='ojaswi', college='vvit'),
Row(student ID='3', student NAME='rohith', college='vvit'),
Row(student ID='4', student NAME='sridevi', college='vignan'),
Row(student ID='1', student NAME='sravan', college='vignan'),
Row(student ID='5', student NAME='gnanesh', college='iit')]
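PySpark dataframes have no direct positional row index, but for small results one common approach is to collect the rows into a Python list and index that list. A minimal sketch reusing the dataframe above (the row position 2 is chosen purely for illustration):
Python3
# collect() returns every row to the driver as a Python list of
# Row objects, so this is only practical for small dataframes
rows = dataframe.collect()

# ordinary list indexing then gives access to a row by position,
# e.g. the third row (index 2)
third_row = rows[2]

# fields of a Row can be read by column name
print(third_row['student NAME'], third_row['college'])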
Access the dataframe by column index
Here we are going to select dataframe columns based on the column number. To select a specific column by its column number in the pyspark dataframe, we use the select() function.
Syntax: dataframe.select(dataframe.columns[column_number]).show()
where,
- dataframe is the dataframe name
- dataframe.columns is the list of column names, so indexing it with a column number returns the name of that column
- show() function is used to display the selected column
Example 1: Python program to access column based on column number
Python3
# select column with column number 1
dataframe.select(dataframe.columns[1]).show()
Output:
+------------+
|student NAME|
+------------+
| sravan|
| ojaswi|
| rohith|
| sridevi|
| sravan|
| gnanesh|
+------------+
Example 2: Accessing multiple columns based on column number. Here we select multiple columns by using the slice operator on dataframe.columns; it can access up to n columns.
Syntax: dataframe.select(dataframe.columns[column_start:column_end]).show()
where: column_start is the starting index and column_end is the ending index (exclusive).
Python3
# select columns using the slice operator
# on the column list
dataframe.select(dataframe.columns[0:3]).show()
Output:
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+
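The slice operator only covers contiguous columns. For non-adjacent positions, one option is to build the list of column names yourself and pass it to select(); a minimal sketch (the positions [0, 2] are chosen purely for illustration):
Python3
# pick the column names at arbitrary positions from the column list
wanted = [dataframe.columns[i] for i in [0, 2]]

# select() also accepts a list of column names
dataframe.select(wanted).show()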