

Dataframe Basic Operations (Python)


Creating a SparkSession
The SparkSession is the entry point to programming with Spark SQL.

It allows you to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read
parquet files.

SparkSession.builder: The builder attribute is a class attribute of SparkSession that provides a way to configure
and create a SparkSession instance.

appName("Example App"): The appName method sets the name of the Spark application. This name will appear
in the Spark web UI and can help you identify your application among others running on the same cluster.

config("spark.some.config.option", "some-value"): The config method allows you to set various configuration
options for the Spark session. In this example, "spark.some.config.option" is a placeholder for an actual
configuration key, and "some-value" is the value for that configuration. You can set multiple configuration options
by chaining multiple config calls.

getOrCreate(): The getOrCreate method either retrieves an existing SparkSession if one already exists or creates a
new one if it does not. This ensures that you do not accidentally create multiple SparkSession instances in your
application.

Note: In Databricks, you do not need to create or override the SparkSession as it is automatically created for each
notebook or job executed against the cluster. Databricks manages the SparkSession and SparkContext for you,
ensuring optimal configuration and resource usage.

from pyspark.sql import SparkSession


spark = SparkSession.builder \
    .appName("Spark DataFrames") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()
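In a Databricks notebook, the managed session can be used directly instead of calling the builder; a minimal sketch (the two print statements are only illustrative checks, not part of the original notebook):

# `spark` already refers to the SparkSession that Databricks creates for the notebook,
# so no builder call is needed.
print(spark.version)               # Spark version running on the cluster
print(spark.sparkContext.appName)  # application name assigned to the session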

Creating DataFrame
1. From a Python List of Tuples

%python
# List of tuples
data = [("John", 25), ("Doe", 30), ("Jane", 22)]

# Creating DataFrame
df_list = spark.createDataFrame(data, ["Name", "Age"])

# Display the DataFrame


df_list.show()


  df_list: pyspark.sql.dataframe.DataFrame = [Name: string, Age: long]


+----+---+
|Name|Age|
+----+---+
|John| 25|
| Doe| 30|
|Jane| 22|
+----+---+

2. From a List of Dictionaries

%python
# List of dictionaries
data = [{"Name": "Alice", "Id": 1}, {"Name": "Bob", "Id": 2}, {"Name": "Cathy", "Id": 3}]

# Creating DataFrame
df_dict = spark.createDataFrame(data)

# Display the DataFrame


df_dict.show()

  df_dict: pyspark.sql.dataframe.DataFrame = [Id: long, Name: string]


+---+-----+
| Id| Name|
+---+-----+
| 1|Alice|
| 2| Bob|
| 3|Cathy|
+---+-----+

3. From a List of Rows

%python
from pyspark.sql import Row

# List of Rows
data = [Row(Name="Cathy", Id=1),
        Row(Name="David", Id=2),
        Row(Name="Eva", Id=3),
        Row(Name="Frank", Id=4)]

# Creating DataFrame
df_row = spark.createDataFrame(data)

# Display the DataFrame


df_row.show()

  df_row: pyspark.sql.dataframe.DataFrame = [Name: string, Id: long]

+-----+---+
| Name| Id|
+-----+---+


|Cathy| 1|
|David| 2|
| Eva| 3|
|Frank| 4|
+-----+---+

4. Creating a DataFrame from an RDD

%python
# Import necessary modules
from pyspark.sql import Row

# Create an RDD
rdd = spark.sparkContext.parallelize([
    Row(Name="Alice", Age=25),
    Row(Name="Bob", Age=30),
    Row(Name="Cathy", Age=22),
    Row(Name="David", Age=35),
    Row(Name="Eva", Age=28),
    Row(Name="Frank", Age=40)
])

# Convert RDD to DataFrame


df_rdd = spark.createDataFrame(rdd)

# Display the DataFrame


df_rdd.show()

  df_rdd: pyspark.sql.dataframe.DataFrame = [Name: string, Age: long]

+-----+---+
| Name|Age|
+-----+---+
|Alice| 25|
| Bob| 30|
|Cathy| 22|
|David| 35|
| Eva| 28|
|Frank| 40|
+-----+---+

5. Reading an External File


spark.read: This is the entry point for reading data in Spark. It returns a DataFrameReader object that is used to read
data from various sources.

.format("csv"): Specifies the format of the data source. In this case, it indicates that the data is in CSV (Comma-
Separated Values) format.

.option("header", "true"): This option tells Spark that the first row of the CSV file contains the column names. If this
option is set to false, Spark will treat the first row as data. "true" means that the CSV file has a header row.


.option("inferSchema", "true"): This option tells Spark to automatically infer the data types of each column in the
CSV file. If this option is set to false, all columns will be read as strings (default behavior). "true" means that Spark will
try to infer the schema (data types) of the columns based on the data.

.load(path):

This method specifies the path to the CSV file or the directory containing the CSV files that you want to read.

customer_df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("dbfs:/FileStore/tables/customers_300mb.csv")

  customer_df: pyspark.sql.dataframe.DataFrame = [customer_id: integer, name: string ... 5 more fields]
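For comparison, the same read can be expressed with the DataFrameReader's csv() shorthand, which accepts header and inferSchema as keyword arguments; a minimal sketch assuming the same file path:

# Equivalent to format("csv").option("header", "true").option("inferSchema", "true").load(...)
customer_df = spark.read.csv(
    "dbfs:/FileStore/tables/customers_300mb.csv",
    header=True,
    inferSchema=True
)
customer_df.printSchema()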

6. Using StructType & StructField

%python
#employee data and schemas
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, FloatType, DateType
from datetime import date

# Create dummy data as a list of lists
emp_data = [
    [1, 101, "John Doe", 30, "M", 60000.0, date(2020, 1, 15)],
    [2, 102, "Jane Smith", 25, "F", 65000.0, date(2019, 3, 10)],
    [3, 101, "Mike Johnson", 35, "M", 70000.0, date(2018, 5, 20)],
    [4, 103, "Emily Davis", 28, "F", 72000.0, date(2021, 7, 30)],
    [5, 102, "Robert Brown", 40, "M", 80000.0, date(2017, 9, 25)],
    [6, 101, "Linda Wilson", 32, "F", 68000.0, date(2020, 11, 5)],
    [7, 103, "David Lee", 29, "M", 75000.0, date(2019, 12, 15)]
]

# Define the schema
emp_schema = StructType([
    StructField("empid", StringType(), True),
    StructField("deptid", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", FloatType(), True),
    StructField("hiredate", DateType(), True)
])
# Create DataFrame
df = spark.createDataFrame(emp_data, emp_schema)
#df = spark1.createDataFrame(data = emp_data, schema = emp_schema)

# Display the DataFrame


df.show()

  df: pyspark.sql.dataframe.DataFrame = [empid: string, deptid: integer ... 5 more fields]


+-----+------+------------+---+------+-------+----------+
|empid|deptid| name|age|gender| salary| hiredate|
+-----+------+------------+---+------+-------+----------+
| 1| 101| John Doe| 30| M|60000.0|2020-01-15|
| 2| 102| Jane Smith| 25| F|65000.0|2019-03-10|
| 3| 101|Mike Johnson| 35| M|70000.0|2018-05-20|
| 4| 103| Emily Davis| 28| F|72000.0|2021-07-30|
| 5| 102|Robert Brown| 40| M|80000.0|2017-09-25|
| 6| 101|Linda Wilson| 32| F|68000.0|2020-11-05|
| 7| 103| David Lee| 29| M|75000.0|2019-12-15|
+-----+------+------------+---+------+-------+----------+

Basic DataFrame Operations


1. show() & display()
In Databricks, show() and display() are used to visualize DataFrames, but they have different functionalities:

show(): This is a method available on Spark DataFrames that prints the first n rows to the console. It is useful for
quick inspection of data but does not provide rich formatting or interactivity. You can specify the number of rows to
display, and it defaults to 20 rows if not specified.

display(): This is a Databricks-specific function that provides a rich, interactive view of the DataFrame. It is more
suitable for use within notebooks as it allows for better visualization, including sorting, filtering, and graphical
representation of data.

customer_df.show(5)

+-----------+----------+------+-----------+-------+-----------------+---------+
|customer_id| name| city| state|country|registration_date|is_active|
+-----------+----------+------+-----------+-------+-----------------+---------+
| 0|Customer_0| Pune|Maharashtra| India| 2023-01-19| true|
| 1|Customer_1| Pune|West Bengal| India| 2023-08-10| true|
| 2|Customer_2| Delhi|Maharashtra| India| 2023-08-05| true|
| 3|Customer_3|Mumbai| Telangana| India| 2023-06-04| true|
| 4|Customer_4| Delhi| Karnataka| India| 2023-03-15| false|
+-----------+----------+------+-----------+-------+-----------------+---------+
only showing top 5 rows
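show() also takes optional truncate and vertical arguments; a small sketch (not part of the original notebook):

# Show 5 rows without truncating long cell values, then 2 rows in a
# vertical (one field per line) layout, which helps with wide DataFrames.
customer_df.show(5, truncate=False)
customer_df.show(2, vertical=True)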

customer_df.display()

#display(customer_df)

[Interactive table output rendered by display()]


2. columns & printSchema()


In Spark, columns and printSchema() are used to inspect the structure of a DataFrame, but they serve different
purposes:

columns: This attribute returns a list of the column names in the DataFrame.

printSchema(): This method prints the schema of the DataFrame, including column names and data types, in a
tree format.

customer_df.columns

['customer_id',
'name',
'city',
'state',
'country',
'registration_date',
'is_active']

customer_df.printSchema()

root
|-- customer_id: integer (nullable = true)
|-- name: string (nullable = true)
|-- city: string (nullable = true)
|-- state: string (nullable = true)
|-- country: string (nullable = true)
|-- registration_date: date (nullable = true)
|-- is_active: boolean (nullable = true)

3. Select specific columns

customer_df.select("name","city").show()


+-----------+---------+
|       name|     city|
+-----------+---------+
| Customer_0|     Pune|
| Customer_1|     Pune|
| Customer_2|    Delhi|
| Customer_3|   Mumbai|
| Customer_4|    Delhi|
| Customer_5|  Kolkata|
| Customer_6|  Kolkata|
| Customer_7|   Mumbai|
| Customer_8|     Pune|
| Customer_9|    Delhi|
|Customer_10|Hyderabad|
|Customer_11|    Delhi|
|Customer_12|    Delhi|
|Customer_13|     Pune|
|Customer_14|  Chennai|
|Customer_15|Hyderabad|
|Customer_16|  Chennai|
|Customer_17|     Pune|
|Customer_18|  Chennai|
|Customer_19|  Chennai|
+-----------+---------+
only showing top 20 rows

4. Filter rows

customer_df.filter(customer_df.city=="Hyderabad").show()

+-----------+------------+---------+-----------+-------+-----------------+---------+
|customer_id|        name|     city|      state|country|registration_date|is_active|
+-----------+------------+---------+-----------+-------+-----------------+---------+
|         10| Customer_10|Hyderabad|      Delhi|  India|       2023-02-23|     true|
|         15| Customer_15|Hyderabad|West Bengal|  India|       2023-03-31|     true|
|         21| Customer_21|Hyderabad| Tamil Nadu|  India|       2023-09-16|     true|
|         25| Customer_25|Hyderabad|West Bengal|  India|       2023-08-22|     true|
|         34| Customer_34|Hyderabad|  Telangana|  India|       2023-10-20|     true|
|         37| Customer_37|Hyderabad|    Gujarat|  India|       2023-03-13|    false|
|         38| Customer_38|Hyderabad|  Karnataka|  India|       2023-06-19|    false|
|         40| Customer_40|Hyderabad|Maharashtra|  India|       2023-07-29|    false|
|         44| Customer_44|Hyderabad|  Telangana|  India|       2023-08-18|    false|
|         84| Customer_84|Hyderabad|Maharashtra|  India|       2023-04-08|    false|
|        100|Customer_100|Hyderabad|Maharashtra|  India|       2023-12-30|    false|
|        110|Customer_110|Hyderabad|Maharashtra|  India|       2023-03-14|    false|
|        118|Customer_118|Hyderabad|    Gujarat|  India|       2023-01-27|    false|
|        134|Customer_134|Hyderabad|West Bengal|  India|       2023-06-25|     true|
|        137|Customer_137|Hyderabad| Tamil Nadu|  India|       2023-03-11|     true|
|        138|Customer_138|Hyderabad|      Delhi|  India|       2023-12-26|     true|
|        149|Customer_149|Hyderabad|  Karnataka|  India|       2023-09-21|    false|
|        150|Customer_150|Hyderabad|Maharashtra|  India|       2023-11-10|    false|
|        171|Customer_171|Hyderabad|West Bengal|  India|       2023-12-24|     true|
|        173|Customer_173|Hyderabad|    Gujarat|  India|       2023-05-30|    false|
+-----------+------------+---------+-----------+-------+-----------------+---------+
only showing top 20 rows

customer_df.where(customer_df.city=="Hyderabad").show()

+-----------+------------+---------+-----------+-------+-----------------+---------+
|customer_id|        name|     city|      state|country|registration_date|is_active|
+-----------+------------+---------+-----------+-------+-----------------+---------+
|         10| Customer_10|Hyderabad|      Delhi|  India|       2023-02-23|     true|
|         15| Customer_15|Hyderabad|West Bengal|  India|       2023-03-31|     true|
|         21| Customer_21|Hyderabad| Tamil Nadu|  India|       2023-09-16|     true|
|         25| Customer_25|Hyderabad|West Bengal|  India|       2023-08-22|     true|
|         34| Customer_34|Hyderabad|  Telangana|  India|       2023-10-20|     true|
|         37| Customer_37|Hyderabad|    Gujarat|  India|       2023-03-13|    false|
|         38| Customer_38|Hyderabad|  Karnataka|  India|       2023-06-19|    false|
|         40| Customer_40|Hyderabad|Maharashtra|  India|       2023-07-29|    false|
|         44| Customer_44|Hyderabad|  Telangana|  India|       2023-08-18|    false|
|         84| Customer_84|Hyderabad|Maharashtra|  India|       2023-04-08|    false|
|        100|Customer_100|Hyderabad|Maharashtra|  India|       2023-12-30|    false|
|        110|Customer_110|Hyderabad|Maharashtra|  India|       2023-03-14|    false|
|        118|Customer_118|Hyderabad|    Gujarat|  India|       2023-01-27|    false|
|        134|Customer_134|Hyderabad|West Bengal|  India|       2023-06-25|     true|
|        137|Customer_137|Hyderabad| Tamil Nadu|  India|       2023-03-11|     true|
|        138|Customer_138|Hyderabad|      Delhi|  India|       2023-12-26|     true|
|        149|Customer_149|Hyderabad|  Karnataka|  India|       2023-09-21|    false|
|        150|Customer_150|Hyderabad|Maharashtra|  India|       2023-11-10|    false|
|        171|Customer_171|Hyderabad|West Bengal|  India|       2023-12-24|     true|
|        173|Customer_173|Hyderabad|    Gujarat|  India|       2023-05-30|    false|
+-----------+------------+---------+-----------+-------+-----------------+---------+
only showing top 20 rows
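filter() and where() are aliases; both also accept Column expressions, so conditions can be combined. A small sketch (the combined condition below is an illustration, not part of the original notebook):

from pyspark.sql.functions import col

# Conditions built with col() are combined with & (AND) or | (OR);
# each condition needs its own parentheses because & binds tightly.
customer_df.filter((col("city") == "Hyderabad") & (col("is_active") == True)).show(5)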

5. Create or replace a column

The withColumn method is used to create a new column or replace an existing column in a DataFrame.

df.withColumn("column_name", <column expression>)

%python
from pyspark.sql.functions import col, concat, lit

# col: A function to reference a column in a DataFrame.


# concat: A function to concatenate multiple columns or strings.
# lit: A function to create a column with a literal value.

# Example: Adding a new column


df_with_new_column = customer_df.withColumn("full name", concat(col("name"), lit(" Singh")))

# Display the DataFrame


df_with_new_column.show()

  df_with_new_column: pyspark.sql.dataframe.DataFrame = [customer_id: integer, name: string ... 6 more fields]


| 2| Customer_2| Delhi|Maharashtra| India| 2023-08-05| true| Customer_2 Singh|

| 3| Customer_3| Mumbai| Telangana| India| 2023-06-04| true| Customer_3 Singh|
| 4| Customer_4| Delhi| Karnataka| India| 2023-03-15| false| Customer_4 Singh|
| 5| Customer_5| Kolkata|West Bengal| India| 2023-08-19| true| Customer_5 Singh|
| 6| Customer_6| Kolkata| Tamil Nadu| India| 2023-04-21| false| Customer_6 Singh|
| 7| Customer_7| Mumbai| Telangana| India| 2023-05-23| true| Customer_7 Singh|
| 8| Customer_8| Pune| Tamil Nadu| India| 2023-07-17| true| Customer_8 Singh|
| 9| Customer_9| Delhi| Karnataka| India| 2023-06-02| true| Customer_9 Singh|
| 10|Customer_10|Hyderabad| Delhi| India| 2023-02-23| true|Customer_10 Singh|
| 11|Customer_11| Delhi|West Bengal| India| 2023-11-08| true|Customer_11 Singh|
| 12|Customer_12| Delhi| Delhi| India| 2023-06-27| false|Customer_12 Singh|
| 13|Customer_13| Pune|Maharashtra| India| 2023-02-03| true|Customer_13 Singh|
| 14|Customer_14| Chennai| Karnataka| India| 2023-04-06| true|Customer_14 Singh|
| 15|Customer_15|Hyderabad|West Bengal| India| 2023-03-31| true|Customer_15 Singh|
| 16|Customer_16| Chennai|Maharashtra| India| 2023-04-26| true|Customer_16 Singh|
| 17|Customer_17| Pune| Delhi| India| 2023-04-14| false|Customer_17 Singh|
| 18|Customer_18| Chennai|Maharashtra| India| 2023-02-04| false|Customer_18 Singh|
| 19|Customer_19| Chennai| Karnataka| India| 2023-01-22| true|Customer_19 Singh|
+-----------+-----------+---------+-----------+-------+-----------------+---------+-----------------+
only showing top 20 rows 

withColumnRenamed
The withColumnRenamed method is used to rename a single column in a DataFrame.


%python
# Example: Renaming a column
df_renamed_column = df_with_new_column.withColumnRenamed("full name", "Full Name")

# Display the DataFrame


df_renamed_column.show()

  df_renamed_column: pyspark.sql.dataframe.DataFrame = [customer_id: integer, name: string ... 6 more fields]


| 2| Customer_2| Delhi|Maharashtra| India| 2023-08-05| true| Customer_2 Singh|

| 3| Customer_3| Mumbai| Telangana| India| 2023-06-04| true| Customer_3 Singh|
| 4| Customer_4| Delhi| Karnataka| India| 2023-03-15| false| Customer_4 Singh|
| 5| Customer_5| Kolkata|West Bengal| India| 2023-08-19| true| Customer_5 Singh|
| 6| Customer_6| Kolkata| Tamil Nadu| India| 2023-04-21| false| Customer_6 Singh|
| 7| Customer_7| Mumbai| Telangana| India| 2023-05-23| true| Customer_7 Singh|
| 8| Customer_8| Pune| Tamil Nadu| India| 2023-07-17| true| Customer_8 Singh|
| 9| Customer_9| Delhi| Karnataka| India| 2023-06-02| true| Customer_9 Singh|
| 10|Customer_10|Hyderabad| Delhi| India| 2023-02-23| true|Customer_10 Singh|
| 11|Customer_11| Delhi|West Bengal| India| 2023-11-08| true|Customer_11 Singh|
| 12|Customer_12| Delhi| Delhi| India| 2023-06-27| false|Customer_12 Singh|
| 13|Customer_13| Pune|Maharashtra| India| 2023-02-03| true|Customer_13 Singh|
| 14|Customer_14| Chennai| Karnataka| India| 2023-04-06| true|Customer_14 Singh|
| 15|Customer_15|Hyderabad|West Bengal| India| 2023-03-31| true|Customer_15 Singh|
| 16|Customer_16| Chennai|Maharashtra| India| 2023-04-26| true|Customer_16 Singh|
| 17|Customer_17| Pune| Delhi| India| 2023-04-14| false|Customer_17 Singh|
| 18|Customer_18| Chennai|Maharashtra| India| 2023-02-04| false|Customer_18 Singh|
| 19|Customer_19| Chennai| Karnataka| India| 2023-01-22| true|Customer_19 Singh|
+-----------+-----------+---------+-----------+-------+-----------------+---------+-----------------+
only showing top 20 rows 

6. Dropping a Column
The drop method is used to remove one or more columns from a DataFrame.

# Dropping a single column


df_dropped_column = df_renamed_column.drop("Full Name")

# Display the DataFrame


df_dropped_column.show()

  df_dropped_column: pyspark.sql.dataframe.DataFrame = [customer_id: integer, name: string ... 5 more fields]


| 2| Customer_2| Delhi|Maharashtra| India| 2023-08-05| true|
| 3| Customer_3| Mumbai| Telangana| India| 2023-06-04| true| 
| 4| Customer_4| Delhi| Karnataka| India| 2023-03-15| false|
| 5| Customer_5| Kolkata|West Bengal| India| 2023-08-19| true|
| 6| Customer_6| Kolkata| Tamil Nadu| India| 2023-04-21| false|
| 7| Customer_7| Mumbai| Telangana| India| 2023-05-23| true|
| 8| Customer_8| Pune| Tamil Nadu| India| 2023-07-17| true|
| 9| Customer_9| Delhi| Karnataka| India| 2023-06-02| true|
| 10|Customer_10|Hyderabad| Delhi| India| 2023-02-23| true|
| 11|Customer_11| Delhi|West Bengal| India| 2023-11-08| true|

| 17|Customer_17| Pune| Delhi| India| 2023-04-14| false|


| 18|Customer_18| Chennai|Maharashtra| India| 2023-02-04| false| 
| 19|Customer_19| Chennai| Karnataka| India| 2023-01-22| true|
+-----------+-----------+---------+-----------+-------+-----------------+---------+
only showing top 20 rows

Dropping Multiple Columns

%python
# Dropping multiple columns
df_dropped_columns = df_renamed_column.drop("name", "country")

# Display the DataFrame


df_dropped_columns.show()


  df_dropped_columns: pyspark.sql.dataframe.DataFrame = [customer_id: integer, city: string ... 4 more fields]
| 2| Delhi|Maharashtra| 2023-08-05| true| Customer_2 Singh|

| 3| Mumbai| Telangana| 2023-06-04| true| Customer_3 Singh|
| 4| Delhi| Karnataka| 2023-03-15| false| Customer_4 Singh|
| 5| Kolkata|West Bengal| 2023-08-19| true| Customer_5 Singh|
| 6| Kolkata| Tamil Nadu| 2023-04-21| false| Customer_6 Singh|
| 7| Mumbai| Telangana| 2023-05-23| true| Customer_7 Singh|
| 8| Pune| Tamil Nadu| 2023-07-17| true| Customer_8 Singh|
| 9| Delhi| Karnataka| 2023-06-02| true| Customer_9 Singh|
| 10|Hyderabad| Delhi| 2023-02-23| true|Customer_10 Singh|
| 11| Delhi|West Bengal| 2023-11-08| true|Customer_11 Singh|
| 12| Delhi| Delhi| 2023-06-27| false|Customer_12 Singh|
| 13| Pune|Maharashtra| 2023-02-03| true|Customer_13 Singh|
| 14| Chennai| Karnataka| 2023-04-06| true|Customer_14 Singh|
| 15|Hyderabad|West Bengal| 2023-03-31| true|Customer_15 Singh|
| 16| Chennai|Maharashtra| 2023-04-26| true|Customer_16 Singh|
| 17| Pune| Delhi| 2023-04-14| false|Customer_17 Singh|
| 18| Chennai|Maharashtra| 2023-02-04| false|Customer_18 Singh|
| 19| Chennai| Karnataka| 2023-01-22| true|Customer_19 Singh|
+-----------+---------+-----------+-----------------+---------+-----------------+
only showing top 20 rows 

7. Removing Duplicate Rows

%python
# Removing duplicate rows
df_distinct = df_renamed_column.distinct()

# Display the DataFrame


df_distinct.show()

  df_distinct: pyspark.sql.dataframe.DataFrame = [customer_id: integer, name: string ... 6 more fields]


| 5| Customer_5| Kolkata|West Bengal| India| 2023-08-19| true| Customer_5 Singh|

| 6| Customer_6| Kolkata| Tamil Nadu| India| 2023-04-21| false| Customer_6 Singh|


| 3| Customer_3| Mumbai| Telangana| India| 2023-06-04| true| Customer_3 Singh|


| 16|Customer_16| Chennai|Maharashtra| India| 2023-04-26| true|Customer_16 Singh| 
| 12|Customer_12| Delhi| Delhi| India| 2023-06-27| false|Customer_12 Singh|
| 20|Customer_20| Pune| Karnataka| India| 2023-02-19| false|Customer_20 Singh|
| 11|Customer_11| Delhi|West Bengal| India| 2023-11-08| true|Customer_11 Singh|
| 4| Customer_4| Delhi| Karnataka| India| 2023-03-15| false| Customer_4 Singh|
| 19|Customer_19| Chennai| Karnataka| India| 2023-01-22| true|Customer_19 Singh|
| 7| Customer_7| Mumbai| Telangana| India| 2023-05-23| true| Customer_7 Singh|
| 14|Customer_14| Chennai| Karnataka| India| 2023-04-06| true|Customer_14 Singh|
| 1| Customer_1| Pune|West Bengal| India| 2023-08-10| true| Customer_1 Singh|
| 13|Customer_13| Pune|Maharashtra| India| 2023-02-03| true|Customer_13 Singh|
+-----------+-----------+---------+-----------+-------+-----------------+---------+-----------------+
only showing top 20 rows
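distinct() compares entire rows; to remove duplicates based on selected columns only, dropDuplicates() accepts a list of column names. A small sketch (the chosen columns are just an example):

# Keep one row for each (city, state) combination; the remaining column
# values come from an arbitrary surviving row.
df_renamed_column.dropDuplicates(["city", "state"]).show(5)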

Aggregation
Will cover in detail tomorrow
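The city counts shown below were most likely produced with a groupBy aggregation; a minimal sketch of that query (assumed, since the original cell is not included in this export):

# Count customers per city; groupBy("city").count() adds a `count` column.
customer_df.groupBy("city").count().show()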

+---------+------+
| city| count|
+---------+------+
|Bangalore|661013|
| Chennai|660249|
| Mumbai|661241|
|Ahmedabad|660218|
| Kolkata|660174|
| Pune|660737|
| Delhi|661025|
|Hyderabad|662281|
+---------+------+
