Databricks Running Notes
{"mobile": "+1 234 567 8901", "home": "+1 234 567 8911"},
array: list
"phone_numbers": ["+1 234 567 8901", "+1 234 567 8911"],
---------------------------------------------------------------------------------
To build a derived column inside select, import concat, lit, and col:
from pyspark.sql.functions import concat, lit, col
users_df. \
    select(
        'id', 'first_name', 'last_name',
        concat(col('first_name'), lit(', '),
               col('last_name')).alias('full_name')
    ). \
    show()
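selectExpr, by contrast, takes SQL-style expression strings, so instead of
the alias function we use the AS keyword, the same as in SQL.
The same query through spark.sql (this needs users_df registered as a temp
view named users, e.g. with users_df.createOrReplaceTempView('users')):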
spark.sql("""
SELECT id, first_name, last_name,
concat(first_name, ', ', last_name) AS full_name
FROM users
"""). \
show()
---------------------------------------------------------------------------------
06 Referring Columns using Spark Data Frame Names
1)
users_df['id']
Bracket notation on the DataFrame, with the column name as a string.
This returns a Column object:
Out[7]: Column<'id'>
2)
We can also import col and use it to refer to a column:
from pyspark.sql.functions import col
col('id')
Out[10]: Column<'id'>
3)
You can check the type of the object with type():
type(users_df['id'])
pyspark.sql.column.Column
4)
from pyspark.sql.functions import col
users_df.select('id', col('first_name'), 'last_name').show()
Inside col, the column name goes in quotes; plain quoted names and
col(...) can be mixed in the same select.
5)
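You can also qualify a column with the DataFrame name in select:
users_df.select(users_df['id'], col('first_name'),
                'last_name').show()
Passing several names in brackets (users_df['id', 'email']) returns a
DataFrame, not a Column, so it cannot be used inside select. Reference
each column separately instead:
users_df.select(users_df['id'], users_df['email'], col('first_name'),
                'last_name').show()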