Azure Databricks
First, ensure that your Databricks environment has access to Azure Blob
Storage. You’ll need the following:
# Replace <container> with the name of your blob container.
dbutils.fs.mount(
    source="wasbs://<container>@samdatabricks.blob.core.windows.net",
    mount_point="/mnt/trans",
    extra_configs={"fs.azure.account.key.samdatabricks.blob.core.windows.net": "5kxx17N7TGNiXAkD7qDXQCGiEmumpM7yC+tS6/5Er7SK7Nio1fGHFUfP+goMQ4cy+dcnFlGu6cAN+AStqpNqKA=="})
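As a hedged sketch of the same mount, the helper below reads nothing from the notebook text itself: the account key is passed in (for example from a Databricks secret scope), and the mount is skipped when the mount point already exists. The helper names `build_mount_args` and `mount_if_absent`, and the scope and key names in the usage comments, are hypothetical; `dbutils.fs.mounts()`, `dbutils.fs.mount()` and `dbutils.secrets.get()` are the standard notebook utilities.

```python
def build_mount_args(container, account, account_key, mount_point):
    """Build the keyword arguments for dbutils.fs.mount (wasbs + account key)."""
    host = f"{account}.blob.core.windows.net"
    return {
        "source": f"wasbs://{container}@{host}",
        "mount_point": mount_point,
        "extra_configs": {f"fs.azure.account.key.{host}": account_key},
    }


def mount_if_absent(dbutils, container, account, account_key, mount_point):
    """Mount the container unless mount_point is already mounted.

    Returns True when a new mount was created, False when it already existed.
    """
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(**build_mount_args(container, account, account_key, mount_point))
    return True


# Notebook usage (scope and key names are hypothetical):
# key = dbutils.secrets.get(scope="storage-scope", key="samdatabricks-key")
# mount_if_absent(dbutils, "<container>", "samdatabricks", key, "/mnt/trans")
```

Keeping the key in a secret scope avoids storing it in the notebook, and the existence check lets the cell be re-run without a "directory already mounted" error.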
df = spark.read.format("csv") \
.option("header", "true") \
.load("dbfs:/mnt/trans/covid_vaccine_statewise.csv")
4. Display the Data
from pyspark.sql.functions import col, lit
from pyspark.sql.types import DateType, DoubleType, IntegerType, StringType
display(df)
5. Perform the Main Transformation Required
from pyspark.sql.functions import max
display(transformed_df)
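One possible shape for this step, as a minimal sketch only: it assumes the CSV has a `State` column and a cumulative `Total Doses Administered` column, and keeps the largest cumulative figure per state. Both column names are assumptions about the file, not taken from the text, and the function name `max_doses_per_state` is hypothetical.

```python
def max_doses_per_state(df, state_col="State", doses_col="Total Doses Administered"):
    """Return one row per state with the largest cumulative dose count.

    Column names are assumptions; adjust them to the actual CSV header.
    """
    # Imported here so the definition loads even where PySpark is not installed.
    from pyspark.sql import functions as F

    return (
        df.groupBy(state_col)
        # The CSV was read without a schema, so every column is a string:
        # cast before aggregating, otherwise max() compares text.
        .agg(F.max(F.col(doses_col).cast("double")).alias("total_doses_administered"))
        .orderBy(state_col)
    )


# Notebook usage:
# transformed_df = max_doses_per_state(df)
# display(transformed_df)
```

Importing the functions module as `F` also avoids shadowing Python's built-in `max`, which a bare `from pyspark.sql.functions import max` does.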
7. Load Data Back to Azure Blob Storage
# Replace <container> with the name of your output blob container.
dbutils.fs.mount(
    source="wasbs://<container>@samdatabricks.blob.core.windows.net",
    mount_point="/mnt/transaction2",
    extra_configs={"fs.azure.account.key.samdatabricks.blob.core.windows.net": "5kxx17N7TGNiXAkD7qDXQCGiEmumpM7yC+tS6/5Er7SK7Nio1fGHFUfP+goMQ4cy+dcnFlGu6cAN+AStqpNqKA=="})
output_path = "/mnt/transaction2/"
df2file(transformed_df, output_path, 'df_statewise_doses_administrated')
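The `df2file` helper is called here without its definition. As a hedged stand-in, the sketch below shows one common way such a helper could work: write the DataFrame as a single part file into a temporary folder, then move that file to a fixed name. The function name `write_single_csv`, the `_tmp_` folder naming and the explicit `dbutils` parameter are all assumptions, and the sketch does not handle a target file that already exists.

```python
def write_single_csv(df, output_dir, name, dbutils):
    """Write df as one CSV file at <output_dir>/<name>.csv and return that path."""
    base = output_dir.rstrip("/")
    tmp_dir = f"{base}/_tmp_{name}"
    target = f"{base}/{name}.csv"

    # Spark writes a directory of part files; coalesce(1) yields a single part.
    df.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp_dir)

    # Find the lone part file, move it to the final name, and drop the temp dir.
    part = next(f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-"))
    dbutils.fs.mv(part, target)
    dbutils.fs.rm(tmp_dir, True)  # True = delete recursively
    return target


# Notebook usage:
# write_single_csv(transformed_df, "/mnt/transaction2/", "df_statewise_doses_administrated", dbutils)
```

Collapsing to one partition is reasonable for a small per-state summary; for large outputs it funnels all data through a single task, so leaving the part files in place is usually the better choice.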
9. Storing the Data Directly to the Azure SQL Database Using the JDBC Connector
# Step 1: Define JDBC connection details
jdbcHostname = "sam-sql-server123.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "databricksdatabase"
jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};database={jdbcDatabase}"
connectionProperties = {
    "user": "samadhan",
    "password": "@Sam1Sam",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
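With the connection details defined, the write itself could look like the sketch below. It uses Spark's `DataFrameWriter.jdbc`; the helper names `build_jdbc_url` and `write_to_sql` and the table name in the usage comment are hypothetical, and in practice the password would more safely come from a secret scope than from the notebook.

```python
def build_jdbc_url(hostname, port, database):
    """Build a SQL Server JDBC URL in the same form as jdbcUrl above."""
    return f"jdbc:sqlserver://{hostname}:{port};database={database}"


def write_to_sql(df, jdbc_url, table, properties, mode="append"):
    """Write df to the given table through Spark's JDBC data source."""
    # DataFrameWriter.jdbc takes the URL, the target table, a save mode,
    # and a dict of connection properties (user, password, driver).
    df.write.jdbc(url=jdbc_url, table=table, mode=mode, properties=properties)


# Notebook usage (table name is hypothetical):
# write_to_sql(transformed_df, jdbcUrl, "dbo.statewise_doses",
#              connectionProperties, mode="overwrite")
```

The default `append` mode adds rows to an existing table, while `overwrite` replaces the table contents, so the mode should match how often the notebook is re-run.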
4. Add Copy Data Activity to Load Transformed Data Back to Blob Storage
5. Add Copy Data Activity to Load Transformed Data into Azure SQL Database
1. Add another Copy Data activity to load the data into the SQL database.
2. Configure the Source as Databricks DBFS or Blob Storage:
○ If Databricks, use the DBFS path for the transformed data.
○ If Blob Storage, use the output container path for transformed data.
3. Configure the Sink as Azure SQL Database:
○ Choose the Azure SQL Database linked service.
○ Specify the table name in SQL Database where the transformed data
should be stored.
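The steps above are configured in the Data Factory UI, which stores the result as pipeline JSON. As a rough sketch of what that definition could look like for the Blob Storage case, the snippet below builds the Copy activity as a Python dict and prints it as JSON. The activity and dataset names are hypothetical, and it assumes the source dataset is a delimited-text (CSV) dataset over the output container and the sink dataset points at the target SQL table.

```python
import json


def copy_activity(name, source_dataset, sink_dataset):
    """Return a Copy activity definition as a plain dict (pipeline JSON shape)."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Source: delimited text (CSV) files in Blob Storage.
            "source": {"type": "DelimitedTextSource"},
            # Sink: a table in Azure SQL Database, named on the sink dataset.
            "sink": {"type": "AzureSqlSink"},
        },
    }


# Activity and dataset names below are hypothetical.
activity = copy_activity("CopyTransformedToSql", "TransformedCsvDataset", "SqlTableDataset")
print(json.dumps(activity, indent=2))
```

The table name from step 3 lives on the sink dataset rather than on the activity, which is why the activity only references datasets by name.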