Azure Databricks Monitoring


Contents

Change Log
Azure Databricks Monitoring
    Monitoring of Spark Metrics with Azure Databricks Monitoring Library
        Prerequisites
        Configuration
        Logging custom metrics
    Monitoring of Linux virtual machines provisioned for Databricks clusters
        Configuration
    Monitoring of user activities in Databricks Workspace UI
        Prerequisites
        Configuration
        Diagnostic log schema
        Browsing diagnostic logs in Azure Monitor
        Limitations

Change Log
Change Type: Document creation
Author: Łukasz Olejniczak ([email protected])
Date: 28.05.2019

Azure Databricks Monitoring


Databricks monitoring can be broken down into three categories:

 Monitoring of Linux virtual machines provisioned for Databricks clusters


 Monitoring of Spark metrics with Azure Databricks Monitoring Library.
 Monitoring of user activities in Databricks Workspace UI

All three categories log into an Azure Log Analytics Workspace. The Azure Databricks Monitoring Library comes with an ARM template that creates the Log Analytics Workspace together with predefined queries which help to get insights from the raw logs.

Monitoring of Spark Metrics with Azure Databricks Monitoring Library


The mechanism recommended by Databricks is based on the Azure Databricks Monitoring Library. The library is not included in the Databricks Runtime by default; instead, it needs to be built from the sources available on GitHub (https://github.com/mspnp/spark-monitoring).

Prerequisites
The following components need to be installed in order to build Azure Databricks Monitoring Library
from sources:

 Java Development Kit (JDK) version 1.8


 Scala language SDK 2.11
 Apache Maven 3.5.4
 Azure CLI
 Python 3 + PIP
 GIT client

Configuration
1. Use a Git client to clone the Azure Databricks Monitoring Library sources to your local machine.
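
For example, assuming Git is installed locally:

git clone https://github.com/mspnp/spark-monitoring.git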

The GitHub repo for the Azure Databricks Monitoring Library has the following directory structure:

/perftools
/src
/spark-jobs
/spark-listeners-loganalytics
/spark-listeners
/pom.xml

The perftools directory includes templates to provision the Log Analytics Workspace.

The src/spark-jobs directory contains a sample Spark application demonstrating how to implement a Spark application metric counter.

The src/spark-listeners directory includes functionality that enables Azure Databricks to send Apache
Spark events at the service level to an Azure Log Analytics workspace. Azure Databricks is a service based
on Apache Spark, which includes a set of structured APIs for batch processing data using Datasets,
DataFrames, and SQL. With Apache Spark 2.0, support was added for Structured Streaming, a data
stream processing API built on Spark's batch processing APIs.

The src/spark-listeners-loganalytics directory includes a sink for the Spark listeners with a client for an Azure Log Analytics Workspace. This directory also includes a log4j Appender for your Apache Spark application logs.

The spark-listeners-loganalytics and spark-listeners directories contain the code for building the two JAR
files that are deployed to the Databricks cluster. The spark-listeners directory includes a scripts directory
that contains a cluster node initialization script to copy the JAR files from a staging directory in the Azure
Databricks file system to execution nodes.

The pom.xml file is the main Maven build file for the entire project.

2. Go to the src directory and execute the following commands to start the build process:
cd src
mvn clean install

The expected outcome is successful compilation of all projects, so that every project has a corresponding jar file in its target folder:

Project Jar
spark-jobs spark-jobs/target/spark-jobs-1.0-SNAPSHOT.jar
spark-listeners spark-listeners/target/spark-listeners-1.0-SNAPSHOT.jar
spark-listeners-loganalytics spark-listeners-loganalytics/target/spark-listeners-loganalytics-1.0-SNAPSHOT.jar

3. To provision the Log Analytics Workspace, navigate to the /perftools/deployment/loganalytics/ directory


4. Deploy the logAnalyticsDeploy.json Azure Resource Manager template:

az group deployment create --resource-group <resource-group-name> \
  --template-file logAnalyticsDeploy.json \
  --parameters location='East US' serviceTier='Standalone'

5. Log in to the Azure Portal and open the created Azure Log Analytics Workspace resource to get the corresponding WORKSPACE ID and PRIMARY KEY.

6. Open the /src/spark-listeners/scripts/listeners.sh script file and add your Log Analytics Workspace ID and Key to the lines below:
export LOG_ANALYTICS_WORKSPACE_ID=
export LOG_ANALYTICS_WORKSPACE_KEY=

7. Install the Databricks CLI to communicate with DBFS:

pip install --upgrade databricks-cli

8. Set up authentication details for Databricks (access token is required). Credentials will be stored
in ~/.databrickscfg.

databricks configure --token

9. Verify that you can browse DBFS, for example:
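
Listing the DBFS root is a quick check (legacy Databricks CLI syntax; adjust if your CLI version differs):

databricks fs ls dbfs:/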

10. Use the Azure Databricks CLI to create a directory named dbfs:/databricks/monitoring-staging
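
For example (legacy Databricks CLI syntax):

databricks fs mkdirs dbfs:/databricks/monitoring-staging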

11. Use the Azure Databricks CLI to copy /src/spark-listeners/scripts/listeners.sh (with the Azure Log Analytics Workspace credentials added in step 6) to dbfs:/databricks/monitoring-staging
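
For example, assuming the command is run from the repository root:

databricks fs cp src/spark-listeners/scripts/listeners.sh dbfs:/databricks/monitoring-staging/listeners.sh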

12. Use the Azure Databricks CLI to copy /src/spark-listeners/scripts/metrics.properties to dbfs:/databricks/monitoring-staging
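
For example, again from the repository root:

databricks fs cp src/spark-listeners/scripts/metrics.properties dbfs:/databricks/monitoring-staging/metrics.properties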

13. Use the Azure Databricks CLI to copy Azure Databricks Monitoring Library jars
spark-listeners-1.0-SNAPSHOT.jar and spark-listeners-loganalytics-1.0-SNAPSHOT.jar to
dbfs:/databricks/monitoring-staging
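
For example, from the src directory where the jars were built:

databricks fs cp spark-listeners/target/spark-listeners-1.0-SNAPSHOT.jar dbfs:/databricks/monitoring-staging/spark-listeners-1.0-SNAPSHOT.jar
databricks fs cp spark-listeners-loganalytics/target/spark-listeners-loganalytics-1.0-SNAPSHOT.jar dbfs:/databricks/monitoring-staging/spark-listeners-loganalytics-1.0-SNAPSHOT.jar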

14. Create a cluster from the Databricks Workspace UI. Under advanced options select "Init Scripts".
Under destination select DBFS and enter: dbfs:/databricks/monitoring-staging/listeners.sh

15. After you complete these steps, your Databricks cluster streams some metric data about the
cluster itself to Azure Monitor. This log data is available in your Azure Log Analytics workspace
under the "Active | Custom Logs | SparkMetric_CL" schema.

To get the list of available Spark metrics the following query can be used:
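
A sketch of such a query, assuming the name_s column from the SparkMetric_CL custom log schema written by the library (verify the exact column names in your workspace):

SparkMetric_CL
| distinct name_s
| sort by name_s asc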

Some of them are:

HiveExternalCatalog.parallelListingJobCount
HiveExternalCatalog.partitionsFetched
CodeGenerator.compilationTime
CodeGenerator.generatedClassSize
CodeGenerator.generatedMethodSize
CodeGenerator.sourceCodeSize
shuffleService.blockTransferRateBytes
shuffleService.openBlockRequestLatencyMillis
shuffleService.registerExecutorRequestLatencyMillis
shuffleService.registeredExecutorsSize
shuffleService.shuffle-server.usedDirectMemory
shuffleService.shuffle-server.usedHeapMemory
Databricks.directoryCommit.autoVacuumCount
Databricks.directoryCommit.deletedFilesFiltered
Databricks.directoryCommit.filterListingCount
Databricks.directoryCommit.jobCommitCompleted
Databricks.directoryCommit.markerReadErrors
Databricks.directoryCommit.markerRefreshCount

Databricks.directoryCommit.markerRefreshErrors
Databricks.directoryCommit.markersRead
Databricks.directoryCommit.repeatedListCount
Databricks.directoryCommit.uncommittedFilesFiltered
Databricks.directoryCommit.untrackedFilesFound
Databricks.directoryCommit.vacuumCount
Databricks.directoryCommit.vacuumErrors
Databricks.preemption.numChecks
Databricks.preemption.numPoolsAutoExpired
Databricks.preemption.numTasksPreempted
Databricks.preemption.poolStarvationMillis
Databricks.preemption.schedulerOverheadNanos
Databricks.preemption.taskTimeWastedMillis
HiveExternalCatalog.fileCacheHits
HiveExternalCatalog.filesDiscovered
HiveExternalCatalog.hiveClientCalls

The Azure Log Analytics Workspace deployed from the ARM template available in the Azure Databricks Monitoring Library sources includes a set of predefined queries:

Query Description
%cpu time per executor
% deserialize time per executor
% jvm time per executor
% serialize time per executor
Disk Bytes Spilled

Error traces
File system bytes read per executor
File system bytes write per executor
Job errors per job
Job latency per job
Job Throughput
Running Executors
Shuffle Bytes Read
Shuffle Bytes read per executor
Shuffle bytes read to disk per executor
Shuffle client direct memory
Shuffle client direct memory per executor
Shuffle disk bytes spilled to disk per executor
Shuffle heap memory per executor
Shuffle memory spilled per executor
Stage latency per stage
Stage throughput per stage
Streaming errors per stream
Streaming latency per stream
Streaming throughput inputrowssec
Streaming throughput processedrowssec
Sum Task Execution Per Host
Task Deserialization Time
Task errors per Stage
Task Executor Compute time
Task Input Bytes read
Task Latency per Stage
Task result serialization Time
Task Scheduler Delay Latency
Task Shuffle Bytes Read
Task Shuffle Bytes Written
Task Shuffle Read Time
Task Shuffle Write time
Task throughput
Tasks per executor
Tasks per stage

Logging custom metrics


The Azure Databricks Monitoring Library can be used to log custom metrics and events, e.g. exceptions captured when connecting to data sources.

In order to log custom metrics the following needs to be added to the application code:

1. Import org.apache.spark.metrics.UserMetricsSystems class

2. Register the custom metric (e.g. as a counter; you can also declare a gauge, histogram, meter, or timer). For more details on how to register and use the distinct types of custom metrics, see: https://github.com/groupon/spark-metrics/blob/master/src/main/scala/org/apache/spark/groupon/metrics/example/MetricsBenchmarkApp.scala
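
A minimal Scala sketch of steps 1 and 2, based on the sample in the library's documentation. The namespace and counter name are illustrative, and the library jars must already be attached to the cluster:

import org.apache.spark.metrics.UserMetricsSystems

// Illustrative names - pick your own namespace and metric name
val NAMESPACE = "samplejob"
val COUNTER_NAME = "rowcounter"

// Build a metric system and register a counter with it
val metricsSystem = UserMetricsSystems.getMetricSystem(NAMESPACE, builder => {
  builder.registerCounter(COUNTER_NAME)
})

// Increment the counter wherever the custom event occurs
metricsSystem.counter(COUNTER_NAME).inc(5)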

3. You can then browse the Azure Log Analytics Workspace used for Spark metrics to find the custom events, for example:
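
A sketch of a query that surfaces the illustrative counter above (column names follow the SparkMetric_CL custom log schema; verify them in your workspace):

SparkMetric_CL
| where name_s contains "rowcounter"
| sort by TimeGenerated desc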

Monitoring of Linux virtual machines provisioned for Databricks clusters
To monitor Databricks cluster VM instances, the recommended approach is to configure the Log Analytics Agent. The agent for Linux communicates outbound to the Azure Monitor service over TCP port 443.

Configuration:
1. Copy the following script to a new file on your local machine (replace the WORKSPACE_ID and WORKSPACE_PRIMARY_KEY placeholders with the values of the Azure Log Analytics Workspace created for Databricks monitoring). The script downloads the agent, validates its checksum, installs it and finally starts it.

sed -i "s/^exit 101$/exit 0/" /usr/sbin/policy-rc.d && \
wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh && \
sh onboard_agent.sh -w <WORKSPACE_ID> -s <WORKSPACE_PRIMARY_KEY> -d opinsights.azure.com && \
sudo /opt/microsoft/omsagent/bin/service_control restart <WORKSPACE_ID>

2. Save the file as logAnalyticsAgentDeployment.sh


3. Log in to the Azure Portal and open the Azure Log Analytics Workspace created for Databricks Monitoring. Select Advanced Settings in the sidebar.

4. Select the Data option and then select the Syslog tab. Type syslog in the search panel and specify which levels of information should be captured:

5. Click Save and go to the Linux Performance Counters tab:

Type * in the search panel to see the available metrics and select those that are needed:

Metric category Metric name

Logical Disk % Free Inodes
Logical Disk % Free Space
Logical Disk % Used Inodes
Logical Disk % Used Space
Logical Disk Disk Read Bytes/sec
Logical Disk Disk Reads/sec
Logical Disk Disk Transfers/sec
Logical Disk Disk Write Bytes/sec
Logical Disk Disk Writes/sec
Logical Disk Free Megabytes
Logical Disk Logical Disk Bytes/sec
Memory % Available Memory
Memory % Available Swap Space
Memory % Used Memory
Memory % Used Swap Space
Memory Available MBytes Memory
Memory Available MBytes Swap
Memory Page Reads/sec
Memory Page Writes/sec
Memory Pages/sec
Memory Used MBytes Swap Space
Memory Used Memory MBytes
Network Total Bytes Transmitted
Network Total Bytes Received
Network Total Bytes
Network Total Packets Transmitted
Network Total Packets Received
Network Total Rx Errors
Network Total Tx Errors
Network Total Collisions
Physical Disk Avg. Disk sec/Read
Physical Disk Avg. Disk sec/Transfer
Physical Disk Avg. Disk sec/Write
Physical Disk Physical Disk Bytes/sec
Process Pct Privileged Time
Process Pct User Time
Process Used Memory kBytes
Process Virtual Shared Memory
Processor % DPC Time
Processor % Idle Time
Processor % Interrupt Time
Processor % IO Wait Time

Processor % Nice Time
Processor % Privileged Time

It is possible to specify a sampling interval for each category:

6. Save changes.

7. Use the Databricks CLI to copy logAnalyticsAgentDeployment.sh to dbfs:/databricks/init/

dbfs:/databricks/init/ is a predefined path that all clusters check at startup for initialization scripts, which means every cluster will execute this script when it starts.
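
For example (legacy Databricks CLI syntax, assuming the script is in the current directory):

databricks fs cp logAnalyticsAgentDeployment.sh dbfs:/databricks/init/logAnalyticsAgentDeployment.sh
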
8. Create a new Databricks cluster. Log in to the Azure portal and open the Azure Log Analytics Workspace for Databricks Monitoring. Open Advanced Settings in the sidebar. Select the Connected Sources tab and then the Linux Servers tab. Information about successfully connected VMs is displayed:

9. Open Azure Monitor, select the Log Search option and select the Log Analytics Workspace used for Databricks Monitoring.
10. Query the Perf object, for example:
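
A sketch that aggregates one of the counters selected above (Perf is the standard Log Analytics table for agent performance counters):

Perf
| where ObjectName == "Memory" and CounterName == "% Used Memory"
| summarize avg(CounterValue) by Computer, bin(TimeGenerated, 5m)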

Monitoring of user activities in Databricks Workspace UI
Databricks provides diagnostic logs of activities performed by Azure Databricks users.

Prerequisites
Diagnostic logs require Azure Databricks Premium Plan.

Configuration
The following steps are necessary to enable diagnostic logs delivery:

1. Log in to the Azure Portal as an Owner or Contributor for the Azure Databricks workspace
2. List your Azure Databricks Service resources
3. Click the Azure Databricks Service resource for which you want to enable diagnostic log delivery
4. In the Monitoring section of the sidebar, click Diagnostic settings

5. In Diagnostic settings click Add diagnostic setting

6. Provide the following configuration:
 Select where diagnostic logs should be delivered. There are three options available:
Archive to a storage account
Stream to an event hub
Send to Log Analytics
It is possible to select all three options.
 Choose which components should be monitored. The following components are
available:
Dbfs
Clusters
Accounts
Jobs
Notebook
SSH
Workspace
Secrets
sqlPermissions
It is possible to select all components

7. Click Save
8. Once logging is enabled for your account, Azure Databricks automatically starts sending diagnostic logs to your delivery location on a periodic basis. Logs are available within 24 to 72 hours of activation. On any given day, Azure Databricks delivers at least 99% of diagnostic logs within the first 24 hours, and the remaining 1% in no more than 72 hours.

Diagnostic log schema


Every record in the diagnostic log contains the following fields:

Field Description
operationVersion The schema version of the diagnostic log format
time UTC timestamp of the action
properties.sourceIPAddress The IP address from which the request was sent
properties.userAgent The browser or API client used to make the request
properties.sessionId Session ID of the action
identities Information about the user that makes the requests
category The service that logged the request
operationName The action, such as login, logout, read, write, etc.
properties.requestId Unique request ID. If an action takes a long time, the request and response are logged separately, but the request and response pair share the same properties.requestId
properties.requestParams Parameter key-value pairs used in the event
properties.response Response to the request
properties.response.errorMessage The error message if there was an error
properties.response.result The result of the request
properties.response.statusCode HTTP status code that indicates whether the request succeeded or not
properties.logId The unique identifier for the log messages

Browsing diagnostic logs in Azure Monitor


When delivery to Azure Log Analytics is configured, users can browse diagnostic logs in Azure Monitor.

1. Open Azure Monitor


2. Click Search Logs

3. Expand the LogManagement group in the sidebar. You should see the following groups:
DatabricksAccounts
DatabricksClusters
DatabricksDBFS
DatabricksJobs
DatabricksNotebook
DatabricksSQLPermissions
DatabricksSSH
DatabricksSecrets

DatabricksTables
DatabricksWorkspace

For example, the following query lists all events related to the Clusters component that were triggered within a defined period of time:
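
A sketch of such a query (column names may differ slightly between workspaces; check the table schema shown in the sidebar):

DatabricksClusters
| where TimeGenerated > ago(24h)
| project TimeGenerated, Identity, OperationName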

Limitations
Because diagnostic logs are delivered not immediately when event is triggered but on periodic basis so
that they are available within 24 to 72 hours, they should not be used for alerting. Instead they are a
great source of information for reporting.
