Training jobs & Monitoring
This section walks step by step through generic instructions for launching and monitoring training jobs.
Launching a training job
First, set up your training job and verify that your data are correctly set up, using the DD platform's custom Jupyter tooling.
To launch a training job, use the ‘Run training’ button:
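For readers who want to script this step, the sketch below builds the kind of HTTP request a training launch could correspond to. It assumes the Deep Learning server exposes a DeepDetect-style REST API (an asynchronous `POST /train` with a JSON body). The base URL, service name, data path and iteration count are hypothetical placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumption: a DeepDetect-style REST API reachable at this hypothetical address.
BASE_URL = "http://localhost:8080"


def build_train_request(service, data_paths, iterations, base_url=BASE_URL):
    """Build (but do not send) an asynchronous POST /train request."""
    body = {
        "service": service,
        "async": True,  # return immediately; training then runs in the background
        "parameters": {
            # Illustrative solver setting; real jobs need model-specific parameters.
            "mllib": {"solver": {"iterations": iterations}},
        },
        "data": list(data_paths),
    }
    return urllib.request.Request(
        url=f"{base_url}/train",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_train_request("my_service", ["/data/train"], iterations=10000)
print(req.get_method(), req.full_url)  # prints: POST http://localhost:8080/train
# Actually sending it would be: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect before any job is started.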

Training job setup
Every training job appears in the UI as a badge on the ‘Training’ page:

For some training jobs, an internal setup step, such as pre-processing the full dataset, can take a few minutes. In that case, the metrics may not appear immediately, and the badge may look like this for a moment:
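A script that polls a job should tolerate this setup delay rather than treat missing metrics as an error. The sketch below is a minimal polling loop under stated assumptions: `fetch_status` is a hypothetical, caller-supplied function returning a dict shaped like a DeepDetect-style `GET /train` response (`head.status`, `body.measure`), and the interval and timeout values are arbitrary.

```python
import time


def wait_for_metrics(fetch_status, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll a training job until it reports metrics or stops running.

    fetch_status: hypothetical callable returning a dict such as
    {"head": {"status": "running"}, "body": {"measure": {...}}}.
    During the internal setup phase, "measure" may be missing or empty.
    """
    waited = 0.0
    while True:
        resp = fetch_status()
        status = resp.get("head", {}).get("status")
        measure = resp.get("body", {}).get("measure")
        # Stop as soon as metrics exist, or the job is no longer running.
        if measure or status != "running":
            return status, measure
        if waited >= timeout:
            raise TimeoutError("no metrics yet; the job may still be setting up")
        sleep(poll_interval)
        waited += poll_interval


# Usage with fake responses: two setup-phase replies, then metrics appear.
fake = iter([
    {"head": {"status": "running"}, "body": {}},
    {"head": {"status": "running"}, "body": {}},
    {"head": {"status": "running"}, "body": {"measure": {"iteration": 10}}},
])
print(wait_for_metrics(lambda: next(fake), sleep=lambda s: None))
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP client and easy to test without a live server.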

Training job monitoring
Training jobs can last from minutes to several days. For this reason, the DD platform provides several tools for monitoring running jobs:
- The ‘Training’ section of the UI reports on all currently running training jobs:

- The ‘Monitor’ button on the training badge displays metrics and details about the run:

- The DD platform's custom Jupyter notebook displays the progress, status and remaining time of the training job:

- The DD platform's custom Jupyter notebook allows fine-grained monitoring of the job calls via the ‘Logs’ tab:

- For even finer-grained information, the ‘Widget logs’ tab can be refreshed when needed. It captures all calls between Jupyter and the Deep Learning server:
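How the notebook computes its remaining-time figure is not specified here. As a rough illustration only, the sketch below uses a simple linear extrapolation, assuming a roughly constant time per iteration; the function names and the progress format are hypothetical.

```python
def estimate_remaining_seconds(iteration, total_iterations, elapsed_seconds):
    """Linearly extrapolate remaining training time from progress so far.

    Returns None before the first iteration completes (no rate is known yet)
    and 0.0 once all iterations are done.
    """
    if iteration <= 0:
        return None
    if iteration >= total_iterations:
        return 0.0
    # elapsed / iteration = seconds per iteration, times iterations left.
    return elapsed_seconds * (total_iterations - iteration) / iteration


def format_progress(iteration, total_iterations, elapsed_seconds):
    """Render a one-line progress summary similar to a monitoring widget."""
    remaining = estimate_remaining_seconds(iteration, total_iterations, elapsed_seconds)
    pct = 100.0 * iteration / total_iterations
    if remaining is None:
        return f"{pct:.1f}% done, remaining time unknown"
    hours, rest = divmod(int(remaining), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{pct:.1f}% done, ~{hours}h{minutes:02d}m{seconds:02d}s remaining"


# 2,500 of 10,000 iterations after 15 minutes of training.
print(format_progress(2500, 10000, 900.0))  # prints: 25.0% done, ~0h45m00s remaining
```

A linear estimate can be misleading early in a run, or when iteration speed changes (for example after the setup phase or during periodic evaluation).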

Stopping a training job
To stop a training job, use the ‘Delete Service’ button:
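Outside the UI, a scripted equivalent might delete the service through the server's REST API. The sketch below assumes a DeepDetect-style `DELETE /services/<name>` endpoint; whether the ‘Delete Service’ button issues exactly this call is an assumption, and the base URL and service name are hypothetical. As above, the request is only built, not sent.

```python
import urllib.parse
import urllib.request

# Assumption: a DeepDetect-style REST API reachable at this hypothetical address.
BASE_URL = "http://localhost:8080"


def build_delete_service_request(service, base_url=BASE_URL):
    """Build (but do not send) a DELETE /services/<name> request."""
    # Percent-encode the name so it is safe to place in a URL path segment.
    name = urllib.parse.quote(service, safe="")
    return urllib.request.Request(url=f"{base_url}/services/{name}", method="DELETE")


req = build_delete_service_request("my_service")
print(req.get_method(), req.full_url)  # prints: DELETE http://localhost:8080/services/my_service
# Actually sending it would be: urllib.request.urlopen(req)
```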
