Lecture 9
Managing AWS IoT Devices at Scale
AWS IoT Scenario
For an ice cream company, one of the biggest challenges in the production
process is maintaining the cold chain. Over the past few months, the business
has experienced 20 percent spoilage while the ice cream is moving through the
manufacturing plant.
The ice cream company conducted a proof of concept (POC) to see whether AWS IoT
services can help with its cold chain problems. The POC was a great success,
and the company now wants to implement an AWS IoT solution in its
manufacturing plant and across its fleet of 30 delivery trucks. Part of the
solution will be to upgrade the entire dairy processing plant with
internet-capable devices.
Scenario Requirements
• How to onboard numerous devices at (nearly) the same time, with the minimum number of steps
• How to search devices by function, manufacturer, installation date, and technical specifications
• How to monitor the activity, performance, and history of each device
• How to perform bulk actions on devices, such as activating, deactivating, and updating
Recommended Solution
AWS IoT Device Management
The ice cream business has been great, and the company is expanding at a rapid
rate. They will be enlarging their current plant, and building a new one in a
neighboring town. Their truck fleet will nearly triple.
You can define the level of granularity at which to monitor devices and, if needed, focus your
monitoring on specific devices.
Organize
The dairy processing plant and truck fleet will have hundreds of devices,
including sensors, actuators, microcontrollers, and others. They should be
organized so that they are easy to locate, access, and monitor.
For example, the dairy's engineers might want to search for all the sensors
(temperature, volume, and flow rate) in the milk receiving and raw milk
storage area to check status.
In another use case, they might want to perform updates on all temperature
sensors, regardless of their location in the plant. There are several ways to
accomplish this, which will be covered later in this course.
Update
Finally, you can organize and schedule device updates utilizing AWS IoT Jobs, a
feature of AWS IoT Device Management. Because of bandwidth and downtime,
you might not be able to update all of your devices at the same time. With AWS
IoT Jobs, you ensure that these updates roll out reliably and in a controlled
manner. This kind of automation is essential in large-scale systems.
Fleet Creation
Provisioning Devices in AWS IoT
[Figure: AWS IoT provisioning options: just-in-time provisioning (JITP), just-in-time
registration (JITR), provisioning by trusted user, and special-purpose credentials embedded in
devices]
AWS IoT Provisioning Options
[Figure: provisioning in advance versus provisioning on demand; template sections labeled
Parameters and Resource]
Single thing provisioning
To provision a thing, use the register-thing CLI command. The register-thing CLI command takes
the following arguments:
--template-body
The provisioning template
--parameters
A list of name-value pairs for the parameters used in the provisioning template, in JSON
format (for example, {"ThingName" : "MyProvisionedThing", "CSR" : "csr-text"})
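As a rough illustration of how the two arguments fit together, the following Python sketch builds a small provisioning template and a matching parameter map, then assembles the CLI call as an argument list without running it. The template contents and the helper names (missing_parameters, register_thing_command) are assumptions made for this example; a real template would usually also define a policy.

```python
import json
import shlex

# Minimal provisioning template (assumed for illustration): the Parameters
# section declares the inputs, and the Resources section describes what
# should be created from them.
template = {
    "Parameters": {
        "ThingName": {"Type": "String"},
        "CSR": {"Type": "String"},
    },
    "Resources": {
        "thing": {
            "Type": "AWS::IoT::Thing",
            "Properties": {"ThingName": {"Ref": "ThingName"}},
        },
        "certificate": {
            "Type": "AWS::IoT::Certificate",
            "Properties": {
                "CertificateSigningRequest": {"Ref": "CSR"},
                "Status": "ACTIVE",
            },
        },
    },
}

# Name-value pairs that fill in the template parameters.
parameters = {"ThingName": "MyProvisionedThing", "CSR": "csr-text"}


def missing_parameters(template, parameters):
    """Return template parameters that have neither a value nor a default."""
    return sorted(
        name
        for name, spec in template["Parameters"].items()
        if name not in parameters and "Default" not in spec
    )


def register_thing_command(template, parameters):
    """Build the CLI invocation as an argument list (nothing is executed)."""
    return [
        "aws", "iot", "register-thing",
        "--template-body", json.dumps(template),
        "--parameters", json.dumps(parameters),
    ]


command = register_thing_command(template, parameters)
print(shlex.join(command))  # shell-quoted command line for review
```

Checking for missing parameters before calling the CLI is a design choice of this sketch, not something the command requires.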
Bulk registration
You can use the start-thing-registration-task command to register things in
bulk. This command takes a provisioning template, an S3 bucket name, a key
name, and a role ARN that permits access to the file in the S3 bucket. The file in
the S3 bucket contains the values used to replace the parameters in the
template. The file must be a newline-delimited JSON file. Each line contains all
the parameter values for registering a single device. The following is an
example:
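A minimal sketch of how such a parameter file could be produced in Python is shown here. The device names, serial numbers, and CSR strings are placeholders invented for this example, and the parameter names must match whatever your own template declares.

```python
import json

# Hypothetical devices; real values would come from your own device records.
devices = [
    {"ThingName": "tank-sensor-001", "SerialNumber": "SN-001", "CSR": "csr-text-1"},
    {"ThingName": "tank-sensor-002", "SerialNumber": "SN-002", "CSR": "csr-text-2"},
]


def to_ndjson(records):
    """Serialize one JSON object per line (newline-delimited JSON)."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


def parse_ndjson(text):
    """Read the file back: every non-empty line is one device's parameters."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


body = to_ndjson(devices)
print(body, end="")
```

The resulting text would be uploaded to the S3 bucket and key that are passed to start-thing-registration-task.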
JITP and JITR Similarities
• They both use unique X.509 certificates that are loaded on the devices, usually
by the manufacturer.
• They both require that the customer Certificate Authority (CA) that signed the
device certificate is registered with AWS IoT Core.
• When they connect for the first time, they both present their client certificate to
AWS IoT Core.
One of the biggest reasons a company might choose JITR over JITP is flexibility
and the need to interact with external systems during the provisioning process.
Just-in-time provisioning (JITP)
JITP is a good choice when deploying numerous devices that are similar in their functions and
attributes.
JITP can be used when the customer trusts the entity who will be loading the client certificate
onto the devices.
When using JITP, auto registration must be turned on.
JITP uses a provisioning template that contains the parameters that will be used to activate the
certificate and create the IoT resources.
It creates an IoT policy and an IoT thing and registers the thing in the registry. After these
steps are complete, the device can connect to AWS IoT Core.
Just-in-Time Registration (JITR)
JITR is a good choice when deploying devices that require more configuration options.
This added flexibility comes with a higher financial cost.
JITR does not use a provisioning template.
Instead of a provisioning template, the process takes the following listed steps
Devices also generate a lot of information about their own activities. This
information goes into logs, which are essential for troubleshooting,
maintenance, and device lifecycle management.
AWS IoT log activation levels
• ERROR
Any error that causes an operation to fail. These logs only include ERROR
information.
• WARN
Anything that can potentially cause inconsistencies in the system but might not
cause the operation to fail. These logs include ERROR and WARN information.
• INFO
High-level information about the flow of things. These logs include INFO, ERROR,
and WARN information.
• DEBUG
Information that might be helpful when debugging a problem. These logs include
DEBUG, INFO, ERROR, and WARN information.
• DISABLED
All logging is disabled.
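The cumulative nature of these levels can be sketched in a few lines of Python. This is only a model of the rule described above, with function names invented for this example; it is not AWS code.

```python
# Levels ordered from least to most verbose, following the list above.
LEVELS = ["ERROR", "WARN", "INFO", "DEBUG"]


def included_levels(configured):
    """Return the set of entry levels that are written at a configured level."""
    if configured == "DISABLED":
        return set()
    if configured not in LEVELS:
        raise ValueError(f"unknown log level: {configured}")
    # A level includes itself and every less verbose level before it.
    return set(LEVELS[: LEVELS.index(configured) + 1])


def is_logged(entry_level, configured):
    """True if an entry of entry_level appears when logging at configured."""
    return entry_level in included_levels(configured)


print(sorted(included_levels("INFO")))
print(is_logged("DEBUG", "WARN"))
```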
View AWS IoT logs in CloudWatch
AWS IoT pushes its logging information to CloudWatch for monitoring and
analysis.
Before turning on logging, you must first create an IAM role and a policy that
gives AWS permission to monitor AWS IoT activity.
AWS IoT log entries
Each component of AWS IoT generates its own log entries:
• Message broker log entries
• Device Shadow log entries
• Rules engine log entries
• Job log entries
• Device provisioning log entries
• Dynamic thing group log entries
• Fleet indexing log entries
• Common CloudWatch Logs attributes
Each of those component logs has several specific event types. For example, the Message broker
logs the following event types:
• Connect log entry
• Disconnect log entry
• GetRetainedMessage log entry
• ListRetainedMessage log entry
• Publish-In log entry
• Publish-Out log entry
• Subscribe log entry
Monitoring AWS IoT
It is important to collect monitoring data from all parts of your AWS solution to make it easier to
debug a multipoint failure, if one occurs.
There are four main stages to building a monitoring scheme:
1. Determine what, why, and when you will monitor.
2. Activate logging and view the logs in CloudWatch.
3. Configure CloudWatch alarms to alert you when the value of the metric exceeds a given
threshold over a number of time periods.
4. Establish baseline performance conditions and save logs and metrics as historical data.
Back to the scenario
In our ice cream manufacturing plant scenario, let's pretend they start by
monitoring their raw milk storage tanks.
Their first step would be to determine which tanks those are and when they
want to monitor the temperature throughout the day.
Second, they would need to activate the CloudWatch logging for each of those
devices.
Third, they would need to configure their CloudWatch alarms to send alerts
when the temperature is outside their desired parameters.
And, finally, they would need to establish baseline cooling temperatures of the
storage tanks and save logs and metrics to be used as historical data.
1. Determine what, why, and when you will monitor
• What are your monitoring goals?
• Which resources will you monitor?
• How often will you monitor these resources?
• Which monitoring tools will you use?
• Who will perform the monitoring tasks?
• Who should be notified when something goes wrong?
3. Configure CloudWatch to collect alarms and metrics from AWS IoT
CloudWatch collects and processes raw data from AWS IoT into readable, near real-time
metrics. Metrics data is stored for two weeks to give you a historical view. By default, AWS IoT
metric data is sent automatically to CloudWatch in 1-minute intervals.
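The idea of an alarm that fires only when a metric exceeds a threshold over a number of time periods can be sketched as follows. This is a simplified model written for this lecture, not the actual CloudWatch evaluation logic (which also has an insufficient-data state, among other details), and the temperature readings are made up.

```python
def alarm_states(datapoints, threshold, evaluation_periods):
    """Return 'ALARM' or 'OK' per datapoint.

    The state is 'ALARM' only when the latest `evaluation_periods`
    datapoints are all above `threshold`.
    """
    states = []
    for i in range(len(datapoints)):
        window = datapoints[max(0, i - evaluation_periods + 1): i + 1]
        breaching = (
            len(window) == evaluation_periods
            and all(value > threshold for value in window)
        )
        states.append("ALARM" if breaching else "OK")
    return states


# Hypothetical raw milk tank temperatures in degrees Celsius, one per minute.
temps = [3.8, 4.1, 4.6, 4.9, 5.2, 3.9]
print(alarm_states(temps, threshold=4.5, evaluation_periods=3))
```

Requiring several consecutive breaching periods avoids alerting on a single noisy reading, at the cost of a slower reaction.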
4. Establish baseline performance for comparison using metrics
Logs are useful for troubleshooting, and alarms help give you visibility across
your monitored fleet.
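One simple way to use saved metrics as a baseline, sketched under the assumption that "normal" can be summarized by a mean and a standard deviation, is to flag readings that fall far from the historical average. The history values and the three-sigma cutoff are illustrative choices, not recommendations from AWS.

```python
from statistics import mean, stdev


def baseline(history):
    """Summarize historical readings as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, history, k=3.0):
    """True if value lies more than k standard deviations from the baseline."""
    mu, sigma = baseline(history)
    return abs(value - mu) > k * sigma


# Hypothetical storage tank temperatures (degrees Celsius) from past days.
history = [4.0, 4.1, 3.9, 4.0, 4.2, 3.8]
print(is_anomalous(4.3, history))
print(is_anomalous(5.0, history))
```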
Job document
◦ A description of the remote operations to be performed by the devices.
◦ The job document describes the commands that need to run and the location of resources and
arguments to be passed to the commands.
◦ Job documents are UTF-8 encoded JSON documents and contain information that your devices need to
perform a job. A job document can contain one or more URLs where the device can download an
update or other data. The job document can be stored in an S3 bucket or be included inline with the
command that creates the job.
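Because the job document is free-form JSON, its fields are defined by whoever writes the device software. The sketch below shows one possible shape; every field name, the version string, and the URL are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical job document: the keys are an assumption of this example,
# because the device code decides how to interpret the document.
job_document = {
    "operation": "install-firmware",
    "version": "1.2.0",
    "files": [
        {"name": "firmware.bin", "url": "https://example.com/firmware.bin"}
    ],
    "args": {"rebootAfterInstall": True},
}

# Job documents are UTF-8 encoded JSON.
encoded = json.dumps(job_document).encode("utf-8")


def download_urls(document_bytes):
    """Device-side helper: list the URLs the document asks the device to fetch."""
    document = json.loads(document_bytes.decode("utf-8"))
    return [entry["url"] for entry in document.get("files", [])]


print(download_urls(encoded))
```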
Jobs key concepts
Target
◦ A list of the devices that should perform the operations. The targets can be things, thing
groups, or both. The AWS IoT Jobs service sends a message to each target to inform it that a job is available.
Deployment
◦ After you create a job by providing the job document and specifying your list of targets, the job document is
then deployed to the remote target devices for which you want to perform the update. For snapshot jobs, the
job will complete after deploying to the target devices. For continuous jobs, a job is deployed to a group of
devices as they are added to the groups.
Job execution
◦ A job execution is an instance of a job on a target device. The target starts an execution of a job by
downloading the job document. It then performs the operations specified in the document and reports its
progress to AWS IoT. An execution number is a unique identifier of a job execution on a specific target. The
AWS IoT Jobs service provides APIs to track the progress of a job execution on a target and the progress of a
job across all targets.
Job type concepts
Snapshot job
◦ A job that is complete when all things specified as targets have reported the job as completed. By
default, a job is sent to all targets that you specify when you create the job. After those targets
complete the job (or report that they're unable to do so), the job is complete.
Continuous job
◦ A continuous job is sent to all targets that you specify when you create the job. It is also sent to any new
devices (things) that are added to the target group. For example, a continuous job can be used to
onboard or upgrade devices as they're added to a group. You can make a job continuous by setting an
optional parameter when you create the job.
◦ Note: When targeting your IoT fleet using dynamic thing groups, AWS recommends that you use
continuous jobs instead of snapshot jobs. By using continuous jobs, devices that join the group receive
the job execution even after the job has been created.
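The difference between the two job types can be modeled in a small simulation. The classes below are invented for this example and only mimic the behavior described above; in the real service the optional parameter is the job's target selection, with values SNAPSHOT and CONTINUOUS.

```python
class ThingGroup:
    """Toy thing group that tells its jobs when a thing is added."""

    def __init__(self, name, things):
        self.name = name
        self.things = list(things)
        self.jobs = []

    def add_thing(self, thing):
        self.things.append(thing)
        for job in self.jobs:
            job.on_thing_added(thing)


class Job:
    """Toy job that tracks one execution per target thing."""

    def __init__(self, job_id, group, target_selection="SNAPSHOT"):
        self.job_id = job_id
        self.target_selection = target_selection
        # Both job types start with the things present at creation time.
        self.executions = {thing: "QUEUED" for thing in group.things}
        group.jobs.append(self)

    def on_thing_added(self, thing):
        # Only a continuous job picks up things that join the group later.
        if self.target_selection == "CONTINUOUS":
            self.executions.setdefault(thing, "QUEUED")


group = ThingGroup("temperature-sensors", ["sensor-1", "sensor-2"])
snapshot = Job("patch-once", group, "SNAPSHOT")
continuous = Job("onboard-config", group, "CONTINUOUS")

group.add_thing("sensor-3")  # a new device joins after both jobs exist
print(sorted(snapshot.executions))
print(sorted(continuous.executions))
```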
Job targets and execution
Structure of job documents
Managing Jobs (Job templates)
Use job templates to preconfigure jobs that you can deploy to multiple sets of target devices. For
frequently performed remote actions that you want to deploy to your devices, such as rebooting
or installing an application, you can use templates to define standard configurations. When you
want to perform operations such as deploying security patches and bug fixes, you can also
create templates from existing jobs.
When creating a job template, you can specify additional configurations.
Managing Jobs (Job configurations)
Rollout
◦ This configuration defines how many devices receive the job document every minute.
Scheduling
◦ If you want to schedule a job for a future date and time, use this configuration to schedule your job.
Abort
◦ Use this configuration to cancel a job in cases such as when some devices don't receive the job notification or
your devices report failure for their job executions.
Timeout
◦ If there isn't a response from your job targets within a certain duration after their job executions have started,
the job can fail.
Retry
◦ If your device reports failure when attempting to complete a job execution or if your job execution times out,
you can use this configuration to retry the job execution for your device.
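To make the rollout, abort, and retry ideas concrete, here is a small Python model of each. The function names, thresholds, and device lists are assumptions for illustration; they do not reproduce the service's actual configuration fields or evaluation rules.

```python
def rollout_batches(devices, max_per_minute):
    """Rollout: split devices into per-minute batches of limited size."""
    return [
        devices[i:i + max_per_minute]
        for i in range(0, len(devices), max_per_minute)
    ]


def should_abort(results, threshold_percent, min_executed):
    """Abort: cancel once enough devices reported and too many of them failed."""
    executed = len(results)
    if executed < min_executed:
        return False
    failed = sum(1 for status in results if status == "FAILED")
    return 100.0 * failed / executed >= threshold_percent


def run_with_retries(attempt, max_retries):
    """Retry: call attempt() once, then up to max_retries more times."""
    for _ in range(max_retries + 1):
        if attempt():
            return "SUCCEEDED"
    return "FAILED"


devices = [f"sensor-{n}" for n in range(1, 8)]  # seven hypothetical devices
print([len(batch) for batch in rollout_batches(devices, 3)])
print(should_abort(["FAILED", "SUCCEEDED", "FAILED", "SUCCEEDED", "SUCCEEDED"], 20, 5))
```

Combining a slow rollout with an abort threshold limits how many devices can receive a bad update before the job is stopped.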
Jobs Notifications (Jobs events)
The AWS IoT Jobs service publishes to reserved topics on the MQTT protocol when jobs are
pending, completed, or canceled and when a device reports success or failure when running a
job. Devices or management and monitoring applications can track the status of jobs by
subscribing to these topics.
How to activate jobs events
Response messages from the AWS IoT Jobs service don't pass through the message broker, and
they can't be subscribed to by other clients or rules. To subscribe to job activity-related
messages, use the notify and notify-next topics.
Job event types
Let's take a look at the different types of job events. The AWS IoT Jobs service
publishes a message on an MQTT topic when a job is completed, canceled,
deleted, or when cancellation or deletion are in progress. Those messages look
like this:
• $aws/events/job/jobID/completed
• $aws/events/job/jobID/canceled
• $aws/events/job/jobID/deleted
• $aws/events/job/jobID/cancellation_in_progress
• $aws/events/job/jobID/deletion_in_progress
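A monitoring application might build or recognize these topic names with a helper like the one sketched below. The function names and the job ID are made up for this example; only the topic pattern comes from the list above.

```python
JOB_EVENT_TYPES = [
    "completed",
    "canceled",
    "deleted",
    "cancellation_in_progress",
    "deletion_in_progress",
]


def job_event_topics(job_id):
    """Return the event topics listed above for one job ID."""
    return [f"$aws/events/job/{job_id}/{event}" for event in JOB_EVENT_TYPES]


def parse_job_event_topic(topic):
    """Return (job_id, event) for a job event topic, or None if it doesn't match."""
    parts = topic.split("/")
    if (
        len(parts) != 5
        or parts[:3] != ["$aws", "events", "job"]
        or parts[4] not in JOB_EVENT_TYPES
    ):
        return None
    return parts[3], parts[4]


for topic in job_event_topics("firmware-update-42"):  # hypothetical job ID
    print(topic)
```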
Interacting with AWS IoT Jobs through MQTT
The AWS IoT Jobs service publishes MQTT messages to reserved topics when
jobs are pending or when the first job execution in the list changes. Devices can
track pending jobs by subscribing to these topics.
Job notification types
Job notifications are published to MQTT topics as JSON payloads. There are two kinds of
notifications:
◦ ListNotification
◦ NextNotification
ListNotification
A ListNotification contains a list of no more than 10 pending job executions. The job executions
in this list have status values of either IN_PROGRESS or QUEUED. They are sorted by status
(IN_PROGRESS job executions before QUEUED job executions) and then by the times when they
were queued.
A ListNotification is published whenever one of the following criteria is met:
• A new job execution is queued or changes to a non-terminal status (IN_PROGRESS or QUEUED).
• An old job execution changes to a terminal status (FAILED, SUCCEEDED, CANCELED, TIMED_OUT,
REJECTED, or REMOVED).
NextNotification
A NextNotification contains summary information about the one job execution that is next in the
queue.
A NextNotification is published whenever the first job execution in the list changes:
• A new job execution is added to the list as QUEUED, and it is the first one in the list.
• The status of an existing job execution that was not the first one in the list changes
from QUEUED to IN_PROGRESS and becomes the first one in the list. (This happens when there are no
other IN_PROGRESS job executions in the list or when the job execution whose status changed
from QUEUED to IN_PROGRESS was queued earlier than any other IN_PROGRESS job execution in the
list.)
• The status of the job execution that is first in the list changes to a terminal status and is removed from
the list.
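The ordering rules for the two notification types can be modeled in a few lines. This is a sketch of the rules as described above, not the service's implementation; the dictionary keys (jobId, status, queuedAt) and the sample executions are assumptions for illustration, and the limit of 10 is taken from the ListNotification description.

```python
# Non-terminal statuses and their sort priority (IN_PROGRESS before QUEUED).
STATUS_ORDER = {"IN_PROGRESS": 0, "QUEUED": 1}


def pending_list(executions, limit=10):
    """Model of a ListNotification: pending executions, sorted and capped."""
    pending = [e for e in executions if e["status"] in STATUS_ORDER]
    pending.sort(key=lambda e: (STATUS_ORDER[e["status"]], e["queuedAt"]))
    return pending[:limit]


def next_execution(executions):
    """Model of a NextNotification: the first pending execution, if any."""
    pending = pending_list(executions, limit=1)
    return pending[0] if pending else None


executions = [
    {"jobId": "a", "status": "QUEUED", "queuedAt": 1},
    {"jobId": "b", "status": "QUEUED", "queuedAt": 2},
    {"jobId": "c", "status": "IN_PROGRESS", "queuedAt": 3},
]
print([e["jobId"] for e in pending_list(executions)])

# When the first execution reaches a terminal status, it leaves the list
# and the next pending execution moves to the front.
executions[2]["status"] = "SUCCEEDED"
print(next_execution(executions)["jobId"])
```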