
RELEASE 1.2.0

Manufacturing Data Engine


User Guide

Shared under NDA

Document information
Applicable versions: MDE 1.2.0, MC 2.6
Status: Final
Last updated: Nov 15, 2022
Intended audience: Googlers, Partners, Customers
Contact for questions: [email protected]

Other documents
- documentation/pdf/Deployment Guide.pdf
- documentation/pdf/MDE - Configuration Guide.pdf
- documentation/pdf/Release Notes.pdf
- documentation/pdf/Troubleshooting Guide.pdf
- documentation/pdf/User Guide.pdf
- documentation/pdf/Default Kit - Postman Getting Started.pdf
- documentation/pdf/Whistle V2 Documentation.pdf
- documentation/pdf/MDE - Glossary of Terms.pdf
- Manufacturing Connect - Supported Protocols
- LEM-UserGuide-v2.3.2-PDF.pdf
- LE-UserGuide-v3.2.5-PDF.pdf

Table of Contents

Goals of this document
Context
Introduction to the solution
Use cases
Capabilities
Components
Overall architecture
Information flow
Information architecture
    Archetypes
    Data Types
    Default Data Types and Archetypes
    Transformations
Ingesting data into MDE
    How to ingest data in the solution
    Streaming ingestion
    Payload Qualifiers
    Batch Loading
    API details
        Common fields
        CSV fields
    Format details
        JSON
        AVRO
        CSV
        AVRO_RAW_WRITER
        OPERATIONS_REPROCESSED_AVRO (only available through the API)
    Data Destinations
        BigQuery pipeline
        Big Table pipeline
        Cloud Storage Pipeline
    Raw message storage
    Ingestion errors
Setting up a new Tag
Changing Tag storage configuration
Adding a transformation to a Tag
Managing Metadata
    Creating a Metadata schema
    Creating a Metadata bucket
    Creating Metadata for a Tag
    Receiving Metadata from the edge
Moving Configurations Between Environments
    Configuration Bucket
    Import/Export Jobs API
Integrating with Other Subsystems
    Enabling Anomaly Detection
System Configurations
Monitoring and Troubleshooting
Glossary of terms

Goals of this document


This document provides guidelines for using the Manufacturing Data Engine (MDE) solution for Intelligent Manufacturing.

Context
Manufacturing Data Engine is a packaged solution that works seamlessly with Manufacturing Connect, a marketplace product provided by Litmus Automation. There are multiple possible combinations:

1. Manufacturing Data Engine + Manufacturing Connect + Manufacturing Connect edge.
This is the "standard", fully integrated deployment that covers edge connectivity and includes a user interface for Manufacturing Data Engine, built into Manufacturing Connect.
2. Manufacturing Data Engine + Manufacturing Connect, with a third-party / custom edge stack.
This is an option for customers who already have an edge stack deployed (e.g., OSIsoft, or even custom developments). Customers can deploy Manufacturing Connect (for free) and use its inbuilt UI to configure and manage the Manufacturing Data Engine.
3. Manufacturing Data Engine deployed standalone (without Manufacturing Connect).
In this case, all interaction with the Data Engine (configuration, monitoring, etc.) is via its API.

Introduction to the solution
MDE is an end-to-end solution providing scalable factory-to-cloud connectivity out of the box (in combination with MC/Litmus). The main goal of MDE is to provide a zero-code, pre-configured set of GCP infrastructure that can ingest, process and store data from industrial devices in the Cloud based on the user's configuration.

Once the machine and process data is available in the Cloud, it becomes easier to leverage Cloud tools and technologies to extract value from that data. Acquiring industrial data is traditionally a high-complexity, high-risk process that adds unnecessary time and cost to any Cloud industrial information management use case. MDE has been designed as a generic, easy-to-use, easy-to-deploy solution that makes that process shorter, more efficient and more predictable.

Google's Vertical Solutions team has developed the MDE solution for manufacturing in collaboration with Litmus Automation. The end-to-end solution is composed of components built by Google and components built by Litmus Automation for Google, based on their current product line.

MDE plays a role in a suite of solutions for Intelligent Manufacturing:

MDE is the acquisition, transformation and storage layer of the infrastructure. It acts as a data hub that all use cases connect to in order to access manufacturing information, and it provides a safe, efficient and available data lake containing all manufacturing information.

Other solutions that comprise the Intelligent Manufacturing suite are:


- Manufacturing Connect: the cloud component used to remotely manage all Manufacturing Connect edge instances. It also acts as a UI for the configuration of the MDE Cloud solution.
- Manufacturing Connect edge: an edge-to-cloud gateway capable of translating 200+ industrial communication protocols into one of the two supported streaming ingestion message endpoints in GCP: Pub/Sub or Cloud IoT Core.
- Looker: GCP's BI tool to explore and analyze the data in any of the GCP databases such as BQ or BT.
- Specific manufacturing ML models: deployed on the Vertex infrastructure, designed and built to provide insights on streaming manufacturing data.

All components of the Intelligent Manufacturing suite have been designed to work seamlessly with each other: they all share the same configuration and they are semantically interoperable.

MDE and the rest of the components are configurable. Users can define their specific data requirements and the system will adjust to those specifications without any modification of the code underlying the solution; MDE adapts based on the data specification. The configuration can be updated using the MC user interface or the provided MDE configuration API.



Use cases
The use cases that MDE enables fall mainly into three categories:
- Analytical use cases: leverage GCP BI tools to produce reports, calculate KPIs and build real-time dashboards using the real-time information collected from the manufacturing floor.
- Machine learning use cases: leverage GCP ML tools and platforms to create, train and execute ML models that help optimize any aspect of the manufacturing operation.
- Integration use cases: connect manufacturing data with digital twin solutions or other corporate systems to provide an integrated view of the manufacturing data alongside other perspectives available in the company.

Capabilities
The capabilities that MC/MCe and MDE fulfill are:
● Ability to acquire data streams at the edge from multiple industrial controllers and machines, translating many of the protocols and dialects available in the market.
● Ability to store and process those data streams locally.
● Ability to transform those data streams into MQTT and Pub/Sub messages that can be sent to GCP from the edge locations using a conventional Internet connection.
● Ability to map and transform any MQTT or Pub/Sub payload structure into a predefined data schema based on the user's configuration.
● Ability to calculate streaming analytics and transformations based on the user's configuration.
● Ability to store data in any of the mainstream database and storage solutions available in GCP (BQ, BT and GCS) based on the user's configuration.
● Ability to monitor and supervise the state of the end-to-end solution using a simple interface.
● Ability to set up a user's configuration using a simple, easy-to-use interface.

Components
The components of MDE are:

● Configuration Manager: the main cloud component of MDE. It deploys as a container on GKE and is the element managing the different message routing pipelines, message transformations and message storage. It contains the user configuration.
● Dataflow components: the different Dataflow jobs deployed with the solution. They route and transform the messages from the edge and write the processed messages into the different databases based on the user configuration. The Dataflow components communicate with the Configuration Manager to receive the specific configuration for each message.
● PubSub topics: Pub/Sub is the main messaging backend that MDE uses to route messages between the different components of the solution. Several topics and subscriptions are created to ensure the routing of incoming messages is done according to the user configuration.
● Databases: MDE creates a number of schemas in BigQuery, Bigtable and Cloud Storage where the data is stored. Those schemas are generic; MDE routes messages to the right tables and files based on the user configuration.
● Federation API: MDE provides an API to access all data repositories using a common interface. This allows users to query their data independently of where it is stored, and enables them to use the same configuration language to create specific queries against the manufacturing information.
● Integration layer:
○ Looker integration: an MDE LookML component that allows Looker to natively explore all data contained in the MDE solution, for those users using Looker.
○ AutoML integration: an MDE component that integrates MDE data with the different ML tools available in GCP to support the creation and training of industrial ML models using MDE-collected data.
○ Grafana and Data Studio integrations: provide native access to MDE data from Grafana and Data Studio.

All components are independent and can operate together as a unit or as separate units. The solution can ingest and manage data streams coming from any other edge solution, and the edge solution can also be integrated with other types of cloud architecture. However, we believe that the value of an integrated solution is larger than the sum of the elements comprising it.

Overall architecture
The overall architecture of the solution (MDE and MC working together) is as follows:

1. MC edge and MC (its cloud companion) handle data acquisition at and from the edge.
2. Pub/Sub is the ingestion point for raw messages generated by Manufacturing Connect or by any other MQTT broker or service that needs to be integrated into the solution.
3. All message routing and transformation is handled by Dataflow. Cloud Dataflow subscribes to the incoming Pub/Sub topics and routes messages to the different storage solutions.
4. The Config Manager, running as a Kubernetes application on GKE, stores the system's configuration (including user preferences, metadata and system defaults) and orchestrates the different Dataflow jobs that route messages to their rightful destinations and perform the on-the-fly transformations and validations.
5. Integration is provided to the main GCP BI and ML tools, including Looker, Data Studio and Vertex AI. The engine powering that interaction is the MDE API, which delivers syntactically consistent data access abstracted from the data model and technical architecture.

Information flow
Information flows through the MDE solution in the following steps:
1. Data points are generated at the sensor level and digitized by the PLC.
2. Manufacturing Connect creates a device connection to the PLC and a Tag connection reading the data value that needs to be collected. Tags at the Manufacturing Connect level are polled tags and are refreshed based on a given refresh rate.
3. Data from the PLC is packaged in a standard Manufacturing Connect payload structure that includes:
a. deviceName: name of the device driver that originated the message
b. tagName: name assigned to the tag being acquired
c. deviceID: automatic ID assigned to the device once it is created
d. success: whether the read from the PLC was successful
e. dataType: type of data contained in the value field
f. timeStamp: timestamp of the reading in msec
g. value: the actual value field of the event
h. metadata: any user-defined metadata (key-value pairs) associated at the moment the tag was created
i. registerID: name of the register the value is linked with
j. description: a description of the value



4. Manufacturing Connect packs the payload into a JSON message and routes the information to Pub/Sub based on the selected edge configuration.
5. The Pub/Sub ingestion topic is the recipient of edge messages and is the ingestion topic for MDE. From there the messages are picked up by MDE's Config Manager and routed through Dataflow until they are inserted into the different available databases.
6. During the ingestion flow, messages are transformed and parsed into a common data model. Those transformations are managed by a configuration language that can be modified through the Manufacturing Connect UI or the Config Manager API. The stages that compose that transformation are described in the following sections.



Information architecture
MDE's information architecture is designed to support two main sources of related manufacturing data elements:
● Manufacturing sensor data, acquired from the factory floor or derived from factory floor data, structured as individual variables (called Tags in the MDE configuration model) that are specific and distinct from each other and that contain an indefinite number of records, one for each measurement produced.
● Metadata about manufacturing data, describing several contextual aspects of the data streams, for instance:
○ its origin and the way it was captured;
○ its type and nature (and the measurements it represents);
○ the assets it refers to and that provide context about it.

MDE is designed to be used as a data repository of manufacturing information, including sensor data and metadata, and the necessary relationships between them. The design goal of MDE is to optimize ingestion and query performance for manufacturing sensor data and metadata, to provide easy access to that information, and to support complex calculations over those data repositories.

Manufacturing data and metadata are stored in a standardized set of data tables and schemas across the different databases supported by the system. All MDE implementations share the same table names and schemas. Those data structures have been created to optimize the stability and performance of the ingestion and data access processes, they have been designed to get the most out of the platforms they are built on, they are efficient and cost-effective, and they integrate Google's best practices for those products.

To provide the necessary flexibility on how to organize and store tags and metadata, MDE provides a specific data model
that utilizes the standardized data schema but can present the information differently based on the user specific
configuration. The data model is based on the following concepts:

- Archetypes: define the overall characteristics of the data series to be stored. Tags sharing an archetype also share similar characteristics: they have the same type of index or timestamp structure and a common payload type or nature. Each archetype has a specific data table and schema where the information is stored; all tag information of the same archetype is stored in the same table. The core characteristics of the database are configured based on the nature of the archetype. Archetypes are immutable for a given version of MDE: they can only be extended by generating a new version of the overall solution.

- Data types: define a sub-set of tags, of the same archetype, that share a common payload structure and common payload qualifiers. They not only share the same payload characteristics but also have the same payload components and structure. Data types are defined by the user's configuration, which means that they can differ from user to user for a given version of MDE.

- Tags: the individual streams of data ingested into the system. A tag represents a consistent series of values that reflect the same measurement over time. Tags belong to a given archetype and data type. Each value received for a given tag is stored as an individual record in any of the supported databases. Tags can also be associated with metadata describing the nature of the collection of values, and individual tag records can be further qualified by metadata using payload qualifiers.

- Metadata instances: each tag can be associated with a metadata instance. A metadata instance expresses a set of consistent context characteristics that can be grouped together logically. For instance, a metadata instance can be used to reflect the asset associated with a given tag and may contain the plant name, line number, machine name and sensor ID.

- Metadata schemas: define a set of fields that can be materialized in a given metadata instance. For example, two different metadata instances qualifying two different tags can be related and compared by associating them with the same metadata schema.

- Metadata buckets: define a combination of one or more metadata schemas that reflect the metadata specification required to qualify a certain aspect of the context of a given tag. Metadata buckets can be associated with tags or types. When associated with a tag, they create a certain 'contract' or 'promise' that the tag is expected to fulfill. For example, we may want all sensor readings coming from physical sensors to be described with the type of measurement they are providing (physical property, units, resolution, etc.). To do so, we could define a certain metadata structure under a metadata schema and associate it with a metadata bucket containing all those information elements. That metadata specification can then be assigned to all tags sharing the data type used to collect sensor values from the edge. Metadata from the same bucket is comparable and can be aggregated across multiple tags. This enables an easy and quick way to create a semantic structure describing tags that share a common context (i.e., all tags belonging to a certain machine, all tags measuring the same physical property, etc.).

- Payload qualifiers: a payload qualifier is a metadata instance associated with a specific tag record instead of the whole stream. It can share the same metadata schema with the other records of the tag, but it is stored together with the values of the tag as a 'qualifier' or context description of the time when the value was generated, such as the shift, the operator or the production order active at that time.

- Metadata providers: MDE provides an internal metadata repository where users can create and store Metadata
schemas and Metadata buckets that the different Tags and Types can use. The default MDE metadata provider,
managed by the Config Manager, is referred to as the “local provider”. However, MDE also supports remote
metadata providers (such as MDM or ERP systems) that can publish a similar Metadata model using a REST API.
Those API endpoints can be registered in MDE as “remote metadata providers” and used in the configuration of
Metadata entities that are provided by those third party systems.

- Transformations: predefined functions, bound to certain input and output archetypes, that apply a generic real-time transformation to any tag matching the input archetype, producing a new tag of the output archetype. The transformation is applied to every value of the selected tag.

In the following sections we describe those entities in more detail:

Archetypes
To ensure that information representing different physical variables (which can be infinitely diverse) can be stored using a generic, finite and optimized schema, MDE maps the different measurement types captured (represented by a message type and a payload) onto specific data structures called archetypes:

● Numeric Data Series (NDS): a time-stamped series of numerical values, e.g., a temperature sensor sending data every second to the Cloud.
● Discrete Event Series (DES): specific information associated with a single timestamp, e.g., an operator-driven parameter change in a specific machine of the process that needs to be recorded.
● Continuous Event Series (CES): a series of consecutive states, each defined by specific information, a start time and an end time, e.g., the operating state of a given machine or the recipe of a production line.

MDE will match any incoming message to one of these archetypes. All values ingested are stored as an archetype, and all values stored for a given archetype share a common database schema and specific metadata requirements.

Data Types
Each archetype can be further classified into Types that specify the archetype in more detail. For instance, a single archetype such as continuous events could be subdivided into machine operation state and production program state. The first subtype could be associated with a payload containing one String value from the list "Running", "Idle", "Scheduled Maintenance" and "Unscheduled Maintenance", while the second could be associated with a complex schema containing "Brand", "Size", "RecipeID" and "Recipe Description". Some types are available by default in MDE, and the user can create new types. All types can be associated with a schema defining the payload structure and timestamp structure.

Types can be user defined; the Configuration Manager UI can be used to define new types. New types are defined by the following characteristics:
1. The archetype they belong to, which determines the data schema required to store the values and the payload complexity.
2. The payload structure, defining the different elements contained in the payload message for each instance of the event belonging to this type.
3. The event identification structure (typically timeStamp related), defining the structure of the element that distinguishes one record from the others.
4. The metadata configuration, defining the requirements for metadata associated with the type, typically which metadata elements and schemas need to be completed to define a Tag as part of this type.



Default Data Types and Archetypes
A default configuration is created together with the deployment of all MDE components. It is sufficient to ensure successful operation of the solution and has been created to ensure seamless integration between MDE and the Manufacturing Connect edge components.

The default types available in MDE are:

| Archetype | Type | Payload | Timestamp |
|---|---|---|---|
| Numerical Data Series (NDS): 1 timestamp, 1 single-variable payload | NDS Numeric | Value as Number | Timestamp in msec |
| Discrete Event Series (DES): 1 timestamp, 1 complex payload structure (JSON) | DES Binary | EventState as binary value, EventStateLabel as String | Timestamp in msec |
| Discrete Event Series (DES) | DES Default | Any JSON | Timestamp in msec |
| Continuous Event Series (CES): 2 timestamps (start and end), 1 complex payload structure (JSON) | CES Default | Any JSON | Timestamp Start in msec, Timestamp End in msec |
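For illustration, here is a hedged sketch of message payloads that would map to the "NDS Numeric" and "DES Binary" types above (field names follow the Manufacturing Connect payload described in the Information flow section; the exact wire format depends on the parser configuration, and all values are illustrative):

{
  "tagName": "line1-motor-temp",
  "value": 74.2,
  "timeStamp": 1668526632000
}

{
  "tagName": "line1-motor-state",
  "value": { "EventState": 1, "EventStateLabel": "Running" },
  "timeStamp": 1668526632000
}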

Transformations
MDE defines standard transformations between archetypes that work for each archetype implementation. MDE defines
specific transformations for given archetype sub-types. Some of these transformations are generic and can be applied to
similar types using parameters.

The set of default transformations available in the system are:

| Transformation | Description | From Type / Archetype | To Type / Archetype |
|---|---|---|---|
| Value change monitor | Creates a continuous event based on any element of the payload of a discrete event series when the value switches to a new one. | Discrete Event Series, Numeric Event Series | Continuous Event Series |

All message types or tags are mapped to a specific archetype or archetype sub-type. When a new message for a given type or a given tag is received by MDE, it is converted to an archetype-specific message structure and payload.



Ingesting data into MDE

How to ingest data in the solution


MDE supports ingesting data via streaming through Pub/Sub or via batch ingestion of data files. Both methods are described in this section. Independently of the origin of the data, it is processed identically.

Streaming ingestion
The solution will ingest any message that is received at the PubSub landing topic:

“input-messages”
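Any producer can publish to this topic directly; for instance, using the standard Pub/Sub REST publish method (a sketch: <project-id> is a placeholder, the request must be authenticated, and the message data must be base64-encoded):

POST https://pubsub.googleapis.com/v1/projects/<project-id>/topics/input-messages:publish

{
  "messages": [
    { "data": "<base64-encoded payload>" }
  ]
}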

Any message received on the topic will be parsed with the existing pipelines available in the default configuration. The default configuration has been implemented to understand and decode message payloads generated by Manufacturing Connect devices. These messages have a structure similar to the following:
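(A hedged sketch, reconstructed from the payload fields listed in the Information flow section; values are illustrative.)

{
  "deviceName": "modbus-driver-1",
  "tagName": "line1-motor-temp",
  "deviceID": "dev-1234",
  "success": true,
  "dataType": "float",
  "timeStamp": 1668526632000,
  "value": 74.2,
  "metadata": { "unit": "degC" },
  "registerID": "40001",
  "description": "Motor temperature"
}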

Depending on the data type of the value field in the payload, the message is parsed to a numerical type or an event type. The parsing rules implemented in the default configuration are as follows:

| Payload value type | Message parsed to Type | Parsing (applies to both types) |
|---|---|---|
| Number | Numerical Data Series Default | TagName > MDE TagName; Value > Tag payload; Timestamp > Tag value timeStamp; Metadata > Auto-created Metadata bucket; DeviceName, DeviceID, RegisterID, Description, Success and Data type > Edge device metadata bucket |
| Non-numeric, non-binary | Discrete Data Series Default | Same as above |

Users are allowed to change the parser configuration to update those associations and to change the parsing process.
Please refer to the configuration user guide to find out how.

Based on the type mapped to the incoming message, a default storage configuration is assigned to the message. The default configuration of the system defines the following storage specifications for the default data types:

| Type | Description | Default storage profile |
|---|---|---|
| Default numerical data series | Data type for payloads consisting of a numerical value, a timestamp, any payload qualifiers and any metadata buckets | BT, BQ and CS |
| Default discrete event series | Data type for complex non-numerical payloads with a timestamp, any payload qualifiers and any metadata buckets | BQ, CS |
| Default continuous data series | Data type for complex non-numerical payloads with a StartTime timestamp and an EndTime timestamp | BQ, CS |



The system defines a number of predefined tables in BQ and BT, and a specific data format for CS, where the incoming information is stored.

Payload Qualifiers
In addition to the payload value, MDE supports adding a dynamic metadata JSON structure that can be stored beside each value. The idea is to qualify a specific value, such as a sensor reading, with relevant context information, such as the shift or the machine cycle that was active when the reading was taken.

Those payload qualifiers can be added to the incoming payload message and parsed in the pipeline into the payload qualifier section of the Type. The content of the payload qualifiers is inserted into the different storage solutions and revealed as metadata.
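As a purely hypothetical illustration (the qualifier field name and placement are defined by your parser configuration, not shown in this guide), a message carrying payload qualifiers alongside a value might look like:

{
  "tagName": "line1-motor-temp",
  "value": 74.2,
  "timeStamp": 1668526632000,
  "payloadQualifiers": {
    "shift": "night",
    "operator": "op-117",
    "productionOrder": "PO-20221115-003"
  }
}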



Batch Loading
The main way to ingest information into the platform is via streaming, by sending messages to Pub/Sub; however, it is also possible to ingest data in batch. This is useful for reprocessing data or importing data from external systems.

Batch loading works by uploading data to a GCS bucket set up by MDE, named <project-id>-batch-ingestion, in one of the supported formats (JSON, AVRO or CSV). Whenever a new file is uploaded to this bucket, it is detected by the batch ingestion Dataflow job, which sends each message individually to the input-messages Pub/Sub topic. The way messages are read varies from format to format.

In order for MDE to recognize which file type is being ingested and to provide configuration options, each file type needs to be registered in the Configuration Manager before it can be successfully ingested. The FILE INGESTION section of the UI provides access to the configuration of the batch loading feature. Selecting the File Ingestion tab displays the list of registered Ingestion Specifications in the current system. From this UI, users can edit any of the current specifications or create a new one. All specifications can be enabled or disabled, which means that a specification can be created and then disabled immediately after the files have been uploaded, remaining in the system configuration for future use.

To open a specific specification, click on the Actions column and select 'View / Edit':



A detail screen is shown for that specific ingestion specification. Each file type requires a different set of parameters; depending on which Source Type is selected, a different set of parameters is displayed.

The parameters common to all types are:

- Name: name of the specification.
- Active: true / false. Only active specifications are used to import files.
- Pattern: the expression identifying the files in the bucket to be processed following this specification.
- Include file attributes: true / false. Adds extra information from the file (such as the file name) as metadata in the JSON generated from the files.

The most complex type is CSV. It requires a number of extra configuration parameters, such as the character used as separator and the schema of the file. Specifically, the additional parameters required are:
- Separator: the character used to delimit the columns in the file.
- Skip Rows: the number of rows to skip from the file (in case there is a header).
- Use header row as field names: the system will use a given row in the file as the names for the columns / attributes being generated.
- Header row number: defines the line number of the header row used by the previous parameter.
- Column name mapping: appears when "Use header row" is set to false, and allows the user to input a JSON file with the mapping of the different columns to attribute names.

All of this configuration can also be implemented using the configuration API of the Config Manager, specifically using the REST call below, which shows a snippet of an example Ingestion Configuration:

POST http://{{hostname}}:{{port}}/api/v1/ingestions

{
  "source": "CSV",
  "filePattern": "(.*)-input-1.csv",
  "separator": ",",
  "skipRows": 1,
  "dateTimeColumns": [
    {
      "index": 7,
      "format": "dd-M-yyyy hh:mm:ss"
    }
  ],
  "schema": {
    "registerId": 0,
    "success": 1,
    "description": 2,
    "tagName": 3,
    "value": 4,
    "deviceName": 5,
    "deviceID": 6,
    "timestamp": 7,
    "event": [
      {
        "label": 9,
        "description": 10
      }
    ]
  }
}



API details

Common fields
● source (required): one of CSV, JSON, AVRO, AVRO_RAW_WRITER, OPERATIONS_REPROCESSED_AVRO.
● filePattern (required): a regular expression for the location/name of the file without the bucket name (e.g., for files <bucket>/loadData/batch1/batchmessages1.csv you could use "loadData/batch1/batchmessages\.*.csv").
● name (required): name to identify this file ingestion configuration.
● enabled (required): Boolean to enable/disable this configuration.

CSV fields
● separator (optional, comma as default): the character that delimits the columns of the records.
● skipRows (optional): the number of rows to skip before starting to read the CSV file.
● dateTimeColumns (optional): use this if you want to parse any of the columns as datetimes. It expects an array where each element consists of an index and a format; the index is the 0-based column number, and the format is a date format as defined by the Java DateTimeFormatter.
● inferredSchemaHeaderRow (optional): you must set either this field or the schema field; both can't be empty. This field sets the row number to use as a header row for parsing the CSV; to use the first row, set it to 0.
● schema (optional): you must specify either this field or inferredSchemaHeaderRow; both can't be empty. This field lets you define a schema to parse your CSV. It accepts a JSON object where each key defines a field name and the value is the index of the column to use as the value for that field, as shown in the example above.

Format details

JSON
MDE supports JSON the same way BigQuery does: the JSON data must be newline-delimited. Each JSON object must be on a separate line in the file, and each line is sent as a separate message to PubSub. E.g.:

{
  "name": "numeric-input-json",
  "enabled": true,
  "source": "JSON",
  "filePattern": "numeric-input.*.json"
}
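A data file matching this specification could contain newline-delimited records such as the following (a hedged sketch with illustrative values; the record fields simply need to be understood by your parser configuration):

{"tagName": "temp-sensor-1", "value": 71.5, "timeStamp": 1668526632000}
{"tagName": "temp-sensor-1", "value": 71.9, "timeStamp": 1668526633000}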

AVRO
The AVRO schema used should be convertible to JSON, since each message is converted before being sent to PubSub. Each record is sent as an individual message. E.g.:

{
  "name": "avro-1",
  "enabled": true,
  "source": "AVRO",
  "filePattern": "/avo/testMessage(.*)"
}

CSV
CSV files are transformed into JSON using either the provided schema or a schema inferred from a header row. If you use a header, the resulting JSON will be flat, as nested schema inference is not supported. Each line of the CSV is sent as a separate message to PubSub. E.g.:



{
  "name": "CSV-skipRow",
  "enabled": true,
  "source": "CSV",
  "filePattern": "header-skip(.*).csv",
  "inferredSchemaHeaderRow": 1,
  "separator": ",",
  "skipRows": 1
}

AVRO_RAW_WRITER
This configuration is used to re-ingest messages that were produced by the gcs-writer Dataflow pipeline. It expects only files produced by that pipeline with the schema below; it resends each message to the pipeline as if it were being sent for the first time. If your messages were processed correctly the first time, this will cause data duplication.

● String messageId
● Long timestamp
● Map<String, String> attributes (PubSub headers)
● Byte[] message

{
  "name": "reprocess-raw-avro-1",
  "enabled": true,
  "source": "AVRO_RAW_WRITER",
  "filePattern": "testMessageReprocessing1.avro"
}

OPERATIONS_REPROCESSED_AVRO (only available through the API)

This configuration is different from the others, since it is used to reprocess messages that failed at any point of the ingestion process. To do so, it uses the OperationsDashboard BigQuery table: using the messageId, it looks up the original message among the raw ingested messages and sends it back to the beginning of the queue.

The process to use this configuration is as follows:

1. Go to BigQuery and identify the failed messages in the OperationsDashboard table.
2. Once you have a query that identifies them, go to Query Settings and save the results of the query as a new table.
3. Once the new table is created, select it and click EXPORT -> Export to GCS, then select a GCS bucket, select AVRO as the format and select a Compression (whichever you prefer).
4. Once it has been exported, create an ingestion specification with OPERATIONS_REPROCESSED_AVRO as the source and a filePattern that matches the filename and the path where you will copy it.
5. Finally, copy the file to that path.

This process takes longer than the other batch ingestions, since it first reads the records in the new file, then tries to find each original message in the GCS raw bucket and joins both data streams. E.g.:

{
  "name": "reprocess-messages1",
  "enabled": true,
  "source": "OPERATIONS_REPROCESSED_AVRO",
  "filePattern": "payloadresolutionfailedmessages.avro"
}



Data Destinations
MDE supports different databases for the records ingested by the system. Data destinations are independent and can be selected per tag as part of the configuration of the solution. A default configuration optimized for the default data types is available after the solution is deployed. Each tag can be configured to use one or more of the available destinations.

Each data destination is suited to different use cases, and which storage solution is optimal for each data stream needs to be determined by the user. However, as overall guidance, here are some tips on choosing the right storage solution for a given data stream:

● Cloud Storage is the lowest-cost solution, but accessing data in Cloud Storage has a higher latency than any other destination. Cloud Storage is useful when users need to store large amounts of data for a long time but don't have an immediate need to explore or navigate it. Data in Cloud Storage can be exported and inserted into any DW tools or solutions in the future, when that need arises.

● BigQuery is our main analytical DW and BI product. It can store large amounts of data (PB-scale) and provides access via SQL with second-level response times. Inserting data into BQ is ideal for analytical workloads such as reporting via Looker. BQ cost (insert, storage and access) is significantly higher than Cloud Storage.

● BigTable is our non-relational database and is ideal for sub-second latency streams, where high-performing inserts and queries are required. BigTable shines in near-real-time use cases, where sub-second access to the edge values is required. BT cost is significantly higher than BQ and Cloud Storage.

BigQuery pipeline
When BigQuery storage is active, MDE inserts a new row each time a new value is received for a given Tag. The solution provides a default schema in BQ where it stores all incoming values; the default schema consists of one table per data archetype. The data is stored in BQ under the "sfp_data" dataset in the project's BQ explorer.

Ingestion into BigQuery uses BigQuery streaming mode to ensure low-latency ingestion.

The different tables match the default data archetypes provided with the default version of the solution. All Types of a given archetype are stored in the same table.

| Archetype | Table Name | Table clustered by | Table partitioned by |
|---|---|---|---|
| Numerical data series | NumericDataSeries | tagName | Day, eventTimestamp |
| Discrete Events data series | DiscreteDataSeries | tagName | Day, eventTimestamp |
| Continuous Events data series | ContinuousDataSeries | tagName | Day, eventTimestampStart |
| Part data series | ComponentDataSeries | tagName | Day, eventTimestamp |
| Operations dashboard | OperationsDashboard | None | Day, eventTimestamp |

All BQ tables are standard and can be queried and explored using the BQ tool.

For the fields stored in JSON format, we rely on the BQ JSON extension. This extension allows users to natively query fields in a JSON column almost as if they were native table columns, reducing query complexity and improving query efficiency for those fields. For projects where that extension is not available, MDE generates an equivalent key-value (KV) pair column that can be queried without the extension. MDE detects whether the extension is active and leverages the JSON fields if it is; if MDE detects that the extension is unavailable, it automatically falls back to the KV pair fields.

When ingestion fails and data can't be inserted into the expected canonical table, MDE inserts those messages into a dead-letter table in BQ called InsertErrors. This table holds the message and the error generated when trying to ingest it:

Big Table pipeline


The solution supports millisecond-resolution time series stored in Bigtable. The solution wraps the details of the Bigtable schema and provides an API that enables the storage and retrieval of time-series events. The current API only supports the storage of numeric values without any additional metadata. When time-series storage is active for numeric archetype tags, the ingestion pipeline stores the numeric values of the tags in the time-series database. You can then retrieve the time series by tag name and a specified time range (start and end times).

The time-series API also supports multiple resolutions and aggregations when you retrieve the data. As the ingested data is usually at millisecond resolution, retrieving it might produce many records, which might not be useful for a dashboard like Grafana. In that case, you can specify an aggregation and a different resolution when retrieving the tag values.

After a successful deployment, the time-series API will have an internal endpoint exposed through an internal load balancer. You can retrieve the IP from the network section of the Cloud console. Alternatively, if you're using the Terraform deployment for the network setting, you'll have a DNS entry for the time-series API named timeseries.sfp.cloud.google.com. To insert a new time-series record in the database, the following endpoint is used:

POST http://<service-ip>/api/v1/records

{
  "tagName": "TEST-Machine",
  "timestamp": 1642506148000,
  "value": 20
}

In the above example, the time-series API inserts a new value of 20 for a tag named TEST-Machine at the timestamp specified as a Unix epoch in milliseconds. You don't normally use this API directly, as it is automatically invoked by the solution's data-lake workflows.

To retrieve time-series data for a particular tag, you can use the following query API, which retrieves the records of the specified tag name for the specified duration:

POST http://<service-ip>/api/v1/records/query

{
  "startTimestamp": 1642506147000,
  "endTimestamp": 1642506149000,
  "tagName": "TEST-Machine"
}

Alternatively, you can specify downsampling logic that aggregates the millisecond records and provides more coarse-grained values. In that case, you'll need to specify the downsample section of the query:

POST http://<service-ip>/api/v1/records/query

{
  "startTimestamp": 1642506147000,
  "endTimestamp": 1642506149000,
  "tagName": "TEST-Machine",
  "downsample": {
    "duration": 5,
    "unit": "SECONDS",
    "aggregation": "MEAN"
  }
}

The above request aggregates the records over 5-second intervals, performs an average calculation on the values, and returns the aggregated results. Typically, you'll use the Federation API to query the time-series data through a visualization tool like Grafana.



Cloud Storage Pipeline
When the Cloud Storage setting is active, MDE saves the corresponding tags in the GCS bucket named <project-id>-gcs-ingestion. A new file is created for every 5-minute window, named with the format <windowinfo>-<SS>-of<NN>; the number of rows in each file is determined by the pipeline throughput. The files are stored in AVRO format with snappy compression, but given that the tags can be heterogeneous, each row is stored as a STRING matching the TagEvent representation.

Raw message storage


Independently of the final storage destination once the messages have been processed, MDE provides a raw message store that collects all raw messages ingested into the system before they are processed. The messages are stored in a GCS bucket called <project-id>-raw, in AVRO format with the following schema:

● messageId: the message ID assigned by PubSub in the input-messages topic. It is used to reprocess failed messages, since it is propagated across all the processing steps.
● timestamp: ingestion timestamp.
● attributes: PubSub attributes.
● message: PubSub message data.
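As a hedged sketch (the schema actually shipped with MDE may differ in record and namespace names), an Avro schema matching these fields could look like:

{
  "type": "record",
  "name": "RawMessage",
  "fields": [
    { "name": "messageId", "type": "string" },
    { "name": "timestamp", "type": "long" },
    { "name": "attributes", "type": { "type": "map", "values": "string" } },
    { "name": "message", "type": "bytes" }
  ]
}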

The files are emitted every 10 minutes, with a name containing the datetime, like raw-messages2022-02-08T18:00:00.000Z-2022-02-08T18:10:00.000Z-pane-0-last-00-of-10.

This message store allows MDE to:

● uniquely identify each incoming message to manage lineage across the system;
● provide safe storage for all edge messages, where they can be kept raw for a long period of time cost-effectively;
● offer the possibility to re-run older messages through the ingestion process and configuration, in case the configuration has changed and it is necessary to re-process the data differently.

Ingestion errors
The ingestion process can fail at different stages if the configuration is not set correctly for the payloads ingested in the landing topic. For example, an incoming message could contain a payload structure that is not recognized by the MDE configuration. In that case, those messages can't be parsed to the destination data model and are redirected to the ops-writer pipeline, which writes them into the OperationsDashboard table in BigQuery to make troubleshooting and reprocessing easier.



Setting up a new Tag
Tags are created dynamically once a new message is detected with a new TagName value. Based on the content of the incoming message payload, and particularly the data type of the 'value' field of the payload, a Type is selected for the Tag. This association can be modified by extending the system configuration. The behavior of MDE's default configuration is as follows:

| Payload value type | Message parsed to Type | Parsing (applies to all types) |
|---|---|---|
| Number | Numerical Data Series Default | TagName > MDE TagName; Value > Tag payload; Timestamp > Tag value timeStamp; Metadata > Auto-created Metadata bucket; DeviceName, DeviceID, RegisterID, Description, Success and Data type > Edge device metadata bucket |
| Binary | Binary Discrete Data Series | Same as above |
| Non-numeric, non-binary | Discrete Data Series Default | Same as above |

The configuration interface allows pre-creating a configuration for a specific Tag. If the Tag configuration already exists when the first value is received, the stored configuration is used instead of creating a default one. This allows users to create specific configurations for metadata, storage profiles or transformations for specific tags in the system. It is important to note that once a Tag has been assigned to a given type, it can't be assigned to a different one: Types are associated with a Tag configuration, and this link can only be changed by deleting the Tag configuration and creating a new one from scratch.

To create Tag configurations or to edit existing tag configurations, please refer to the MDE Configuration Guide.

Changing Tag storage configuration


Users can modify the destination of a Tag's data in the UI. By selecting the "Cloud Management" section in the main menu and choosing the "Cloud Tags" tab, users can display the list of registered tags in that MDE implementation. The list corresponds to every tag that has ever been ingested and is now part of the configuration.

The list of Tags supports several navigation options, such as search and filtering, to make finding the right Tag in the overall configuration easier.



Once we have identified the target Tag, the only remaining task to access its configuration is clicking on its ID or opening the Edit menu under the Actions column of the main display.



The Tag property menu will open. Inside that menu, users can change the storage location for the values associated with that particular Tag by simply selecting which databases should be used and which shouldn't. Clicking Save confirms the changes, and from that moment on, the Config Manager will only route new incoming messages to the selected destinations.

Adding a transformation to a Tag


Tags can be transformed on ingestion. A tag associated with a transformation automatically generates a new derived tag where the values resulting from the transformation are stored. The name of the new tag is a combination of the original tag name and the name of the transformation.

The transformed values are computed in real time, every time a new value is received from the edge for the original tag. However, since some transformations might change the time base of a data series, new values may be inserted into the data lake at a different frequency than the original tag that triggered their calculation.

A number of transformations are included in the default configuration package deployed with the solution. These transformations are designed to work with specific Types only. Also, depending on the design of the transformation, the incoming and outgoing Type or Archetype might be the same or different.

Managing Metadata
Any Tag can be associated with many different metadata instances. Each instance represents a snippet of information that describes a certain area of the tag's context. Metadata instances are JSON objects that follow a Metadata bucket schema. Metadata Buckets are groups of Metadata Schemas that are semantically related (i.e., that describe a certain aspect of the context). Buckets can be associated with tags and with types, so that all tags created based on a Type are automatically associated with that metadata specification.



To manage Metadata in MDE, we use the Manufacturing Connect UI. The Metadata Management UI is available under the "Cloud Management" section, in the "Metadata" tab.

To generate a given metadata snippet or instance for a tag, we need to follow a number of steps:

Creating a Metadata schema


To define a certain area of a tag's context, we first need to define a schema characterizing the dimensions that provide that contextualization. For instance, if we want to characterize the machine that the tag is associated with, we will create a schema describing the relevant fields, such as the machine name, machine type, year of construction, etc.

Schemas define the structure of the Metadata instance object that will characterize tags. Schemas can also be associated with a Domain file that defines the possible values of that object. Metadata schemas obey the JSON Schema specification.

A typical JSON schema defining certain metadata for a tag could be as follows:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "sampleContext1",
  "title": "tag_sample_context",
  "description": "Sample context schema for qualifying tags",
  "type": "object",
  "properties": {
    "tag_device_name": {
      "description": "Name of the device generating tag values",
      "type": "string"
    },
    "tag_device_id": {
      "description": "ID of the device generating tag values",
      "type": "string"
    },
    "tag_measurement_type": {
      "description": "Description of the measurement",
      "type": "string"
    }
  },
  "required": [ "tag_device_name" ]
}
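For illustration, a metadata instance conforming to this schema could look like the following (values are illustrative):

{
  "tag_device_name": "plc-07",
  "tag_device_id": "dev-1234",
  "tag_measurement_type": "temperature"
}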



Metadata schemas can be defined in the Metadata tab of the MC UI. To define a new metadata schema, the user simply needs to select the Add New Schema button in the UI.

A contextual menu will open where all required schema parameters (Provider, Name, Id, JSON schema and Description) can be added and saved into the current configuration model.



The following fields need to be completed:
● Provider: select from the available providers (in most cases this should be set to "local").
● Schema name: name for the new schema.
● SchemaId: a single word to be used as the key for the new schema.
● JSON Schema: needs to be compliant with the JSON Schema specification.
● Description: description of the schema, so other users can understand its purpose and utility.

Creating a Metadata bucket


Buckets are the unit of classification for the different metadata elements that we can associate with tags. Buckets contain the schemas that define the context structure for a given topic. For example, we can define an 'asset metadata' bucket containing one or more schemas related to the asset identification of a given data stream. Metadata buckets can be created in the Manufacturing Connect UI.

To create a new Metadata bucket, users need to select the Buckets section of the metadata menu in MDE's MC UI:



The Add New Bucket menu allows users to add a new Bucket to the configuration model. Once clicked, a new contextual
menu appears where users can define the necessary attributes of the bucket itself:



The name of the bucket is a required field; the version is assigned to the bucket automatically. Bucket versions enable users to create new metadata specifications for a given tag while keeping backwards compatibility with existing metadata content in the database.

The list of available Metadata Schemas appears under the Schema section of the Bucket. Once selected, Schemas are added to the bucket. Once added, a Schema can be selected and its details are displayed as part of the Bucket information. Schemas are copied into the Bucket, which means that if a Schema evolves, the copy already in the Bucket will not change.

The two remaining options to configure for each Schema added to the Bucket are the Required and Dynamic flags:
- Required: required Schemas must be implemented for a message to be inserted in the databases.
- Dynamic: dynamic Schemas are provided by remote metadata providers and are queried from the API every time a new message is received. Use this flag only if the bucket is associated with a remote provider.



Creating Metadata for a Tag
Once a Metadata specification is defined in a Metadata bucket, we can associate it with tags (or with types, so that any tag created under that type inherits the bucket too). To do that, we open the MC UI and go to the Cloud Tags section:



To associate a given bucket with a given tag, we select the Actions column and "View/Edit" the tag configuration.

If we open the Metadata section of the configuration, the list of available Metadata Buckets is displayed under a selector. To associate a bucket with the tag, we simply select it from the list:



All selected Metadata Buckets are added to the section below, together with their corresponding schemas, which become available under the "Schemas" section. Each Metadata Bucket is color-coded, and schemas that belong to the same bucket share the same color.

To create a Metadata implementation, we just need to click on the Schema that we want to fill in, and the content of the schema opens below in form format.

Now the user can complete the form and click the "Save" button. The tag will be associated with the values entered in the form, and those values become available as metadata.

Receiving Metadata from the edge


In addition to using the MC UI, it is possible to receive metadata directly from the edge. To process metadata from an edge message, the parser receiving those messages should map the metadata content to the "metadata" section of the type / tag.

The metadata section can specify a bucket ID, bucket version and schema, so the metadata information is parsed into the correct metadata bucket of the tag. If the bucket is not specified, the information is mapped to a "default" bucket that has a generic JSON schema. That information will appear in the tag section of the MC UI under the default bucket.
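As a purely hypothetical sketch (the exact field names for the metadata section are determined by your parser configuration and are not specified in this guide), an edge message targeting a specific bucket might look like:

{
  "tagName": "line1-motor-temp",
  "value": 74.2,
  "timeStamp": 1668526632000,
  "metadata": {
    "bucketId": "asset-metadata",
    "bucketVersion": 1,
    "schema": "tag_sample_context",
    "values": { "tag_device_name": "plc-07" }
  }
}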



Moving Configurations Between Environments
The configurations stored in the config-manager can be exported and imported/merged into another environment; this can also be used to regularly back up the configurations. Configurations can only be imported into a destination environment running version 1.2.0 or later. The exported data is stored in files in a GCS bucket that can later be used to import into a destination environment.

Configuration Bucket
Once you deploy MDE v1.2.0+, you'll have a new GCS bucket named ${PROJECT_ID}-config-manager-jobs, which contains a couple of predefined directories created once import/export jobs are submitted via the config-manager – except for the import directory. The staging directory is private to the config-manager and is mainly used to track the progress of the submitted jobs in the background.

Once a job finishes successfully, the exported files are placed in a data directory under the export folder. Notice that each job has its own UUID and output data directory, which enables running export jobs multiple times without overwriting previously exported files.
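As a hedged sketch of the resulting bucket layout (directory names inferred from the description above; the exact object paths are fixed by the config-manager):

gs://<project-id>-config-manager-jobs/
    staging/                    private to the config-manager
    export/<job-uuid>/data/     exported configuration files, one directory per job
    import/data/                place exported files here to run an import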



To import configurations, you'll need to place the exported data directory under the import folder. Notice that the file names are fixed and can't be changed.

Import/Export Jobs API


You can request an export job using the REST endpoint below. This creates a job in the background and returns that job's
UUID. Two jobs cannot be active at the same time: the API accepts a new job only when no other job is active.

POST http://{{hostname}}:{{port}}/api/v1/admin/export
{
  "status": 201,
  "message": "Export job is submitted and it will start in the background",
  "timestamp": "07-11-2022 01:58:37",
  "id": "a26a50d4-e8a9-49f6-943d-175590636968"
}

To submit an import job, use the corresponding import API shown below. The same one-job-at-a-time restriction applies to
import and export alike. The API accepts an overwrite flag, which defaults to true; when overwriting, an existing configuration
item found in the database is replaced. If the flag is set to false, the import job skips that item instead.

POST http://{{hostname}}:{{port}}/api/v1/admin/import?overwrite=true
{
  "status": 201,
  "message": "Import job is submitted and it will start in the background",
  "timestamp": "07-11-2022 01:58:37",
  "id": "a26a50d4-e8a9-49f6-943d-175501636969"
}

At any time, you can get the active jobs using GET http://{{hostname}}:{{port}}/api/v1/admin/jobs/active. You can
also get the full history of submitted jobs, along with their start/end times and job statuses, via the history endpoint shown
after the sketch below.
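
As a sketch of what the active-jobs endpoint might return while an export is running (the response shape is inferred from the history example that follows; the in-flight status string and the null endTime are assumptions):

GET http://{{hostname}}:{{port}}/api/v1/admin/jobs/active
[
  {
    "id": "a26a50d4-e8a9-49f6-943d-175590636968",
    "type": "import/export",
    "name": "export-job",
    "status": "RUNNING",
    "startTime": "2022-11-07T13:58:46.813Z",
    "endTime": null,
    "active": true
  }
]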



GET http://{{hostname}}:{{port}}/api/v1/admin/jobs/history
[
  {
    "id": "b0988991-5651-419c-b0e9-3edc2f307bd8",
    "type": "import/export",
    "name": "export-job",
    "status": "FINISHED_SUCCESS",
    "startTime": "2022-11-04T13:24:40.522Z",
    "endTime": "2022-11-04T13:27:01.478Z",
    "active": false
  },
  {
    "id": "6ceecbde-bb45-43be-8869-181dff2256bc",
    "type": "import/export",
    "name": "export-job",
    "status": "DEACTIVATED",
    "startTime": "2022-11-04T13:29:34.677Z",
    "endTime": "2022-11-04T13:30:34.373Z",
    "active": false
  },
  {
    "id": "a26a50d4-e8a9-49f6-943d-175590636968",
    "type": "import/export",
    "name": "export-job",
    "status": "FINISHED_SUCCESS",
    "startTime": "2022-11-07T13:58:46.813Z",
    "endTime": "2022-11-07T13:59:57.767Z",
    "active": false
  }
]

Lastly, you can deactivate any running job using the deactivation API at
http://{{hostname}}:{{port}}/api/v1/admin/jobs/deactivate/{uuid}, where {uuid} is the job UUID.
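
For example, deactivating the second job in the history above (the one whose status is DEACTIVATED) might look like this; the HTTP method is an assumption, as this guide lists only the path:

POST http://{{hostname}}:{{port}}/api/v1/admin/jobs/deactivate/6ceecbde-bb45-43be-8869-181dff2256bc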

Integrating with Other Subsystems


This feature is currently part of the early access program, and your project must be allowlisted before the features below can
be enabled. Please contact the solution team for more information.

Enabling Anomaly Detection


Starting with 1.2.0, you can enable anomaly detection for a configuration type. All tags belonging to that type are then checked
for anomalous values. Anomaly detection can learn from a pre-existing history of tag values, which is used automatically once
you enable it for a particular type; this improves the accuracy of anomaly detection. In this release, anomaly detection uses
predefined tuning parameters that cannot be changed. When an anomaly is detected, a new tag is created that represents the
trigger signaling that an anomaly occurred.

To enable anomaly detection, you need to enable the alpha features by contacting the solution team. Once the alpha features
are enabled, you will see a new transformation in the MC user interface that represents the anomaly detection registration. You
can enable it for a given type as described above.



System Configurations
Most config-manager settings are specified at deployment time in the Helm chart. Examples include the allowed page size
when retrieving a multi-page endpoint such as /tags, and the endpoint of the anomaly detection subsystem that MDE should
communicate with for remote configuration. Starting with the 1.2 release, MDE introduces a way to override these
deployment-time settings through the API, which is handy for changing features/configurations without redeploying the
containers.

The current release offers configuration endpoints for the anomaly detection subsystem endpoint, the page size for
multi-page endpoints, and the import/export batch frequency. There are two main APIs for this: one completely overwrites the
configuration settings with what you provide in the request body, and the other updates only the portions of the configuration
that you send. They are described below.

POST http://{{hostname}}:{{port}}/api/v1/configs/write

{
  "anomalyDetection": {
    "properties": {
      "mde.subsystem.ad.config-endpoint": "https://anomaly-detection-subsystem.a.run.app"
    }
  }
}

This wipes out any existing values for all other configurations and stores only the single configuration element for anomaly
detection. The response displays all system configurations currently stored in MDE.

The second endpoint amends/patches the system configuration with only the parts you specify in the body. Here is an
example:

PATCH http://{{hostname}}:{{port}}/api/v1/configs/patch

{
  "api": {
    "properties": {
      "mde.api.page.max-size": "100"
    }
  }
}

Assuming the above two calls run in sequence, the final result of the system configurations will be as follows:

GET http://{{hostname}}:{{port}}/api/v1/configs

{
  "anomalyDetection": {
    "id": "7d172502-aa2f-42aa-97af-f2dd27a829f4",
    "properties": {
      "mde.subsystem.ad.config-endpoint": "https://anomaly-detection-subsystem.a.run.app"
    }
  },
  "api": {
    "id": "18742493-d7d0-4fa7-9bd2-1050e89f660a",
    "properties": {
      "mde.api.page.max-size": "100"
    }
  }
}

Depending on the configuration you provide, specific validation is performed before it is saved. For example, to add an
anomaly-detection endpoint, a valid, up-and-running anomaly-detection instance must be reachable at that endpoint at the
time the configuration is saved.

Monitoring and Troubleshooting


The solution provides two places to monitor the services and track the flow of processed messages. The first is a set
of technical metrics for the solution components such as the Config Manager, the Dataflow pipelines, and the time-series
component. These are exposed to Cloud Monitoring (formerly Stackdriver) in the GCP project, and you can create custom
dashboards using them. The metrics are exposed as custom metrics under the /sfp/ prefix, where you will find distribution
metrics for latency as well as count metrics for the number of requests and the maximum latency. You will also find metrics
on application database access performance and connection pooling for the CloudSQL backend.
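
As a sketch, a monitoring filter that selects these custom metrics in Metrics Explorer might look like the following (the custom.googleapis.com prefix is an assumption based on GCP's naming convention for custom metrics; verify the exact metric names in your project):

metric.type = starts_with("custom.googleapis.com/sfp/")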

In addition to the technical metrics above, the solution also exports the status of each incoming message as it passes through
the solution's processing steps. The export destination is a BigQuery table called OperationsDashboard, which has the
following structure (a fictitious sample row is sketched after the field list):



- messageId is the technical UUID assigned to each incoming message. This ID is unique per message and stays the same
  across all processing steps for that message. The IDs matter when certain messages need to be replayed, since the
  solution uses them to find the raw messages as originally received.
- header contains the message headers that the solution received on PubSub; these are normally technical headers set by
  the edge system.
- payload is the content of the message at the current processing step. It is important to emphasize that this can differ
  from the raw message content, since it shows the content after processing. As a corollary, the first processing step
  always contains the actual raw message as received on PubSub.
- processingStep is the name of the processing step the solution is executing; see the Configuration Guide for more
  details about the solution's runtime steps.
- status is the processing status of the step: either success or failure.
- statusReason contains the failure message when processing failed for a particular message.
- eventTimestamp is the raw event timestamp that was received on PubSub.
- stepTimestamp is the timestamp at which this step's processing completed; it can be used to track processing latency.
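
For illustration only, here is an entirely fictitious row rendered as JSON (all field values, including the step name, are invented for this sketch and do not come from a real deployment):

{
  "messageId": "0b7c9a2e-4c1d-4a86-9f2e-8d54a1c0e9b3",
  "header": {"edge-device": "press-14-plc"},
  "payload": "{\"tagName\": \"press-14/temperature\", \"value\": 72.4}",
  "processingStep": "transformation",
  "status": "success",
  "statusReason": null,
  "eventTimestamp": "2022-11-07T13:58:46.813Z",
  "stepTimestamp": "2022-11-07T13:58:47.102Z"
}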

You can create Data Studio dashboards that use this table; ask the solutions team to share a sample dashboard with you.
