Melt 101 Four Essential Telemetry Data Types
MELT 101
An introduction to the four essential telemetry data types
Table of Contents
INTRODUCTION
PART 1: EVENTS
Limitations on events
PART 2: METRICS
Limitations on metrics
PART 3: LOGS
PART 4: TRACES
MELT 101: An introduction to the four essential telemetry data types
Introduction
Observability has transformed the world of monitoring, and for good reason. Thanks to an abundance of available tools, it’s easier than ever to ship code, but that also means software environments are more complex than they’ve ever been. As our software development practices have evolved, so have our systems. It’s no longer enough to ask if something is wrong in our software stack; we must now also ask why. This is the fundamental function of observability.

Starting with a simple vending machine analogy, this guide will walk you through an explanation of metrics, events, logs, and traces, and will demonstrate:
• How they differ from one another
• When to use one versus another
• How they’re used in New Relic One, the first observability platform
Part 1: Events
Fig. 1
Conceptually, an event can be defined as a discrete action happening at a moment in time. So, to start with our vending machine analogy, we could define an event to capture the moment when someone makes a purchase from the machine:

At 3:34pm on 2/21/2019, a bag of BBQ chips was purchased for $1.

See Fig. 1 above to view what event data could look like stored in a database.

We could also define events for actions that do not include a customer, such as when a vendor refilled the machine, or for states that are derived from other events, such as an item becoming “sold out” after a purchase.

You can choose which attributes are important to send when defining an event. There’s no hard-and-fast rule about what data an event can contain: you define an event as you see fit. In New Relic, for example, all events have at least a Timestamp and an EventType attribute.

You can also attach multiple measurements as attributes to a single event (although at New Relic, a better way to report metrics would be to use the metric telemetry type, explained in Part 2: Metrics).

How are events used?
Events are valuable, because you can use them to confirm that a particular action occurred at a particular time. For example, we may want to know the last time our machine was refilled. Using events, we can look at the most recent timestamp from the Refilled event type and answer this question immediately.

Because events are basically a history of every individual thing that happened in your system, you can roll them up into aggregates to answer more advanced questions on the fly.

Continuing our PurchaseEvent example from above, imagine that we had the following events stored (see Fig. 2).
Fig. 2
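To make the event model concrete, here is a minimal Python sketch. It is not New Relic's data model or API. Only the Timestamp and EventType attribute names and the Refilled and PurchaseEvent types come from the text; the ItemName and Value attributes, the sample records, and the helper names `latest_timestamp` and `total_value` are hypothetical, and the values are not the rows shown in Fig. 2.

```python
from datetime import datetime

# Hypothetical events; every event carries at least Timestamp and EventType.
events = [
    {"Timestamp": datetime(2019, 2, 18, 9, 0), "EventType": "Refilled"},
    {"Timestamp": datetime(2019, 2, 21, 15, 34), "EventType": "PurchaseEvent",
     "ItemName": "BBQ chips", "Value": 1.00},
    {"Timestamp": datetime(2019, 2, 21, 15, 40), "EventType": "PurchaseEvent",
     "ItemName": "Pretzels", "Value": 0.75},
    {"Timestamp": datetime(2019, 2, 22, 8, 30), "EventType": "Refilled"},
]


def latest_timestamp(events, event_type):
    """Return the most recent timestamp for one event type, or None."""
    stamps = [e["Timestamp"] for e in events if e["EventType"] == event_type]
    return max(stamps) if stamps else None


def total_value(events, event_type="PurchaseEvent"):
    """Roll individual events up into an aggregate by summing Value."""
    return sum(e["Value"] for e in events if e["EventType"] == event_type)


print("Last refill:", latest_timestamp(events, "Refilled"))
print("Total sales:", total_value(events))
```

Because every individual event is kept, both questions are answered by filtering and aggregating after the fact; nothing had to be decided when the events were recorded.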
With this data, we could answer practical questions like, “How much money have I made this week?”

Because we have a history of every purchase event stored, we can simply sum the Value column and see that we’ve made $4.25.

Events become more powerful when you add more metadata to them. For example, we could add additional attributes, such as ItemCategory and PaymentType, so we could run faceted queries against our PurchaseEvent data (see Fig. 3).

Fig. 3

Now we can ask questions such as:

Example: Using events in New Relic
In this example, let’s say we’re a telco and have multiple customers reporting crashes in our mobile application, “ACME Telco -Android”, and it’s time to do some analysis.

Since we’ve deployed the New Relic Mobile agent, which captures crash data for any app it’s monitoring, we can access the raw underlying MobileCrash event data in New Relic.

In the New Relic One chart builder, we’ll run the following query:

SELECT * FROM MobileCrash

Each row in the following table corresponds to a specific crash event that occurred for a particular user at some point.

Now, let’s say we wanted to ask more useful questions about this data. For example, we might want to know if our app was crashing more often on a particular manufacturer’s devices during the past day.
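Conceptually, a faceted count groups events by one attribute and counts each group. The Python sketch below illustrates that idea only; it is not how New Relic executes queries. The crash records and manufacturer names are made up, `facet_count` is a hypothetical helper, and `deviceManufacturer` is assumed to be an attribute on each crash event.

```python
from collections import Counter

# Made-up crash events; only the attribute being faceted on matters here.
crashes = [
    {"appName": "Acme Telco -Android", "deviceManufacturer": "VendorA"},
    {"appName": "Acme Telco -Android", "deviceManufacturer": "VendorB"},
    {"appName": "Acme Telco -Android", "deviceManufacturer": "VendorA"},
    {"appName": "Acme Telco -Android", "deviceManufacturer": "VendorC"},
    {"appName": "Acme Telco -Android", "deviceManufacturer": "VendorA"},
]


def facet_count(events, attribute):
    """Count events per distinct value of `attribute`."""
    return Counter(e[attribute] for e in events if attribute in e)


counts = facet_count(crashes, "deviceManufacturer")
for manufacturer, n in counts.most_common():
    print(manufacturer, n)
```

The same helper would work for any attribute attached to an event, which is why adding metadata such as ItemCategory or PaymentType widens the set of questions you can ask.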
Fig. 4
Here, we’d run the following query in chart builder:

SELECT count(*) FROM MobileCrash WHERE appName = 'Acme Telco -Android' FACET deviceManufacturer SINCE 1 day ago

Limitations on events
You may be thinking events sound awesome (“Let’s collect one of everything that happens all the time!”). Well, event collection comes with a cost. Every event takes some amount of computational energy to collect and process. Events also take up space in your database, potentially lots of space. So for relatively infrequent things, like a purchase in a vending machine, events are great, but we wouldn’t want to collect an event for everything the vending machine does. For example, let’s say that you want to keep a history of the temperature in the vending machine. You could store an event for every minuscule, sub-degree shift in temperature, which would quickly fill up even the largest databases. Or you could instead take a sample of the temperature at a regular interval. This kind of data is better stored as a metric.
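The storage trade-off can be sketched in a few lines of Python. The readings below are hypothetical: one temperature value per second for ten minutes, drifting slightly on every reading. The names `readings`, `change_events`, and `samples`, and the 60-second interval, are assumptions chosen for illustration.

```python
# Hypothetical sensor readings: (seconds since start, temperature in degrees C),
# one per second for ten minutes, with a small change on every reading.
readings = [(t, 4.0 + (t % 7) * 0.01) for t in range(600)]

# Event-style: record every reading whose value differs from the previous one.
change_events = [
    cur for prev, cur in zip(readings, readings[1:]) if cur[1] != prev[1]
]

# Metric-style: keep one sample per 60-second interval.
SAMPLE_INTERVAL_S = 60
samples = [r for r in readings if r[0] % SAMPLE_INTERVAL_S == 0]

print(len(change_events), "change events vs", len(samples), "samples")
```

The sampled series still shows how the temperature behaved over time, at a small fraction of the record count.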
Part 2: Metrics
To put it simply, metrics are numeric measurements. Metrics can include:
• A numeric status at a moment in time (like CPU % used)
• Aggregated measurements (like a count of events over a one-minute window, or a rate of events per minute)

The types of metric aggregation are diverse (for example, average, total, minimum, maximum, sum-of-squares), but all metrics generally share the following traits:
• A name
• A timestamp

Fig. 5

Notice that we’ve lost some detail here compared to reporting event data. We no longer know what the specific three purchases were, nor do we have access to their individual values (and this data cannot be recovered). However, this approach requires significantly less storage and still allows us to ask certain critical questions like, “What were my sales over a specific range of time?” At a practical level, this is the primary difference between metrics and events.
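The roll-up from events to a metric can be sketched as follows. This is an illustrative Python example, not New Relic's aggregation pipeline; the purchase records and the metric name `vending.purchases` are hypothetical, and only a count and a total are kept for each one-minute window.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical purchase events: (timestamp, item, value in dollars).
purchases = [
    (datetime(2019, 2, 21, 15, 34, 3), "BBQ chips", 1.00),
    (datetime(2019, 2, 21, 15, 34, 20), "Pretzels", 0.75),
    (datetime(2019, 2, 21, 15, 34, 51), "Water", 1.00),
    (datetime(2019, 2, 21, 15, 36, 10), "BBQ chips", 1.00),
]


def to_minute_metrics(purchases, name="vending.purchases"):
    """Aggregate purchase events into one metric data point per minute."""
    buckets = defaultdict(lambda: {"count": 0, "total": 0.0})
    for timestamp, _item, value in purchases:
        minute = timestamp.replace(second=0, microsecond=0)
        buckets[minute]["count"] += 1
        buckets[minute]["total"] += value
    # Each data point keeps a name, a timestamp, and numbers; item names are gone.
    return [
        {"name": name, "timestamp": minute, **agg}
        for minute, agg in sorted(buckets.items())
    ]


metrics = to_minute_metrics(purchases)
for m in metrics:
    print(m)
```

Four events collapse into two data points. Sales over a time range can still be summed from them, but which items were bought in each minute can no longer be recovered.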
Metrics vs. events
So, what are the pros and cons of metrics and events?

Events
Pros
• Include individual data points

Metrics
Pros
Cons

less granular than event data. Events are useful when the data is relatively small or sporadic in nature, or when you don’t know the specific aggregates you want to see ahead of time. And each individual event is stored until it’s deleted. (Note that New Relic does allow you to turn event data into metric data.)

Example: Using metrics in New Relic
Limitations on metrics
You get a lot of information from metrics in a
really compact, cost-effective format. So, why
wouldn’t we use metrics all the time? Well, simply
put, metrics require careful decision-making. For
example, if you knew ahead of time you wanted
to know the 50th percentile (median) and the
95th percentile of the metric you’re capturing,
you could instrument that, collect it on all of your
aggregates, and then graph it. But let’s say you
wanted to know the 95th percentile for just the
data of a particular item in the vending machine.
You can’t calculate that after the fact; you
would need all the raw sample events to
do that. So, for metrics, you must decide
ahead of time about how you want to analyze the
data and set it up to support that analysis.
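The sketch below illustrates this limitation in Python. The per-item dispense times are invented sample data, and `percentile` is a simple nearest-rank helper written for this example, not a New Relic function. With raw events, a per-item percentile can be computed after the fact; with only a pre-aggregated summary, it cannot.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Raw events: (item, seconds to dispense). Invented data.
raw_events = [
    ("chips", 2.0), ("chips", 2.5), ("chips", 9.0), ("chips", 2.2),
    ("water", 1.0), ("water", 1.2), ("water", 1.1), ("water", 1.3),
]

# A metric pre-aggregated across all items keeps only summary numbers.
all_values = [seconds for _item, seconds in raw_events]
aggregate = {
    "count": len(all_values),
    "sum": sum(all_values),
    "p95": percentile(all_values, 95),
}

# With raw events we can still slice by item after the fact.
chips_p95 = percentile([s for item, s in raw_events if item == "chips"], 95)
water_p95 = percentile([s for item, s in raw_events if item == "water"], 95)
print("chips p95:", chips_p95, "water p95:", water_p95)

# `aggregate` holds no per-item samples, so neither per-item value above
# can be derived from it.
print(aggregate)
```

To get per-item percentiles from metrics alone, the aggregation would have to be set up per item before any data is collected.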
Part 3: Logs
It’s not a stretch to say that logs are the original data type. In their most fundamental form, logs are essentially just lines of text a system produces when certain code blocks get executed. Developers rely on them heavily in order to troubleshoot their code and to retroactively verify and interrogate the code’s execution. In fact, logs are incredibly valuable for troubleshooting databases, caches, load balancers, or older proprietary systems that aren’t friendly to in-process instrumentation, to name a few.

Similar to events, log data is discrete (it’s not aggregated) and can occur at irregular time intervals. Logs are also usually much more granular than events. In fact, one event can correlate to many log lines.

Let’s consider our original vending machine event:

At 3:34pm on 2/21/2019 a bag of BBQ chips was purchased for $1.

The corresponding log data might look like the data in Fig. 6.

Fig. 6

Log data is sometimes unstructured, and therefore hard to parse in a systematic way; however, these days you’re more likely to encounter “structured log data” that is formatted specifically to be parsed by a machine. Structured log data makes it easier and faster to search the data and derive events or metrics from the data.
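As a rough illustration of why structured lines are easier to work with, the Python sketch below parses JSON-formatted log lines. It assumes each line is an ISO 8601 timestamp, a space, and a valid JSON payload; that format, the sample lines, the field names, and the `parse_log_line` helper are assumptions for this example, not a format prescribed by New Relic.

```python
import json


def parse_log_line(line):
    """Split 'TIMESTAMP {json}' into a timestamp string and a dict of fields."""
    timestamp, _, payload = line.partition(" ")
    return timestamp, json.loads(payload)


# Hypothetical structured log lines from one vending machine.
log_lines = [
    '2019-02-21T15:34:03 {"actionType": "purchaseCompleted", "machineId": 2099, '
    '"itemName": "Tasty BBQ Chips", "itemValue": 1.00}',
    '2019-02-21T15:35:10 {"actionType": "refillStarted", "machineId": 2099}',
]

# Search for one action type and pull out the item name and value.
purchases = []
for line in log_lines:
    timestamp, fields = parse_log_line(line)
    if fields.get("actionType") == "purchaseCompleted":
        purchases.append((timestamp, fields["itemName"], fields["itemValue"]))

print(purchases)
```

With free-text lines, the same search would need a hand-written pattern for every message format; here the field names do that work.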
For instance, if we changed the log line from:

2/21/2019 15:34:03: Dispensing item 'Tasty BBQ Chips'

To:

2/21/2019 15:34:03: { actionType: purchaseCompleted, machineId: 2099, itemName: 'Tasty BBQ Chips', itemValue: 1.00 }

We could now search logs for purchaseCompleted and parse out the name and value of the item on the fly.

We now know exactly what went wrong: The user entered an invalid code.

Example: Logs in New Relic
New Relic Logs are extremely useful for troubleshooting errors as soon as they occur.

For example, in our “WebPortal” application, we see an error message for an invalid character exception:
Fig. 7
Part 4: Traces
Fig. 8
Traces (or, more precisely, “distributed traces”) are samples of causal chains of events (or transactions) between different components in a microservices ecosystem. And like events and logs, traces are discrete and irregular in occurrence.

Let’s say our vending machine accepts cash and credit cards. If a user makes a purchase with a credit card, the transaction has to flow through the vending machine via a backend connection, contact the credit card company, and then contact the issuing bank.

In monitoring the vending machine, we could easily set up an event that looks something like Fig. 8.

This event tells us that an item was purchased via credit card at a particular time, and it took 23 seconds to complete the transaction. But what if 23 seconds is too long? Was it our backend service, the credit card company’s service, or the issuing bank’s service slowing things down? Questions like this are exactly what traces are meant to address.

How do traces work?
Traces stitch together special events called “spans”; spans help you track a causal chain through a microservices ecosystem for a single transaction. To accomplish this, services pass correlation identifiers, known as “trace context,” to one another; this trace context is used to add attributes on the span.
Fig. 9
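One way to picture this stitching is the Python sketch below. It is a simplified model, not New Relic's or any tracing library's API: the `Span` fields, the service names, and the per-service durations are hypothetical, and only the 23-second total comes from the example above. Each service records a span carrying the shared trace ID and the ID of the span that called it, which is enough to rebuild the causal chain afterward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one transaction
    span_id: str
    parent_id: Optional[str]  # span that caused this one; None for the root
    service: str
    duration_s: float


# Hypothetical spans for one 23-second credit card purchase.
spans = [
    Span("t1", "a", None, "vending-machine", 23.0),
    Span("t1", "b", "a", "backend", 22.0),
    Span("t1", "c", "b", "card-company", 20.0),
    Span("t1", "d", "c", "issuing-bank", 18.5),
]


def self_time(spans):
    """Time spent in each service, excluding time spent in its child spans."""
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration_s
    return {s.service: s.duration_s - child_total.get(s.span_id, 0.0) for s in spans}


times = self_time(spans)
slowest = max(times, key=times.get)
print(times)
print("Most time spent in:", slowest)
```

A single event reports only the 23-second total; the parent links between spans are what let you attribute that time to one hop in the chain.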
Fig. 10
In this particular example, our “WebPortal” application has a page called purchase/confirmation.jsp. This page calls the “Fulfillment Service,” which calls the “Billing Service,” which calls the “Shipping Service.” Each colored rectangle denotes how long a nested service call lasted; the longer the rectangle, the more time spent in that particular service.

ask “Why?”
It doesn’t matter if you’re just getting started with observability or are a seasoned DevOps pro: understanding the use cases for each MELT data type is an essential part of building your observability practice.

With New Relic One, the industry’s first observability platform that is open, connected, and programmable, we’re redefining how you ask why and what’s possible in observability. And it all starts with MELT.

To learn more about these data types and how they’re used in New Relic, check out our data types documentation.

Try New Relic One today and start building better, more resilient software experiences. Visit newrelic.com/platform.
© Copyright 2020, New Relic, Inc. All rights reserved. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. 05.2020