
Melt 101 Four Essential Telemetry Data Types

This document provides an introduction to the four essential telemetry data types: metrics, events, logs, and traces (MELT). It begins by explaining events and how they are used to record discrete actions or states at moments in time. An example using purchase events from a vending machine is described. The document then discusses metrics, logs, and traces; explaining the differences between them and providing examples of how each type is used by New Relic. The overall summary is to introduce the reader to the MELT telemetry types and how they differ from each other.

White Paper

MELT 101
An introduction to the four essential telemetry data types
Table of Contents

INTRODUCTION

PART 1: EVENTS
  How are events used?
  Example: Using events in New Relic
  Limitations on events

PART 2: METRICS
  Metrics vs. events
  Example: Using metrics in New Relic
  Limitations on metrics

PART 3: LOGS
  When are logs useful?
  Example: Logs in New Relic

PART 4: TRACES
  How do traces work?
  When should you use traces?
  Example: Distributed tracing in New Relic

Redefine how you ask “Why?”
MELT 101: An introduction to the four essential telemetry data types

Introduction

Observability has transformed the world of monitoring, and for good reason. Thanks to an abundance of available tools, it’s easier than ever to ship code, but that also means software environments are more complex than they’ve ever been. As our software development practices have evolved, so have our systems. It’s no longer enough to ask if something is wrong in our software stack; we must now also ask why. This is the fundamental function of observability.

To achieve observability, you need to instrument everything and view all your telemetry data in one place, and there are plenty of ongoing debates about the best ways to do this. At New Relic, we believe that metrics, events, logs and traces (or MELT for short) are the essential data types of observability. When we instrument everything and use MELT to form a fundamental, working knowledge of connections (the relationships and dependencies within our system) as well as its detailed performance and health, we’re practicing observability.

If you’re just getting started with observability, though, the full value of MELT might not be completely clear. There’s a good chance you’ve heard these terms before, but can you confidently describe the differences among them?

Starting with a simple vending machine analogy, this guide will walk you through an explanation of metrics, events, logs, and traces, and will demonstrate:

• How they differ from one another
• When to use one versus another
• How they’re used in New Relic One, the first observability platform

Note that we start with events because they are the most critical data type for observability. Events are distinct from logs: they are discrete, detailed records of significant points of analysis but provide a higher level of abstraction than the details provided by logs. Alerts are events. Deployments are events. So are transactions and errors. Events provide the ability to do fine-grained analysis in real time.

Part 1: Events

Conceptually, an event can be defined as a discrete action happening at a moment in time. So, to start with our vending machine analogy, we could define an event to capture the moment when someone makes a purchase from the machine:

At 3:34pm on 2/21/2019, a bag of BBQ chips was purchased for $1.

See Fig. 1 to view what event data could look like stored in a database.

Timestamp   EventType      ItemPurchased  Value
2/21/2019   PurchaseEvent  BBQ chips      1.00

Fig. 1

We could also define events for actions that do not include a customer, such as when a vendor refilled the machine, or for states that are derived from other events, such as an item becoming “sold out” after a purchase.

You can choose which attributes are important to send when defining an event. There’s no hard-and-fast rule about what data an event can contain; you define an event as you see fit. In New Relic, for example, all events have at least a Timestamp and an EventType attribute.

You can also attach multiple measurements as attributes to a single event (although at New Relic, a better way to report metrics would be to use the metric telemetry type, explained in Part 2: Metrics).

How are events used?

Events are valuable because you can use them to confirm that a particular action occurred at a particular time. For example, we may want to know the last time our machine was refilled. Using events, we can look at the most recent timestamp from the Refilled event type and answer this question immediately.

Because events are basically a history of every individual thing that happened in your system, you can roll them up into aggregates to answer more advanced questions on the fly.

Continuing our PurchaseEvent example from above, imagine that we had the following events stored (see Fig. 2).

Timestamp           EventType      ItemPurchased     Value
2/21/2019 15:34:00  PurchaseEvent  BBQ chips         1.00
2/21/2019 16:37:00  PurchaseEvent  Pretzels          1.00
2/22/2019 7:14:00   PurchaseEvent  Sour cream chips  0.75
2/24/2019 11:52:00  PurchaseEvent  Water             1.50

Fig. 2
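To make the idea of "confirming that an action occurred at a particular time" concrete, the rows in Fig. 2 can be modelled as plain records. The Python sketch below is only an illustration: the field names, the helper latest_timestamp, and the single Refilled event are assumptions added here and are not data from this guide or part of any New Relic API.

```python
from datetime import datetime

# Purchase rows taken from Fig. 2. The "Refilled" row is hypothetical and
# exists only so the "when was the machine last refilled?" question has data.
EVENTS = [
    {"timestamp": datetime(2019, 2, 20, 9, 0), "eventType": "Refilled"},
    {"timestamp": datetime(2019, 2, 21, 15, 34), "eventType": "PurchaseEvent",
     "itemPurchased": "BBQ chips", "value": 1.00},
    {"timestamp": datetime(2019, 2, 21, 16, 37), "eventType": "PurchaseEvent",
     "itemPurchased": "Pretzels", "value": 1.00},
    {"timestamp": datetime(2019, 2, 22, 7, 14), "eventType": "PurchaseEvent",
     "itemPurchased": "Sour cream chips", "value": 0.75},
    {"timestamp": datetime(2019, 2, 24, 11, 52), "eventType": "PurchaseEvent",
     "itemPurchased": "Water", "value": 1.50},
]


def latest_timestamp(events, event_type):
    """Return the most recent timestamp for an event type, or None if absent."""
    matching = [e["timestamp"] for e in events if e["eventType"] == event_type]
    return max(matching, default=None)


last_refill = latest_timestamp(EVENTS, "Refilled")
last_purchase = latest_timestamp(EVENTS, "PurchaseEvent")
print("Last refill:", last_refill)
print("Last purchase:", last_purchase)
```

Because every event carries its own timestamp and type, the question is answered with a filter and a max; nothing has to be decided ahead of time about which questions will be asked.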

With the data in Fig. 2, we could answer practical questions like, “How much money have I made this week?”

Because we have a history of every purchase event stored, we can simply sum the Value column and see that we’ve made $4.25.

Events become more powerful when you add more metadata to them. For example, we could add additional attributes, such as ItemCategory and PaymentType, so we could run faceted queries against our PurchaseEvent data (see Fig. 3).

Timestamp           EventType      ItemPurchased     ItemCategory  Value  PaymentType
2/21/2019 15:34:00  PurchaseEvent  BBQ chips         Snacks        1.00   Cash
2/21/2019 16:37:00  PurchaseEvent  Pretzels          Snacks        1.00   Cash
2/22/2019 7:14:00   PurchaseEvent  Sour cream chips  Snacks        0.75   CreditCard
2/24/2019 11:52:00  PurchaseEvent  Water             Drinks        1.50   Cash

Fig. 3

Now we can ask questions such as:

• How much money did I make off of each item category? (Snacks: $2.75, Drinks: $1.50)
• How often do people use different payment types? (Cash: 3, CreditCard: 1)
• How much money did I make per day? (2/21: $2.00, 2/22: $0.75, 2/23: $0, 2/24: $1.50)

Example: Using events in New Relic

In this example, let’s say we’re a telco and have multiple customers reporting crashes in our mobile application, “ACME Telco -Android”, and it’s time to do some analysis.

Since we’ve deployed the New Relic Mobile agent, which captures crash data for any app it’s monitoring, we can access the raw underlying MobileCrash event data in New Relic.

In the New Relic One chart builder, we’ll run the following query:

    SELECT * FROM MobileCrash

Each row in the resulting table corresponds to a specific crash event that occurred for a particular user at some point.

Now, let’s say we wanted to ask more useful questions about this data. For example, we might want to know if our app was crashing more often on a particular manufacturer’s devices during the past day.

Here, we’d run the following query in chart builder:

    SELECT count(*) FROM MobileCrash WHERE appName = 'Acme Telco -Android' FACET deviceManufacturer SINCE 1 day AGO

From the results, we can see pretty clearly that the particular application has failed almost three times more often for Manufacturer A’s line of devices in the past day (see Fig. 4).

Fig. 4

Limitations on events

You may be thinking events sound awesome (“Let’s collect one of everything that happens all the time!”). Well, event collection comes with a cost. Every event takes some amount of computational energy to collect and process. They also take up space in your database (potentially lots of space). So for relatively infrequent things, like a purchase in a vending machine, events are great, but we wouldn’t want to collect an event for everything the vending machine does. For example, let’s say that you want to keep a history of the temperature in the vending machine. You could store an event for every minuscule, sub-degree shift in temperature, which would quickly fill up even the largest databases. Or you could instead take a sample of the temperature at a regular interval. This kind of data is better stored as a metric.
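Returning to the vending machine data in Fig. 3, the aggregate and faceted questions asked in this part (total revenue, revenue per category, payment type counts, revenue per day) amount to sums and group-by operations over the stored events. The following Python sketch shows one way this could look; the field names and the facet_sum helper are assumptions made for illustration, and the dates follow the per-day totals quoted in the text.

```python
from collections import Counter, defaultdict
from datetime import date

# Rows from Fig. 3 (field names are illustrative, not a New Relic schema).
PURCHASES = [
    {"day": date(2019, 2, 21), "item": "BBQ chips", "category": "Snacks",
     "value": 1.00, "payment": "Cash"},
    {"day": date(2019, 2, 21), "item": "Pretzels", "category": "Snacks",
     "value": 1.00, "payment": "Cash"},
    {"day": date(2019, 2, 22), "item": "Sour cream chips", "category": "Snacks",
     "value": 0.75, "payment": "CreditCard"},
    {"day": date(2019, 2, 24), "item": "Water", "category": "Drinks",
     "value": 1.50, "payment": "Cash"},
]


def facet_sum(events, key):
    """Sum the purchase value per distinct value of `key` (a facet)."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key]] += event["value"]
    return dict(totals)


total = sum(event["value"] for event in PURCHASES)
by_category = facet_sum(PURCHASES, "category")
by_day = facet_sum(PURCHASES, "day")
payment_counts = Counter(event["payment"] for event in PURCHASES)

print(total)           # 4.25
print(by_category)     # {'Snacks': 2.75, 'Drinks': 1.5}
print(payment_counts)  # Counter({'Cash': 3, 'CreditCard': 1})
```

Note that a facet only produces groups that have at least one event, so a day with no purchases (2/23 in the example) simply does not appear in by_day rather than showing as $0.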

Part 2: Metrics

To put it simply, metrics are numeric measurements. Metrics can include:

• A numeric status at a moment in time (like CPU % used)
• Aggregated measurements (like a count of events over a one-minute window, or a rate of events-per-minute)

The types of metric aggregation are diverse (for example, average, total, minimum, maximum, sum-of-squares), but all metrics generally share the following traits:

• A name
• A timestamp
• One or more numeric values

A specific example of a metric might look like this:

For the minute of 3:34-3:35pm on 2/21/2019, there were three purchases totaling $2.75.

This metric would be represented in a database as a single row of data (as shown in Fig. 5).

You’ll often see multiple values calculated in a single row to represent different metrics that share the same name, timestamp, and count; in this case, we’re tracking both the Total purchase value as well as the Average purchase value.

Notice that we’ve lost some detail here compared to reporting event data. We no longer know what the specific three purchases were, nor do we have access to their individual values (and this data cannot be recovered). However, this approach requires significantly less storage while still allowing us to ask certain critical questions like, “What were my sales over a specific range of time?” At a practical level, this is the primary difference between metrics and events.

Timestamp           Count  MetricName     Total  Average
2/21/2019 15:34:00  3      PurchaseValue  2.75   0.92

Fig. 5
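The single row in Fig. 5 can be thought of as the result of collapsing the individual purchase events in that minute. The Python sketch below illustrates the idea; the three event values and their exact seconds are hypothetical (chosen so they total $2.75), and to_metric is an illustrative helper, not a New Relic function.

```python
from datetime import datetime

# Hypothetical purchases inside the 15:34 minute; only their count and
# total ($2.75) are given in the text, the split is an assumption.
events = [
    (datetime(2019, 2, 21, 15, 34, 5), 1.00),
    (datetime(2019, 2, 21, 15, 34, 20), 1.00),
    (datetime(2019, 2, 21, 15, 34, 48), 0.75),
]


def to_metric(name, timed_values):
    """Collapse (timestamp, value) events into one aggregated metric row."""
    values = [value for _, value in timed_values]
    # The metric is stamped with the start of the one-minute window.
    window_start = min(ts for ts, _ in timed_values).replace(second=0, microsecond=0)
    total = sum(values)
    return {
        "timestamp": window_start,
        "metricName": name,
        "count": len(values),
        "total": total,
        "average": round(total / len(values), 2),
    }


metric = to_metric("PurchaseValue", events)
print(metric)
```

After this step only count, total, and average survive; the three individual purchases cannot be reconstructed from the metric row, which is the loss of detail described above.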

Metrics vs. events

So, what are the pros and cons of metrics and events?

Events

Pros

• Include individual data points
• Allow you to ask whatever questions you want at any point
• Can be computed on the fly

Cons

• Expensive to store high volumes of event data
• May hit bandwidth constraints within the source system while collecting and sending event data
• Can be time-consuming to query

Metrics

Pros

• Store significantly less data
• Take less time to compute roll-ups

Cons

• Require you to decide how to analyze the data ahead of time

Metrics work well for large bodies of data or data collected at regular intervals when you know what you want to ask ahead of time, but they are less granular than event data. Events are useful when the data is relatively small or sporadic in nature, or when you don’t know the specific aggregates you want to see ahead of time. And each individual event is stored until it’s deleted. (Note that New Relic does allow you to turn event data into metric data.)

Example: Using metrics in New Relic

Alongside metrics gathered by New Relic agents, customers can also send metrics from open source tools (such as Prometheus, Micrometer, and Dropwizard), and the metrics data they tend to find helpful includes error rate, response time, and throughput. In the screenshot below, we see a 12-hour window for an application called “WebPortal.”

Notice how all the lines are very jagged? This implies a higher level of fidelity in the data.

Now let’s look at another 12-hour window for the same metrics, captured two weeks ago:

Notice how the lines have smoothed out? This is because the metrics have been further aggregated over time. When the data is fresh, this data represents one-minute spans of time; however, after a certain amount of time has passed, we typically don’t need such high granularity. So, the minute averages get rolled up into hour averages: one data point per hour rather than 60, which saves on storage but sacrifices some detail.
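The minute-to-hour roll-up described above can be sketched in a few lines of Python. This is a simplified illustration with made-up response times, not a description of how New Relic implements its retention; roll_up and the sample data are assumptions.

```python
def roll_up(points, factor=60):
    """Average each consecutive group of `factor` points into a single point."""
    rolled = []
    for start in range(0, len(points), factor):
        chunk = points[start:start + factor]
        rolled.append(sum(chunk) / len(chunk))
    return rolled


# Hypothetical one-minute response-time averages (ms) covering two hours:
# a jagged first hour alternating 100 and 300, then a flat second hour at 150.
minute_averages = [100, 300] * 30 + [150] * 60

hourly_averages = roll_up(minute_averages)
print(len(minute_averages), "points ->", len(hourly_averages), "points")
print(hourly_averages)  # [200.0, 150.0]
```

The jagged first hour collapses to a single value of 200, which is why older charts look smoother. A plain mean of minute averages treats every minute as equally weighted; a more careful roll-up would also carry each minute's count so busy minutes contribute proportionally.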

Limitations on metrics
You get a lot of information from metrics in a
really compact, cost-effective format. So, why
wouldn’t we use metrics all the time? Well, simply
put, metrics require careful decision-making. For
example, if you knew ahead of time you wanted
to know the 50th percentile (median) and the
95th percentile of the metric you’re capturing,
you could instrument that, collect it on all of your
aggregates, and then graph it. But let’s say you
wanted to know the 95th percentile for just the
data of a particular item in the vending machine.
You can’t calculate that after the fact; you
would need all the raw sample events to
do that. So, for metrics, you must decide
ahead of time about how you want to analyze the
data and set it up to support that analysis.
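The percentile limitation can be demonstrated with a small Python sketch. The data below is hypothetical (seconds taken to complete each purchase, per item), and the nearest-rank percentile function is one common definition chosen here for simplicity; neither comes from the original guide.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical raw events: (item, seconds to complete the purchase).
RAW_EVENTS = [
    ("BBQ chips", 1.0), ("BBQ chips", 1.2), ("BBQ chips", 1.1), ("BBQ chips", 1.3),
    ("Water", 2.0), ("Water", 2.5), ("Water", 9.0), ("Water", 2.2),
]

# What a pre-aggregated metric might keep: percentiles across ALL items only.
all_durations = [seconds for _, seconds in RAW_EVENTS]
overall_metric = {
    "count": len(all_durations),
    "p50": percentile(all_durations, 50),
    "p95": percentile(all_durations, 95),
}


def item_percentile(events, item, pct):
    """Per-item percentile; this needs the raw events, not the aggregate."""
    return percentile([seconds for name, seconds in events if name == item], pct)


print(overall_metric)                                # p95 is dominated by one slow Water purchase
print(item_percentile(RAW_EVENTS, "BBQ chips", 95))  # 1.3
```

Nothing in overall_metric allows the 95th percentile for BBQ chips alone to be derived after the fact; it has to be either computed from raw events or instrumented as its own metric ahead of time.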


Part 3: Logs

It’s not a stretch to say that logs are the original data type. In their most fundamental form, logs are essentially just lines of text a system produces when certain code blocks get executed. Developers rely on them heavily in order to troubleshoot their code and to retroactively verify and interrogate the code’s execution. In fact, logs are incredibly valuable for troubleshooting databases, caches, load balancers, or older proprietary systems that aren’t friendly to in-process instrumentation, to name a few.

Similar to events, log data is discrete (it’s not aggregated) and can occur at irregular time intervals. Logs are also usually much more granular than events. In fact, one event can correlate to many log lines.

Let’s consider our original vending machine event:

At 3:34pm on 2/21/2019 a bag of BBQ chips was purchased for $1.

The corresponding log data might look like the data in Fig. 6.

Log data is sometimes unstructured, and therefore hard to parse in a systematic way; however, these days you’re more likely to encounter “structured log data” that is formatted specifically to be parsed by a machine. Structured log data makes it easier and faster to search the data and derive events or metrics from the data.

2/21/2019 15:33:14: User pressed the button ‘B’
2/21/2019 15:33:17: User pressed the button ‘4’
2/21/2019 15:33:17: ‘Tasty BBQ Chips’ were selected
2/21/2019 15:33:17: Prompted user to pay $1.00
2/21/2019 15:33:21: User inserted $0.25 remaining balance is $0.75
2/21/2019 15:33:33: User inserted $0.25 remaining balance is $0.50
2/21/2019 15:33:46: User inserted $0.25 remaining balance is $0.25
2/21/2019 15:34:01: User inserted $0.25 remaining balance is $0.00
2/21/2019 15:34:03: Dispensing item ‘Tasty BBQ Chips’
2/21/2019 15:34:03: Dispensing change: $0.00

Fig. 6

For instance, if we changed the log line from:

    2/21/2019 15:34:03: Dispensing item ‘Tasty BBQ Chips’

To:

    2/21/2019 15:34:03: { actionType: purchaseCompleted, machineId: 2099, itemName: ‘Tasty BBQ Chips’, itemValue: 1.00 }

We could now search logs for purchaseCompleted and parse out the name and value of the item on the fly.

When are logs useful?

Logs are incredibly versatile and have many use cases, and most software systems can emit log data. The most common use case for logs is for getting a detailed, play-by-play record of what happened at a particular time.

Let’s say, for instance, that we have a PurchaseFailed event that looks something like this:

Timestamp           EventType
2/21/2019 15:33:17  PurchaseFailedEvent

From this, we know that a purchase was attempted and failed for some unforeseen reason at a particular time, but we don’t have any additional attributes that give us insight as to why the purchase failed. The logs, however, may tell us something like the data in Fig. 7:

2/21/2019 15:33:14: User pressed the button ‘B’
2/21/2019 15:33:17: User pressed the button ‘9’
2/21/2019 15:33:17: ERROR: Invalid code ‘B9’ entered by user
2/21/2019 15:33:17: Failure to complete purchase, reverting to ready state

Fig. 7

We now know exactly what went wrong: The user entered an invalid code.

Example: Logs in New Relic

New Relic Logs are extremely useful for troubleshooting errors as soon as they occur.

For example, in our “WebPortal” application, we see an error message for an invalid character exception.

From here, we can click See Logs, and New Relic One presents us logs from that specific error transaction.

In this case, we see that a user passed an incorrect username; they simply mistyped a character.
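As a recap of the structured log idea from this part, the Python sketch below parses machine-readable log lines and derives purchase records from them. It makes several assumptions not found in the original: the payload is written as strict JSON (the example line earlier uses unquoted keys), the timestamp uses ISO 8601, and the itemSelected action and the parse_line helper are invented for illustration.

```python
import json

# Hypothetical structured log lines: "<timestamp> <JSON payload>".
LOG_LINES = [
    '2019-02-21T15:33:17 {"actionType": "itemSelected", "machineId": 2099, '
    '"itemName": "Tasty BBQ Chips"}',
    '2019-02-21T15:34:03 {"actionType": "purchaseCompleted", "machineId": 2099, '
    '"itemName": "Tasty BBQ Chips", "itemValue": 1.00}',
]


def parse_line(line):
    """Split off the timestamp and decode the JSON payload into a dict."""
    timestamp, _, payload = line.partition(" ")
    record = json.loads(payload)
    record["timestamp"] = timestamp
    return record


records = [parse_line(line) for line in LOG_LINES]

# Search for purchaseCompleted and pull out the item name and value.
purchases = [r for r in records if r["actionType"] == "purchaseCompleted"]
for purchase in purchases:
    print(purchase["timestamp"], purchase["itemName"], purchase["itemValue"])
```

With unstructured text, the same extraction would need a hand-written pattern per message format; with a structured payload, filtering by actionType and reading itemName and itemValue is a dictionary lookup.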


Part 4: Traces

Traces (or more precisely, “distributed traces”) are samples of causal chains of events (or transactions) between different components in a microservices ecosystem. And like events and logs, traces are discrete and irregular in occurrence.

Let’s say our vending machine accepts cash and credit cards. If a user makes a purchase with a credit card, the transaction has to flow through the vending machine via a backend connection, contact the credit card company, and then contact the issuing bank.

In monitoring the vending machine, we could easily set up an event that looks something like Fig. 8.

Timestamp           EventType           Duration
2/21/2019 15:34:00  CreditCardPurchase  23

Fig. 8

This event tells us that an item was purchased via credit card at a particular time, and it took 23 seconds to complete the transaction. But what if 23 seconds is too long? Was it our backend service, the credit card company’s service, or the issuing bank’s service slowing things down? Questions like this are exactly what traces are meant to address.

How do traces work?

Traces stitch together special events called “spans”; spans help you track a causal chain through a microservices ecosystem for a single transaction. To accomplish this, each service passes correlation identifiers, known as “trace context,” to the next; this trace context is used to add attributes on the span.

Timestamp           EventType  TraceID   SpanID  ParentID  ServiceID                Value  Duration
2/21/2019 15:34:23  Span       2ec68b32  aaa111  -         Vending Machine          1.00   23
2/21/2019 15:34:22  Span       2ec68b32  bbb111  aaa111    Vending Machine Backend  1.00   18
2/21/2019 15:34:20  Span       2ec68b32  ccc111  bbb111    Credit Card Company      0.75   15
2/21/2019 15:34:19  Span       2ec68b32  ddd111  ccc111    Issuing Bank             1.50   3

Fig. 9
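The SpanID and ParentID columns in Fig. 9 are what make it possible to work out how long each service spent on its own work, as opposed to waiting on the service it called. The Python sketch below illustrates this with the durations from Fig. 9; the field names and the self_times helper are assumptions, and it assumes each span's children run one after another rather than in parallel.

```python
# Spans from Fig. 9 (durations in seconds); parentId None marks the root span.
SPANS = [
    {"spanId": "aaa111", "parentId": None, "service": "Vending Machine", "duration": 23},
    {"spanId": "bbb111", "parentId": "aaa111", "service": "Vending Machine Backend", "duration": 18},
    {"spanId": "ccc111", "parentId": "bbb111", "service": "Credit Card Company", "duration": 15},
    {"spanId": "ddd111", "parentId": "ccc111", "service": "Issuing Bank", "duration": 3},
]


def self_times(spans):
    """Return each service's own time: span duration minus its children's durations."""
    children_total = {}
    for span in spans:
        parent = span["parentId"]
        if parent is not None:
            children_total[parent] = children_total.get(parent, 0) + span["duration"]
    return {
        span["service"]: span["duration"] - children_total.get(span["spanId"], 0)
        for span in spans
    }


own_time = self_times(SPANS)
slowest = max(own_time, key=own_time.get)
print(own_time)
print("Slowest service:", slowest)
```

For this trace the own times are 5, 3, 12, and 3 seconds, which add up to the 23 seconds of the root span, with the credit card company accounting for 12 of them.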

Fig. 9 shows what a distributed trace composed of the spans in our credit card transaction might look like. If we look at the Timestamp and Duration data, we can see that the slowest service in the transaction is the credit card company’s; it’s taking 12 of the 23 seconds, which is more than half the time for this entire trace!

How’d we get 12 seconds? The span to contact the issuing bank is what we call a child span; the span to contact the credit card company is its parent. So if the bank request took 3 seconds, and the credit card company took 15 seconds, and we subtract the child from the parent, we see that it took 12 seconds to process the credit card transaction, more than half the total time of the trace.

When should you use traces?

Trace data is needed when you care about the relationships between services/entities. If you only had raw events for each service in isolation, you’d have no way of reconstructing a single chain between services for a particular transaction.

Additionally, applications often call multiple other applications depending on the task they’re trying to accomplish; they also often process data in parallel, so the call-chain can be inconsistent and timing can be unreliable for correlation. The only way to ensure a consistent call-chain is to pass trace context between each service to uniquely identify a single transaction through the entire chain.

Example: Distributed tracing in New Relic

New Relic One captures trace data via its distributed tracing feature (see Fig. 10).

Fig. 10

In this particular example, our “WebPortal” application has a page called purchase/confirmation.jsp. This page calls the “Fulfillment Service,” which calls the “Billing Service,” which calls the “Shipping Service.” Each colored rectangle denotes how long a nested service call lasted; the longer the rectangle, the more time spent in that particular service.

Redefine how you ask “Why?”

It doesn’t matter if you’re just getting started with observability or are a seasoned DevOps pro: understanding the use cases for each MELT data type is an essential part of building your observability practice.

Once you understand these data types, you’ll better understand how to work with an observability platform such as New Relic One to connect your telemetry data (be it open source or vendor-specific) to understand relationships and make sense of the data as it relates to your business. When you can visualize dependencies and view detailed telemetry data in real time, you can more quickly and easily resolve system problems and prevent those issues from occurring again in your applications and infrastructure. This is how you ensure reliability.

With New Relic One, the industry’s first observability platform that is open, connected, and programmable, we’re redefining how you ask why and what’s possible in observability. And it all starts with MELT.

To learn more about these data types and how they’re used in New Relic, check out our data types documentation.

More perfect software

Try New Relic One today and start building better, more resilient software experiences. Visit newrelic.com/platform.

© Copyright 2020, New Relic, Inc. All rights reserved. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. 05.2020
