2021-10-21 Email Explained From First Principles
This article
and its code were
first published on 7 May 2021
and last modified
on 21 October 2021.
Preface
Even if you’re not interested in email, this article can teach you a lot about Internet protocols and IT security.
For example, it covers
Implicit and Explicit TLS;
password-based authentication mechanisms
with hash functions, replay attacks,
encryption mechanisms, and
channel bindings;
internationalized domain names with Punycode encoding,
Unicode normalization, case folding,
and homograph attacks;
transport security
with DANE and HSTS;
and end-to-end security with S/MIME and PGP.
If you haven’t done so already, read the article about the Internet first.
This article assumes that you’re familiar with the following
acronyms and the concepts behind them:
RFC, IP,
TCP, TLS,
DNS, and DNSSEC.
This article focuses on how modern email works, not on how you set up your own email infrastructure.
If you want to do that, Mail-in-a-Box seems like a good place to start.
Impact
This article had the following impact in the email industry (beyond additional DNS records):
The mail client Mutt gained an option to
conceal the sender’s time zone
for more privacy.
Mail-in-a-Box added null MX records
for subdomains with address records.
Gandi.net no longer includes
the sender’s IP address in sent messages.
Terminology
Email, which also used to be written as e-mail, stands for electronic mail.
Since the term electronic mail applies to any mail that is transferred
electronically, it also encompasses
fax, SMS, and other systems.
For this reason, I use only the short form email in this article and always mean
the decentralized system
to transfer messages over the Internet as documented in numerous RFCs.
The term email doesn’t appear in the
original RFC,
and many RFCs just use mail or (Internet) message instead.
In ordinary language, email refers both to the system of standards
and
to individual messages transmitted via these standards.
While the English language would allow us to distinguish between the two usages
by
capitalizing the former but not the latter, I’ve never seen anyone doing this.
Even though I’m tempted to pioneer the proper use of grammar
here,
I’d rather save my artistic license for other things.
(Proper nouns refer to a single entity,
whereas common nouns refer to a class of
entities.
Only proper nouns are capitalized in English.
For example, Earth with a capital E refers to the planet we live on,
whereas earth with a
lowercase E refers to the soil in which plants grow.)
Note that this is in contrast to Internet,
which is commonly capitalized because there is
Concepts
Before diving into the technical aspects of email, let’s first look at email from the perspective of its users.
Message
The purpose of email is to send messages over the Internet.
A message is a recorded piece of information which is delivered asynchronously from
a sender to one or several recipients.
Asynchronous communication
means that the recipient can consume a message at an arbitrary point after it has been
produced,
rather than having to interact with the sender concurrently.
A message can be transmitted with a physical object, such as a letter,
or
with a physical signal, such as an acoustic or electromagnetic wave.
While humans have delivered messages in the form of objects for millennia
with couriers and pigeons,
it’s only since the invention of the optical telegraph
in the late 18th century and the invention of the electrical
telegraph
in the middle of the 19th century that we can signal arbitrary messages over long distances.
The fundamental principle of
communication stayed the same over all those years:
You can either start a new conversation or continue an existing one by replying to a previous
message.
Mailbox
A mailbox is a box for incoming mail (also called an inbox),
into which everyone can deposit messages but ideally only the intended recipient can
retrieve them.
In some countries, the privacy of such messages is legally protected by the
secrecy of correspondence.
Provider
There are three things that set email apart from the
traditional postal system,
which is sometimes also referred to as snail mail:
1. Email conveys digital data,
whereas a letter is a physical item.
The former is much more useful for further processing.
2. Email enables instant global delivery at a marginal cost of zero.
The only fee you pay is for your access to the Internet.
3. Mailboxes for email are provided and operated by companies,
which are called mailbox providers.
While you could operate your own server
since email is an open and decentralized system,
this is rarely done in practice for reasons we discuss later on.
It is estimated
that around half of the human population uses email,
with an average of 1.75 active accounts per user.
In the Western world,
the consumer market is dominated by Google
with their Gmail service, which has 1.5 billion active users.
In China, the biggest player is
Tencent QQ with 900 million active accounts.
Outlook by Microsoft
has 400 million active users,
which is followed by Yahoo! Mail with 225
million active users.
Apple’s iCloud has 850 million users,
but it’s not known how many of those use its email functionality.
Address
Email addresses
are used to identify the sender and the recipient(s) of a message.
They consist of a username followed by the @ symbol and a
domain name.
The domain name allows the sender to first determine
and then connect to the mail server of each recipient.
The username allows
the mail server to determine the mailbox to which a message should be delivered.
The hierarchical Domain Name System ensures that the domain
name is unique,
whereas the mailbox provider has to ensure that the name of each user is unique within its domain.
There doesn’t have to be a
one-to-one correspondence between addresses and mailboxes:
A mailbox can be identified by several addresses,
and an email sent to a single
address can be delivered to multiple mailboxes.
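To make the two roles of an address concrete, here is a minimal sketch in Python.
The routing table and the mailbox names are made up for illustration; a real mail server stores this mapping in its own way.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an email address into (username, domain)."""
    # Split at the last @ symbol because a domain name cannot contain one.
    username, separator, domain = address.rpartition("@")
    if not separator or not username or not domain:
        raise ValueError(f"not a valid email address: {address!r}")
    # Domain names are case-insensitive, so we normalize them to lowercase.
    return username, domain.lower()


# Hypothetical routing table of a mailbox provider:
# several addresses can lead to one mailbox, and one address to several mailboxes.
MAILBOXES = {
    ("alice", "example.com"): ["alice-inbox"],
    ("a.smith", "example.com"): ["alice-inbox"],
    ("team", "example.com"): ["alice-inbox", "bob-inbox"],
}


def mailboxes_for(address: str) -> list[str]:
    """Return the mailboxes to which a message for this address is delivered."""
    return MAILBOXES.get(split_address(address), [])
```

The sender only needs the domain to find the mail server; the lookup of the mailbox happens on the server of the recipient.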
Display name
Email protocols accept an optional display name in most places where an email address is expected.
The format for this is Display Name
<[email protected]>
according to RFC 5322.
Mail clients display this name to the user as follows:
This feature seems totally benign, but, as we will see later on,
it has serious privacy and security implications.
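If you want to experiment with this format, the standard library of Python can parse and produce it.
The name and the address in this sketch are placeholders of my own choosing.

```python
from email.utils import formataddr, parseaddr

# Split a header value into the display name and the actual address.
display_name, address = parseaddr("Alice Example <alice@example.com>")

# Combine a display name and an address into a header value again.
header_value = formataddr(("Alice Example", "alice@example.com"))
```

Note that the display name is chosen freely by the sender and is not verified by anyone.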
The @ symbol
Normalization
Subaddressing
Go to the Accounts and Import tab of your settings and click on “Add another email address” under “Send mail as”.
Click on the button “Next Step” and you’re done. You can now select a different From address the next time you compose a message.
Alias address
An alias address
doesn’t have a mailbox associated with it but simply
forwards all incoming messages to one or several addresses.
The
forwarding is done by the incoming mail server of the alias address
and the expanded addresses may belong to the same or to different hosts.
Unlike in the case of a mailing list,
an automatic response by a recipient is sent to the original sender.
Alias addresses can forward messages
to other alias addresses, which can cause mail loops.
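The following sketch shows how alias expansion can run into a loop and how remembering the visited addresses avoids it.
The alias table is hypothetical, and real mail servers may detect loops by other means.

```python
# Hypothetical alias table: each alias forwards to one or several addresses.
ALIASES = {
    "info@example.com": ["alice@example.com", "support@example.com"],
    "support@example.com": ["bob@example.org"],
    "loop-a@example.com": ["loop-b@example.com"],
    "loop-b@example.com": ["loop-a@example.com"],
}


def expand(address, aliases, seen=None):
    """Return the set of final addresses to which a message is forwarded."""
    seen = set() if seen is None else seen
    if address in seen:
        # We have been here before: stop instead of forwarding forever.
        return set()
    seen.add(address)
    targets = aliases.get(address)
    if targets is None:
        # Not an alias, so the address is treated as a final destination.
        return {address}
    result = set()
    for target in targets:
        result |= expand(target, aliases, seen)
    return result
```

Expanding info@example.com yields the two final addresses, whereas the two looping aliases expand to nothing.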
Mailing list
A mailing list
is an address which forwards incoming messages to all the subscribers of the list.
The administrator of the list can decide who is
allowed to send messages to the list
and whether each message needs to be approved by a moderator before it is forwarded.
Unlike in the
case of an alias address,
the mailing list software has to change the envelope of the message
so that automatic responses from subscribers of
the list
are sent to the administrator of the list rather than the original sender.
Address syntax
Common addresses
If you use your own domain for email, you can choose the local part of your addresses
however you want as long as you adhere to the address
syntax.
Some local parts, though, are commonly used to reach the person with a specific role in an organization:
Address Expectation
admin@ Reach the technical administrator (as an alternative to the previous three addresses).
Recipients
You can address the recipients of a message in three different ways:
The To field contains the address(es) of the primary recipient(s).
As a sender, you expect the primary recipient(s) to read and often to react to
your message.
The expected reaction can be a reply or that they perform the requested task.
The Cc field contains the address(es) of the secondary recipient(s).
As a sender, you want to keep the secondary recipient(s) informed
without
expecting them to read or react to your message.
(Cc stands for carbon copy.)
The Bcc field contains the address(es) of the hidden recipient(s).
Their address(es) are not to be revealed to other recipients of the message.
The field is usually fully preserved in your folder of sent messages
but fully removed in the version of the email that is delivered to others.
Alternatively, a different message could be delivered to each hidden recipient
where their address alone is listed in the Bcc field.
The standard
also allows hidden recipients to see each other;
they just have to be removed for the primary and secondary recipients.
The vague semantics of
this feature leads to several problems.
(Bcc stands for blind carbon copy.)
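As a rough illustration of the most common Bcc behavior, this Python sketch collects all recipients for delivery and removes the Bcc field from the delivered version.
The function name and the addresses are my own; the other variants allowed by the standard are not covered.

```python
import copy
from email.message import EmailMessage
from email.utils import getaddresses


def split_for_delivery(message):
    """Return (all recipient addresses, copy of the message without Bcc)."""
    values = []
    for field in ("To", "Cc", "Bcc"):
        values.extend(str(value) for value in message.get_all(field, []))
    recipients = [address for _, address in getaddresses(values) if address]
    delivered = copy.deepcopy(message)
    del delivered["Bcc"]  # Deleting a missing field is not an error.
    return recipients, delivered


msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Cc"] = "carol@example.com"
msg["Bcc"] = "dave@example.com"
msg["Subject"] = "Quarterly report"
msg.set_content("Please find the numbers below.")

recipients, delivered = split_for_delivery(msg)
```

The original message, which still contains the Bcc field, is what would be kept in the folder of sent messages.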
Group construct
Sender
There are two relevant fields to indicate the originator of a message:
The From field contains the address of the person who is responsible for the content of the message.
The Reply-To field indicates the address(es) to which replies should be sent.
If absent, replies are sent to the From address.
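The rule for replies can be sketched in a few lines of Python.
The helper name and the addresses are hypothetical.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def reply_addresses(message):
    """Return the address(es) to which a reply should be sent."""
    # Prefer the Reply-To field and fall back to the From field if it is absent.
    values = message.get_all("Reply-To") or message.get_all("From", [])
    return [address for _, address in getaddresses([str(v) for v in values])]


newsletter = EmailMessage()
newsletter["From"] = "News Team <news@example.com>"
newsletter["Reply-To"] = "feedback@example.com"

personal = EmailMessage()
personal["From"] = "Alice Example <alice@example.com>"
```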
Important: The core email protocols do not authenticate the sender of an email.
It’s called spoofing
when the sender uses a From address which
doesn’t belong to them.
Forged sender addresses are a huge problem for the security of email.
There are additional standards to authenticate
emails.
For them to have the desired effect, though,
both the sender and the recipients have to use them.
Sender field
RFC 5322
differentiates between the author and the sender of a message.
The person who writes the message is usually also the one who
sends it.
If the author and the sender are different, though,
the sender should be provided in the Sender field.
The standard also allows
several addresses in the From field.
If this is the case, the email must include a Sender field with a single address.
However, I’m not aware of
any mail clients which support this.
In practice, the addresses of the co-authors are simply added to the Cc field.
Their contribution is made
No reply
Many emails are sent from automated systems, which cannot handle replies.
Examples of such emails are notifications about events on a
platform and reports about some usage statistics.
RFC 5322
required each email to have a From field with one or several addresses.
RFC
6854 updated the standard in 2013
to allow the group construct to be used in the From field as well.
This allows automated systems to
provide no reply address by using an empty group in the From field,
rather than having to rely on users interpreting an address such as no-
[email protected] correctly.
The automated system can still identify itself by choosing the name of the group appropriately,
for example
LinkedIn Notification Bot:;.
In the absence of an alternative to indicate the originating domain to the user,
I strongly advise against
using an empty group in the From field, though,
because this defeats all efforts towards domain authentication.
Even the RFC itself
recommends against
the general use of this method and says
that it is for limited use only.
Thus, we still have to wait for a usable
No-Reply
header field, unfortunately.
(The empty group construct is used to downgrade internationalized email addresses
as specified in RFC 6857.)
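For the curious, this is roughly what an empty group in the From field looks like when you build a message with Python's standard library.
The recipient, the subject, and the content are placeholders; this is a sketch to show the syntax, not a recommendation to use it.

```python
from email.headerregistry import Group
from email.message import EmailMessage

msg = EmailMessage()
# A group with a display name but without any member addresses.
msg["From"] = Group("LinkedIn Notification Bot")
msg["To"] = "alice@example.com"
msg["Subject"] = "You have a new notification"
msg.set_content("This message was sent by an automated system.")

from_header = msg["From"]
```

The header value is the name of the group followed by a colon and a semicolon, and it contains no address to which a reply could be sent.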
Subject
The Subject field identifies the topic of a message.
Its content is restricted to a single line but the line can be of arbitrary length.
(We’ll talk about
encoding later.)
RFC 5322 also defines other informational fields,
namely Comments and Keywords, but I’ve never seen them being used.
All
informational fields are optional, which means an email doesn’t need a subject line.
The mail clients I’ve checked, though, include the Subject field
even when it’s empty.
While the message is transmitted with an empty Subject field,
mail clients usually display “(No subject)” instead of nothing.
Prefixes
When you reply to a message, your mail client automatically suggests the new subject:
“Re: ” followed by the original subject.
While I would
argue that “Re” stands for “reply”,
RFC 5322 says
that it is an abbreviation of the Latin “in re”, which means “in the matter of”.
Similarly, if you
forward an email to another recipient,
your mail client typically puts “Fwd: ” in front of the original subject.
Using such prefixes in replies and
forwarded emails is optional.
In particular, they have no technical significance.
As we will see later, messages are grouped into conversations
based on other, more reliable information.
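A mail client could derive the suggested subject as follows.
Not repeating an existing "Re: " prefix is an assumption about typical client behavior, not something the standard requires.

```python
def reply_subject(subject: str) -> str:
    """Suggest the subject for a reply to a message with the given subject."""
    # Assumption: avoid stacking prefixes like "Re: Re: Re: Lunch".
    if subject.lower().startswith("re:"):
        return subject
    return "Re: " + subject


def forward_subject(subject: str) -> str:
    """Suggest the subject for a forwarded message."""
    return "Fwd: " + subject
```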
Body
Last but not least, an email has a body
(which is strictly speaking optional).
The body contains the actual content of a message.
It can be formatted
in different ways and can consist of different parts.
Splitting the body into several parts is useful,
for example, to send a plaintext version alongside
an HTML-encoded message
or to attach files to an email.
We’ll discuss later how all of this works.
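To get a feeling for how a body with several parts is structured, you can build one with Python's standard library.
The content and the file name are made up for this sketch.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Report"

# The plaintext version of the message.
msg.set_content("Hello Bob,\nthe report is attached.")
# An HTML version as an alternative to the plaintext version.
msg.add_alternative(
    "<p>Hello Bob,<br>the report is <b>attached</b>.</p>", subtype="html"
)
# A file attached to the message.
msg.add_attachment(
    b"year,revenue\n2021,100\n",
    maintype="application",
    subtype="octet-stream",
    filename="report.csv",
)

# List the content type of the message and of all its nested parts.
structure = [part.get_content_type() for part in msg.walk()]
```

The two versions of the text end up in one part, which is nested next to the attachment.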
Size limit
Architecture
There are four separate aspects to understand email from a technical perspective:
Format: What is the syntax of email messages?
Protocols: How are these messages transmitted?
Entities: Who transmits these messages to whom?
Architecture: How are these entities arranged?
Simplified architecture
[Figure: simplified architecture with the incoming and the outgoing mail server of the sender, the incoming and the outgoing mail server of the recipient, the mail client of the sender, and the mail client of the recipient. Labels: SMTP for message relay, IMAP for message storage.]
Standardization
If we ignore for a moment that there are separate servers for incoming and for outgoing mail,
we’re left with the following:
The user interacts
with a client to read and compose messages.
The client submits the composed messages to a server for delivery.
The client also fetches newly
received messages from the server.
The server connects to other servers in order to deliver some messages.
The important thing to note is
that the interactions between these entities are independent from one another:
[Figure: user, client, and server on the side of the sender and on the side of the recipient.]
User ➞ client:
How users interact with mail clients is not standardized.
In particular, users don’t have to sit directly in front of their mail
client.
They can also interact with a mail client over the Web, for example.
Some standards demand that certain actions have to be
confirmed or initiated by the user.
Apart from this, mail clients are free to present information to the user in any way they want.
But
similar to how you can drive a car from any brand if you know how to drive a car from one brand,
users have developed expectations
regarding how the above concepts are presented.
For example, Cc is always called Cc.
Webmail
In the case of webmail, the mail client is accessed via a web server using a web browser.
On the left, the code to interact with the data comes from the server.
On the right, the logic is inside the client and only data is exchanged.
Official architecture
For the sake of completeness and to enable you to understand the linked articles,
this subsection covers the official terminology as used, for
example, in RFC 5598.
In the official documents, there are five instead of three entities,
with each of them having a more complicated name and,
of course, an associated
three-letter acronym (TLA):
MUA (Mail user agent): Client to compose, send, receive, and read emails.
MSA (Mail submission agent), also MSS (Mail submission server): Server to receive outgoing emails from authenticated users and to queue them for delivery by the mail transfer agent (MTA).
MTA (Mail transfer agent): Server to deliver the queued emails and to receive them on the other end. It then forwards the received emails to the mail delivery agent (MDA).
MDA (Mail delivery agent): Server to receive emails from the local mail transfer agent (MTA).
MS (Message store), also MAS (Mail access server): Server to store the emails received from the mail delivery agent (MDA) and to deliver them to the mail user agent (MUA) of the recipient.
These terms are not as precise as they seem to be and the boundaries are often fluid in practice.
Having more entities also changes the
architecture.
What follows is a nicer version of this ASCII graphic,
which is a masterpiece to be appreciated in its own right.
Entities
There are three entities in the simplified architecture:
the mail client,
the outgoing mail server,
and the incoming mail server.
Mail client
The mail client is a computer program to compose, send, retrieve, and read emails.
It provides the interface through which users handle email.
The
mail client runs either locally on the user’s device or remotely on a web server.
Examples of the former kind are
Microsoft Outlook,
Apple Mail,
and
Mozilla Thunderbird.
Examples of the latter are Google Gmail and
Yahoo! Mail when accessed through a web browser.
(Both companies also
provide mobile
apps for Android
and iOS, which fall into the former category.)
The mail client connects to the outgoing mail server to submit messages for delivery to other users
and to the incoming mail server to fetch new
messages from the user’s mailbox.
Both servers authenticate the user, typically with a username and a password.
The mail client connects to the
incoming mail server through a different interface than outgoing mail servers do,
which can be seen on the recipient’s side of the simplified mail
architecture:
[Figure: simplified architecture with the mail client of the sender and the mail client of the recipient. Label: IMAP for message storage.]
Configuration
The simplified email architecture corresponds to what mail clients like Apple Mail display to you.
The domain of the address (ef1p.com) is different from the domain of the servers (mail.gandi.net).
The host names of the incoming mail server and the outgoing mail server are usually not the same.
Custom domains
Autoconfiguration
You can check the email service records of a domain with the following tool,
which uses an API by Google for its DNS queries:
1. Service records make the incoming and outgoing mail servers publicly known.
For public mail services, where anyone can create an
account, this is the case anyway.
For private mail services, on the other hand,
such knowledge makes attacks on the infrastructure easier
if the mail servers cannot be guessed otherwise.
Insert the appropriate Domain and use 0 0 0 . for all the services
which are not supported by your mailbox provider.
Configuration database
At this point, you may be wondering how mail clients can often figure out the correct configuration by themselves
despite the lack of an
established standard.
Most mail clients look up the configuration for popular mailbox providers
in a database, which is either delivered with
the client or centrally hosted by the software manufacturer.
Some mail clients also use custom autoconfiguration protocols,
which typically
fetch an XML file hosted at a specific subdomain via HTTPS.
3. Check https://{Domain}/.well-known/autoconfig/mail/config-v1.1.xml.
The key difference between this and the previous
lookup is that
the autoconfig subdomain in step 2 can point to a web server operated by your mailbox provider,
while the lookup in the
current step must be handled by the Domain itself.
4. Look for a configuration file in the central database at https://autoconfig.thunderbird.net/v1.1/{Domain}.
5. Look up the MX record of the domain in the Domain Name System
and then check whether the central database has an entry for the so-called apex domain
at the root of the zone.
This is useful for custom domains like ef1p.com,
which has an MX record pointing to
spool.mail.gandi.net,
which belongs to the zone starting at gandi.net.
The central database has an entry for gandi.net,
which is how
Thunderbird would find the configuration for my email address.
6. If all previous attempts to find a configuration failed,
Thunderbird resorts to guessing the mail servers.
It tries to connect to common
server names such as mail.{Domain}, smtp.{Domain}, and imap.{Domain}
on the default port numbers and checks whether they
support TLS or STARTTLS
and the challenge-response authentication mechanism (CRAM).
The last check prevents Thunderbird from
accidentally revealing the user’s password to the wrong server.
Unfortunately, CRAM is rather weak.
The far better salted challenge-
response authentication mechanism (SCRAM) should be used instead.
7. If all of the above steps fail, the user has to enter the configuration themself.
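Some of the lookups described above can be sketched as follows.
The function names are mine, only the URLs and server names mentioned in the steps are included, and the actual network requests and checks are left out.

```python
def autoconfig_urls(domain: str) -> list[str]:
    """Return the configuration URLs from steps 3 and 4 in the order they are tried."""
    return [
        f"https://{domain}/.well-known/autoconfig/mail/config-v1.1.xml",
        f"https://autoconfig.thunderbird.net/v1.1/{domain}",
    ]


def guessed_servers(domain: str) -> list[str]:
    """Return the common server names that are tried as a last resort in step 6."""
    return [f"{prefix}.{domain}" for prefix in ("mail", "smtp", "imap")]
```

A client would try each candidate in turn and stop at the first one that yields a usable configuration.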
Why do we need outgoing mail servers when mail clients could simply deliver the messages directly?
[Figure: simplified architecture with the incoming and the outgoing mail server of the sender, the incoming and the outgoing mail server of the recipient, the mail client of the sender, and the mail client of the recipient. Labels: SMTP for message relay, SMTP for message submission, SMTP for direct message delivery?, IMAP or POP3 for message retrieval, IMAP for message storage.]
Since outgoing mail servers are just a piece of software and can thus be integrated into mail clients,
it is technically possible to send emails
directly to the incoming mail server of each recipient.
In fact, sending an email to someone from the command line
is my favorite
demonstration in the seminars I give.
Only badly configured incoming mail servers accept such messages, though.
Address reputation:
Incoming mail servers learn the sources of legitimate email over time.
Messages coming from such sources are
likely to be delivered to the user’s inbox.
Messages from sources with a bad reputation are often dropped on arrival.
Messages from
unknown sources are either dropped or put into the user’s spam folder.
Reputation
is crucial to build trust among unverified
participants.
Even when the sender of an email is authenticated,
reputation remains at the core of any effort to fight spam.
As we will
see later on,
you have to buy into the reputation of others
if you want to have your emails delivered reliably to your customers.
A whole
industry has developed around this value proposition.
Since building a reputation as a trustworthy email sender yourself
is too much of
a struggle for most Internet users and companies,
the port restriction mentioned in the previous bullet point isn’t much of a problem in
practice.
User authentication:
Mailbox providers are incentivized to protect their reputation
because users would no longer use their service if
emails were no longer delivered reliably.
This is why mailbox providers impose sending limits on their users
and delete accounts when
misbehavior is reported to them,
which is possible only if they authenticate their users before relaying messages.
For example, Gmail
limits
the number of messages per day to 2’000 and the number of recipients per message to 100
if the message is submitted from a
mail client rather than the web interface.
Vouching for users could also be done differently,
for example by delegating trust
to mail
clients with digital signatures.
However, a mailbox provider could no longer rate limit and filter outgoing messages
if mail clients
delivered them directly.
Domain authentication:
When it comes to information security,
trust is good but control is better.
Spam is a problem of quantity:
You
simply want to bring the volume of unsolicited messages to a bearable level.
Phishing, on the other hand, is a problem of quality:
A
single successful attack can cause a lot of damage.
A reputation system is great for fighting spam but not good enough for fighting
phishing.
The email delivery protocol itself doesn’t prevent the sender
from putting an arbitrary address into the From field.
In the
absence of a mechanism to authenticate the sender,
you can only hope that email servers with a good reputation don’t misuse their
reputation
and send messages with spoofed sender addresses and malicious content to you.
The idea behind domain authentication is
that each domain owner can specify
which outgoing mail servers are allowed to send messages from their domain.
Incoming mail
servers can then verify
whether the sender of a message is indeed authorized to send messages from the claimed domain.
In
combination with user authentication,
where outgoing mail servers prevent their users from sending messages in the name of another
user at the same domain,
the two mechanisms guarantee that the sender of a message owns the claimed From address.
There would be
other ways to achieve a similar result without requiring outgoing mail servers,
but this is how email works.
[Figure: simplified architecture with the incoming and the outgoing mail server of the sender, the incoming and the outgoing mail server of the recipient, the mail client of the sender, and the mail client of the recipient. Labels: Domain authentication, User authentication.]
The incoming mail server verifies that the outgoing mail server is authorized to send messages from the claimed domain,
while the outgoing mail server of the sender ensures that each user uses their own address in the From field.
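The interplay of the two checks can be modeled with two small functions.
This is a toy model under my own assumptions: the tables, the user names, and the IP addresses are invented, and the actual standards for domain authentication work differently in the details.

```python
# Hypothetical policy published by each domain owner:
# the outgoing mail servers which may send messages from the domain.
AUTHORIZED_SERVERS = {"example.com": {"192.0.2.10", "192.0.2.11"}}

# Hypothetical account database of the outgoing mail server:
# the addresses which each authenticated user may put into the From field.
OWNED_ADDRESSES = {
    "alice": {"alice@example.com", "a.smith@example.com"},
    "bob": {"bob@example.com"},
}


def domain_of(address: str) -> str:
    """Return the lowercase domain of an email address."""
    return address.rpartition("@")[2].lower()


def outgoing_server_accepts(user: str, from_address: str) -> bool:
    """User authentication: users may use only their own addresses."""
    return from_address.lower() in OWNED_ADDRESSES.get(user, set())


def incoming_server_accepts(server_ip: str, from_address: str) -> bool:
    """Domain authentication: only authorized servers may send for a domain."""
    return server_ip in AUTHORIZED_SERVERS.get(domain_of(from_address), set())
```

Only when both checks pass can the recipient assume that the sender owns the claimed From address.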
How to avoid submitting the same message to both the outgoing mail server and the incoming mail server?
[Figure: mail client of the sender and mail client of the recipient. Label: IMAP for message storage.]
The mail client submits the same message to both the outgoing mail server and the incoming mail server.
Gmail:
Google’s outgoing mail server automatically stores a copy of sent messages in the user’s sent folder.
In order not to end up with
duplicates in the sent folder,
the mail client shouldn’t store sent messages in the user’s mailbox.
Since the mail client
cannot detect
this
non-standard behavior when submitting a message to the outgoing mail server,
either the mail client has to treat @gmail.com addresses
differently
or the user has to disable the option to save a copy in the sent folder manually.
Since mail clients remove the Bcc field before
submission,
Gmail recovers it from the envelope of the message.
Courier-IMAP:
The Courier Mail Server
has a configuration option to designate a mailbox folder as a special outbox folder.
When the mail
client stores a message in this folder,
the server sends the message to the addresses listed in the To, Cc and Bcc fields.
What makes this
approach interesting is that a mail client can use IMAP for everything
and no longer needs to support SMTP.
Unfortunately, this feature is
also not standardized
and mail clients can therefore not rely on its availability.
The incoming mail server waits for connections from mail clients on a different interface.
In order to access the mailbox of its user, the mail client
has to present appropriate
credentials.
The user’s email address and password are often used to authenticate the client,
which is granted
unlimited access to the mailbox on success.
If the incoming mail server supports OAuth,
the mail client can present an access token
to gain
potentially limited access to the user’s mailbox.
The scopes offered by Gmail
are an example of what limited access can look like.
While restricted
authorization is common for other services, it’s not yet the norm for email.
Once the client is authenticated, it can retrieve, deposit, and delete
messages.
It can also mark them as read or flag them for later attention.
Address resolution
How do outgoing mail servers find the incoming mail server of a recipient?
As we learned above, an email address consists of a username and
a domain name, separated by the @ symbol.
A sender finds the incoming mail server of a recipient
by querying the Domain Name System
(DNS)
for mail exchange (MX) records of the used domain name.
If no such records exist, the sender queries for address records
(A or AAAA) of
the domain name instead.
If the DNS response is not authenticated with DNSSEC,
mail might be sent to the server of an attacker.
TLS can
prevent this only
if the sender requires that the recipient’s domain is included in the
server certificate,
which is usually not the case.
A
standard for securing MX records with TLS exists, though.
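The selection logic can be sketched without doing any actual DNS queries.
The function and the host names are hypothetical, and the records are passed in as plain values; trying the MX hosts in ascending order of their preference value is standard MX behavior that isn't spelled out above.

```python
def delivery_targets(domain, mx_records, has_address_record):
    """Return the hosts to which a sender tries to deliver, in order.

    mx_records is a list of (preference, host) tuples as found in the DNS.
    """
    if mx_records:
        # Hosts with a lower preference value are tried first.
        return [host for _, host in sorted(mx_records)]
    if has_address_record:
        # Without MX records, the domain itself is used as the mail server.
        return [domain]
    return []
```

For example, a domain with the MX records (20, backup.example.net) and (10, mail.example.net) is contacted at mail.example.net first.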
Null MX record
Dotless domains
Name collisions
Protocols
The above entities communicate with two kinds of protocols:
They use delivery protocols to deliver messages
and access protocols to access the
user’s mailbox.
As discussed earlier,
only SMTP for message relay is mandatory.
All other protocols can be replaced in a proprietary setup.
For
example, there are efforts
to combine message submission
and mailbox access
in a standardized way.
[Figure: mail client of the sender and mail client of the recipient. Label: IMAP for message storage.]
Use of TLS
Historically, SMTP, POP3, and
IMAP ran directly on top of the transport layer
using the Transmission Control Protocol (TCP),
which means that
the communication was neither encrypted nor authenticated.
Anyone with access to one of the networks through which the communication was
routed
could therefore read and potentially alter your messages.
Even your user password might have been transmitted in the clear.
In theory, the
solution is straightforward:
Use Transport Layer Security (TLS)
to encrypt and authenticate the communication between each pair of entities.
In
practice, however, you want to be backward compatible:
A server that expects requests to be in a specific format cannot suddenly handle a
request for a TLS handshake.
There are two ways around this problem:
Implicit TLS:
Introduce a new port number for each service on which the communication starts directly with a TLS handshake.
The protocol
variant which uses TLS implicitly is denoted by appending an S to its name.
For example, IMAP becomes IMAPS.
Since the ease of deployment should trump any other concerns when it comes to security,
RFC 8314 has recommended Implicit TLS over Explicit TLS
for IMAP, POP3, and SMTP for message submission since 2018.
When used opportunistically,
Implicit TLS and Explicit TLS provide
security only against
passive attacks,
where an attacker can merely eavesdrop on your communication but cannot interfere with it.
In the
presence of an active adversary,
who can modify and drop network packets,
neither Explicit TLS nor Implicit TLS is secure
unless the client
has a trusted way to know that the server supports TLS.
In the case of Implicit TLS, the attacker just has to drop the client’s communication to
the new port,
which forces the client to connect to the old port using the insecure protocol in order to remain backward compatible.
In the
case of Explicit TLS,
the server lists TLS among its capabilities while the communication is not yet authenticated.
The attacker can simply
strip TLS from the server’s capabilities,
which leaves the client with no other option than to continue in plaintext.
Alternatively, a client can
sacrifice compatibility and refuse to exchange messages over an insecure channel.
However, such a change is difficult to introduce because
users hate it when their setup no longer works.
It is therefore better if the client has a trusted way to know whether the server supports TLS.
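Here is a minimal sketch of a client which refuses to continue in plaintext, using IMAP and Python's standard library.
The function names are my own, the ports are the default ones for IMAP, and error handling is reduced to the bare minimum.

```python
import imaplib
import ssl

IMAPS_PORT = 993  # Implicit TLS: the connection starts with a TLS handshake.
IMAP_PORT = 143   # Explicit TLS: the connection is upgraded with STARTTLS.


def connect_implicit_tls(host):
    """Connect with Implicit TLS and verify the server certificate."""
    context = ssl.create_default_context()
    return imaplib.IMAP4_SSL(host, IMAPS_PORT, ssl_context=context)


def require_starttls(capabilities):
    """Raise an error if the server does not list STARTTLS."""
    if "STARTTLS" not in capabilities:
        raise ConnectionError("no STARTTLS offered; refusing to use plaintext")


def connect_explicit_tls(host):
    """Connect in plaintext and upgrade to TLS before doing anything else."""
    connection = imaplib.IMAP4(host, IMAP_PORT)
    try:
        # An attacker may have stripped STARTTLS from the capabilities.
        # Instead of falling back to plaintext, we give up.
        require_starttls(connection.capabilities)
        connection.starttls(ssl_context=ssl.create_default_context())
    except Exception:
        connection.shutdown()
        raise
    return connection
```

Such a client cannot be downgraded by an active attacker, but it also no longer works with servers which don't support TLS.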
The following three methods are used in practice to inform the client:
Authenticated channel:
While the server cannot reliably inform clients about its capabilities over a downgradeable protocol,
it can use
another, already authenticated protocol,
such as DNSSEC,
to convey this information to them.
User configuration:
Last but not least, the user can configure the client according to some documentation,
which has to be trustworthy, of
course.
The server’s capability might be printed on a leaflet or mentioned on a website secured with HTTPS.
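The stripping attack on Explicit TLS and the two possible client policies can be sketched in a few lines of Python. This is a toy model rather than a network client: the capability list and the helper names `strip_starttls` and `choose_transport` are made up for this illustration.

```python
def strip_starttls(capabilities):
    """Simulate an active attacker who edits the unauthenticated capability list."""
    return [c for c in capabilities if c.upper() != "STARTTLS"]


def choose_transport(capabilities, require_tls):
    """Return how the client continues after seeing the server's capabilities."""
    if any(c.upper() == "STARTTLS" for c in capabilities):
        return "tls"
    if require_tls:
        # The client learned from a trusted source that the server supports TLS.
        raise ConnectionError("STARTTLS not offered; refusing to continue in plaintext")
    return "plaintext"  # opportunistic client: silently downgraded


advertised = ["PIPELINING", "STARTTLS", "AUTH PLAIN LOGIN"]  # hypothetical list
tampered = strip_starttls(advertised)

print(choose_transport(advertised, require_tls=False))
print(choose_transport(tampered, require_tls=False))
```

A client which was told in a trusted way that the server supports TLS would call `choose_transport(tampered, require_tls=True)` and abort with an error instead of falling back to plaintext.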
Mail client Name for Implicit TLS Name for Explicit TLS
Mail clients often use other names for Implicit TLS and Explicit TLS.
In any case, we can only hope that mail clients refuse insecure connections when the appropriate TLS option is enabled.
I assume this is the case
but having the actual behavior documented would still be nice.
For example, Apple Mail has an option to allow insecure authentication under
“Advanced IMAP Settings”,
which doesn’t disable the “Use TLS/SSL” checkbox as seen below.
The documentation says:
“For accounts that
don’t support secure authentication,
let Mail use a non-encrypted version of your user name and password to connect to the mail server.”
What does this mean?
Are they talking about CRAM (challenge-response authentication mechanism),
which uses a hash function and not
encryption,
or does this option make TLS opportunistic? 🤷♂️
Historically, your web browser used the HyperText Transfer Protocol (HTTP)
to fetch websites and other resources from web servers.
Just
like the original email protocols, HTTP runs directly on top of TCP,
which means that its communication is neither encrypted nor
authenticated.
Since anyone on your network can read the transmitted messages
and hijack your session,
HTTP should no longer be used.
Also similar to the email protocols, there is a variant of HTTP called HTTPS,
which uses Implicit TLS to protect your communication.
In order
to remain backward compatible, HTTPS has to use a different port.
While the default port for HTTP is 80, the default port for HTTPS is 443.
What is less well known, because it’s rarely used, is that HTTP supports Explicit TLS as well.
Since version 1.1, HTTP has an Upgrade header
field
to upgrade an insecure connection to a secure one.
Because Explicit TLS maintains backward compatibility,
it can be offered on port 80
as documented in RFC 2817.
Deployment statistics
Port numbers
Every protocol specifies a default port on which servers listen for incoming requests.
Instead of scattering the port numbers used by various email
protocols throughout the following subsections,
here is a table with all the relevant information for future reference:
ManageSieve – 4190
Why does SMTP for Relay have no port for Implicit TLS?
First of all, we’ll talk in a minute about why SMTP is different for submission and relay.
The official argument
for why SMTP for Relay has no
port for Implicit TLS
is that MX records have no way to indicate which port to use and thus port 25 has to be used.
In my opinion, this
argument is misleading.
A more accurate answer is that the outgoing mail server had no secure way to discover
whether an incoming
mail server supported TLS back then,
so opportunistic security was all one could hope for at the time.
(Manual configuration isn’t an option
for relay and DNSSEC was standardized only in 2005 and deployed in 2010.)
Since opportunistic TLS is more easily accomplished with
Explicit TLS than with Implicit TLS,
we’re stuck with Explicit TLS for message relay to this day,
even though incoming mail servers can
now indicate their TLS capability
in a secure way.
Delivery protocols
The Simple Mail Transfer Protocol (SMTP) is used for two different purposes:
The mail client uses it to submit a message to the outgoing mail
server of its user,
while the outgoing mail server uses it to relay the message to the incoming mail servers of the recipients.
Originally, though,
mail servers relayed messages from anyone to anyone.
This is called open mail relay.
In particular, there was no distinction between outgoing
mail servers and incoming mail servers.
There were just mail transfer agents, which relayed messages among them.
Mail clients connected to
mail transfer agents just like other mail transfer agents did
and asked them to deliver a given message for them.
This approach had two
problems:
Abuse by spammers:
By routing their mail through relay servers of reputable organizations,
spammers made it difficult to block their
messages based on their origin.
Additionally, a single message to a relay server could have a large number of recipients,
which allowed
spammers to exploit the still costly bandwidth of others.
However, this also meant that a large number of spam messages were identical,
which made them relatively easy to filter out on the receiving side.
Unwanted rewriting:
Emails have to be in a certain format and mail servers started rewriting them
so that they adhere to the standard as
well as to organization-specific policies.
However, relay servers are not supposed to modify messages
and apparently such modifications
caused more harm than good.
We’ll have a closer look at the format of messages in the next chapter,
but since we already want to transmit messages in this section,
we
have to cover the basics now.
A message consists of several header fields
and an optional body,
which follows after an empty line.
Each
header field has to be on a separate line but can,
if necessary, span several lines.
As in HTTP, header fields are formatted as Name: Value.
What follows is a simple example message.
You can find more examples in RFC 5322.
Message-ID: <[email protected]>
Hello Bob,
Best regards,
Alice
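The structure described above (header fields, an empty line, then the body) can be explored with the `email` package of Python's standard library. The addresses and the subject in this sketch are made-up placeholders, not the values of the example message.

```python
from email import message_from_string

# A hypothetical message: header fields, an empty line, then the body.
raw = (
    "From: Alice <alice@example.org>\r\n"
    "To: Bob <bob@example.com>\r\n"
    "Subject: A long subject which is\r\n"
    " folded onto a second line\r\n"  # a leading space continues the previous field
    "\r\n"
    "Hello Bob,\r\n"
    "Best regards,\r\n"
    "Alice\r\n"
)

message = message_from_string(raw)
for name, value in message.items():
    print(f"{name}: {value!r}")
print(message.get_payload())
```

The folded Subject line is reported as a single header field, which shows how a field can span several lines.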
Submission: The mail client of Alice removes the Bcc header field from the message
and submits the message with all recipient addresses in the envelope,
including the ones of Bcc recipients, to the outgoing mail server.
Automatic responses shall be sent to the mailbox of Alice.
Figure: relay from the outgoing mail server of example.org to the incoming mail server of example.com.
Envelope: MAIL FROM:<[email protected]>, RCPT TO:<[email protected]>, RCPT TO:<[email protected]>
Message: From: Alice <[email protected]>, To: Bob <[email protected]>, Cc: Carol <[email protected]>
Figure: relay from the incoming mail server of example.com to the incoming mail server of example.net.
Envelope: MAIL FROM:<[email protected]>, RCPT TO:<[email protected]>
Message: From: Alice <[email protected]>, To: Bob <[email protected]>, Cc: Carol <[email protected]>
Figure: relay from the outgoing mail server of example.org to the incoming mail server of ietf.org.
Envelope: MAIL FROM:<[email protected]>, RCPT TO:<[email protected]>
Message: From: Alice <[email protected]>, To: Bob <[email protected]>, Cc: Carol <[email protected]>
Second relay: The outgoing mail server of Alice also has to deliver the message to the recipient [email protected],
so it does that.
Is removing the Bcc field the job of the mail client or the job of the outgoing mail server?
The relevant standards are silent on this but experts
agree
that the software which constructs the envelope from the message is responsible for this.
If SMTP is used for submitting the message
to the outgoing mail server
(rather than using one of the custom approaches),
the mail client has to remove the Bcc field for the primary (To)
and secondary (Cc) recipients.
Since this is not clearly stated in the standard,
there existed (and maybe still exist) mail clients
which relied on
the outgoing mail server to remove the Bcc field.
However, RFC 6409 lists Bcc removal
neither among the mandatory actions nor among the
permitted message modifications for outgoing mail servers.
While some outgoing mail server software, such as Postfix,
which is deployed on
around 34% of the reachable mail servers on the Internet,
drops the Bcc header field by default,
other software, such as Exim,
which is deployed on
around 57% of the reachable mail servers on the Internet,
does so only if it is invoked with the
-t option.
(This option was introduced for
use in pipelines,
such as cat message | sendmail -t.)
As a result, users could end up with the list of Bcc recipients going through to non-
Bcc recipients
depending on their specific combination of mail client and outgoing mail server software.
Since neither mail clients nor
outgoing mail servers document how they treat Bcc recipients,
you have to send a test email to figure out the behavior of your particular
setup.
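How a mail client can construct the envelope from the message and remove the Bcc field before submission can be sketched as follows. The function name `prepare_submission` and the addresses are hypothetical; real mail clients may proceed differently.

```python
from email import message_from_string
from email.utils import getaddresses


def prepare_submission(raw_message):
    """Return the envelope recipients and the message without its Bcc field."""
    message = message_from_string(raw_message)
    fields = (
        message.get_all("To", [])
        + message.get_all("Cc", [])
        + message.get_all("Bcc", [])
    )
    recipients = [address for _name, address in getaddresses(fields)]
    del message["Bcc"]  # does nothing if the field is absent
    return recipients, message.as_string()


raw = (
    "From: Alice <alice@example.org>\n"
    "To: Bob <bob@example.com>\n"
    "Bcc: Carol <carol@example.net>\n"
    "\n"
    "Hi Bob\n"
)
recipients, stripped = prepare_submission(raw)
print(recipients)
print(stripped)
```

The Bcc recipient still appears in the envelope (one RCPT TO per entry of `recipients`) but no longer in the message itself.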
Message-ID: <[email protected]>
Hi Bob,
Alice
Complete removal: The mail client removes the Bcc field from the message
and delivers the message with a single envelope for all
recipients to the outgoing mail server.
We already encountered this behavior in the previous box.
As far as I can tell, this is by far the most
common behavior in practice.
The Bcc field is removed from the message for all recipients.
Grouped delivery: The mail client splits the recipients into two groups.
The non-Bcc recipients get the message in which the Bcc field is
removed,
while the Bcc recipients get the original message,
in which all Bcc recipients are listed.
Figure: the mail client of [email protected] submits two versions of the message to the outgoing mail server of example.org.
First envelope: MAIL FROM:<[email protected]>, RCPT TO:<[email protected]>
First message: From: Alice <[email protected]>, To: Bob <[email protected]>
Second envelope: MAIL FROM:<[email protected]>, RCPT TO:<[email protected]>, RCPT TO:<[email protected]>
Second message: From: Alice <[email protected]>, To: Bob <[email protected]>, Bcc: Carol <[email protected]>, Dave <[email protected]>
Individual delivery: While all non-Bcc recipients receive the same message,
each Bcc recipient receives a separate version of the message,
in which only they are listed as a Bcc recipient.
Just like the first approach,
this prevents Bcc recipients from learning about any other Bcc
recipient.
Figure: the mail client of [email protected] submits a separate version of the message for each Bcc recipient to the outgoing mail server of example.org.
First envelope: MAIL FROM:<[email protected]>, RCPT TO:<[email protected]>
First message: From: Alice <[email protected]>, To: Bob <[email protected]>
Second envelope: MAIL FROM:<[email protected]>, RCPT TO:<[email protected]>
Second message: From: Alice <[email protected]>, To: Bob <[email protected]>, Bcc: Dave <[email protected]>
Empty field: While the standard requires that Bcc recipients are never disclosed to non-Bcc recipients,
it allows the sender to indicate
with an empty Bcc field that there were hidden Bcc recipients.
Such a hint can be provided in any of the other three approaches.
Therefore, this is more of a second dimension than a fourth option,
increasing the overall number of Bcc possibilities to 3 × 2 = 6.
Figure: the mail client of [email protected] submits a single message with an empty Bcc field to the outgoing mail server of example.org.
Envelope: MAIL FROM:<[email protected]>, RCPT TO:<[email protected]>, RCPT TO:<[email protected]>, RCPT TO:<[email protected]>
Message: From: Alice <[email protected]>, To: Bob <[email protected]>, Bcc:
An empty Bcc field indicates that there were hidden recipients without disclosing them.
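The three delivery approaches and the optional empty Bcc field can be summarized in code. This is only one reading of the descriptions above: the function `plan_deliveries`, its strategy names, and the addresses are invented for this sketch.

```python
def plan_deliveries(visible, hidden, strategy, hint=False):
    """Return a list of (envelope recipients, Bcc field value or None).

    A Bcc value of None means that the field is removed; an empty string
    means that an empty Bcc field hints at hidden recipients.
    """
    public_bcc = "" if hint else None
    if strategy == "complete":
        return [(visible + hidden, public_bcc)]
    if strategy == "grouped":
        return [(visible, public_bcc), (hidden, ", ".join(hidden))]
    if strategy == "individual":
        return [(visible, public_bcc)] + [([h], h) for h in hidden]
    raise ValueError(f"unknown strategy: {strategy}")


visible = ["bob@example.com"]
hidden = ["carol@example.net", "dave@ietf.org"]
for strategy in ("complete", "grouped", "individual"):
    for hint in (False, True):
        print(strategy, hint, plan_deliveries(visible, hidden, strategy, hint))
```

The two nested loops enumerate the 3 × 2 = 6 possibilities mentioned above.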
The only way to be sure that a Bcc recipient won’t reply to all recipients by accident is to
first send the message to the non-Bcc recipients and
then forward the message to the hidden recipients.
If the hidden recipients don’t need to be hidden from each other,
you can list them in the
To field of the forwarded email.
Otherwise, keep them in the Bcc field.
Sometimes, the Bcc field is simply used to prevent certain recipients from getting replies
rather than to hide them from other recipients.
An
example of this is when you move the person who introduced you to someone else to Bcc
while still thanking them for the introduction in the
reply.
This use case could also be addressed with a Do-Not-Reply-To field,
which lists all addresses that should be skipped in a reply.
Such a
header field would also solve the no-reply problem.
However, it’s almost impossible to bring innovation to email
because first
implementations and then users would have to adopt such a change.
How does Gmail recover the Bcc header field of sent messages?
(Open connection)
220 server.example.com
HELO client.example.org
250 server.example.com
MAIL FROM:<[email protected]>
250 Ok
RCPT TO:<[email protected]>
250 Ok
DATA
354 Go ahead
(Actual message)
250 Ok
QUIT
221 Bye
(Close connection)
Command syntax
The first question that came to your mind after reading the above
sequence diagram
probably was: Is HELO a typo?
No, it’s not.
SMTP
commands simply consist of four characters.
They are almost always written in uppercase,
even though they are case insensitive.
But yes,
HELO does stand for “hello”.
The purpose of this command is for the client to identify itself to the server with a domain name or an IP address.
The identity provided by the client is relevant only in rare circumstances.
Why are the MAIL FROM and RCPT TO commands longer than four characters, then?
They’re not.
The commands are just MAIL and RCPT.
FROM
and TO denote the subsequent parameter value.
Some ESMTP extensions define additional parameters for the MAIL command.
The name and
value of these additional parameters are separated by an equals sign rather than a colon, though.
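The command syntax described above can be illustrated with a small parser. It is a simplified sketch with hypothetical function names which ignores many details of the actual grammar. SIZE and BODY are two ESMTP parameters of the MAIL command which are used here as examples.

```python
def parse_command(line):
    """Split a command line into its case-insensitive verb and its argument."""
    verb, _, argument = line.rstrip("\r\n").partition(" ")
    return verb.upper(), argument


def parse_mail_argument(argument):
    """Split the argument of MAIL into the sender address and extra parameters."""
    path, *parameters = argument.split(" ")
    keyword, _, address = path.partition(":")  # FROM and address are separated by a colon
    if keyword.upper() != "FROM":
        raise ValueError("expected FROM:<address>")
    extras = {}
    for parameter in parameters:
        name, _, value = parameter.partition("=")  # extra parameters use an equals sign
        extras[name.upper()] = value
    return address.strip("<>"), extras


verb, argument = parse_command("mail FROM:<alice@example.org> SIZE=1024 BODY=8BITMIME\r\n")
print(verb, parse_mail_argument(argument))
```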
Field terminology
ESMTP tool
Message-ID: <[email protected]>
Hello,
Alice
.
250 2.0.0 Ok
QUIT
221 2.0.0 Bye
Tool explanations
Command-line interface
Clipboard verification
# macOS:
$ watch -n 1 pbpaste
# Linux:
$ watch -n 1 xclip -selection clipboard -o
# Windows:
$ powershell -command "while (1) { Clear; Get-Clipboard; Sleep 1 }"
$ openssl version
LibreSSL 2.8.3
The difference between ESMTP and SMTP is that ESMTP allows the server to list extended capabilities,
which the client can make use of
during the session.
Let’s have a look at some common SMTP extensions on the basis of what Gmail supports:
A transcript of a session with the outgoing mail server of Gmail when using Implicit TLS.
[Brackets indicate redacted information.]
Backward compatibility
STARTTLS extension
The STARTTLS extension is listed when we connect without TLS to Gmail’s outgoing mail server.
User authentication
Please note that the tool hides the password in the input field but unless you use CRAM-MD5,
anyone who can take a picture of your screen can
easily decode the entered password.
When authenticating to an SMTP server, the server responds with either
235 2.7.0 Authentication
successful or 535 5.7.8 Error: authentication failed.
If you want to submit an email to Gmail with the instructions generated by the above tool,
you have to allow access from less secure apps
in
your account settings.
If the authentication still fails,
you might have to complete this page
according to these instructions.
Please note that
Google disables access from less secure apps automatically if it’s not being used for some time.
Newline characters
In teleprinters
(printers that operated like typewriters),
moving the carriage, which outputs the characters onto paper, back to the start of
the same line
and moving the page to the next line were two separate instructions.
The former is known as carriage return (CR),
the latter as
line feed (LF).
Both CR and LF were included as control characters
in the American Standard Code for Information Interchange (ASCII).
While some operating systems, such as Windows,
opted to encode a newline as a sequence of both CR and LF,
other operating systems, such
as Linux
and macOS, use only LF to encode a newline.
As you can imagine, this causes a lot of
interoperability issues.
Both SMTP
and the
message format require that lines end with both CR and LF.
By using the -crlf option,
openssl makes sure that this is the case.
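If you assemble a message in your own code rather than typing it into openssl, you have to take care of the line endings yourself. A minimal sketch, with `to_crlf` being a hypothetical helper name:

```python
import re


def to_crlf(text):
    """Replace every CRLF, lone CR, or lone LF with CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


print(repr(to_crlf("Hello,\nAlice\r\n")))
```

Because the pattern tries CRLF first, line endings which are already correct are left unchanged.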
Message termination
Origination date
Besides EHLO,
MAIL,
RCPT,
DATA,
and QUIT,
there are some other SMTP commands, which are rarely used in practice:
VRFY Mailbox Verify whether the given mailbox exists on the server.
EXPN Mailing list Expand the given mailing list (i.e. return the members).
HELP [Command] Ask for helpful information about the optional command.
Automatic responses
In certain configurations, mail servers send a message in response to an incoming message,
which leads to the following problems.
Mail loops
Bounce messages
Historically, bounce messages were in a format that could be interpreted only by a human sender.
However, many messages are sent by
automated systems,
which should also be able to detect when a message couldn’t be delivered.
For example, mailing list software
should be
able to remove no longer valid addresses from the list automatically.
Two techniques address this problem:
Machine-processable non-delivery reports (NDR):
RFC 3464 specifies how multipart messages
can be used to send so-called delivery
status notifications (DSN) to the sender in a standardized way.
In short, the bounce message is marked with
Content-Type:
multipart/report; report-type=delivery-status; boundary="…"
and the machine-processable part is labeled with Content-Type:
message/delivery-status.
The report contains message-specific
and recipient-specific fields,
which are separated by a blank line.
The
RFC includes some examples.
The advantage of this approach is that even mail clients can make use of the report.
Its disadvantage is that
not everyone supports this format and even if everyone did,
the sender doesn’t learn for which recipient the message couldn’t be
delivered
if it was forwarded by an alias address.
Since non-delivery reports include the header fields of the original message,
this could be
recovered from the trace information.
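As a rough illustration of how an automated sender could process such a report, the following sketch uses the `email` package of Python's standard library, which represents each block of fields in a message/delivery-status part as a nested message. The report itself, including all names and addresses, is made up for this example.

```python
from email import message_from_string

# A hypothetical, shortened bounce message.
report = """\
From: MAILER-DAEMON@example.com
To: alice@example.org
Subject: Undelivered mail
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain

Your message could not be delivered.

--BOUNDARY
Content-Type: message/delivery-status

Reporting-MTA: dns; mail.example.com

Final-Recipient: rfc822; bob@example.com
Action: failed
Status: 5.1.1

--BOUNDARY--
"""


def failed_recipients(raw_report):
    """Return (address, status) pairs for all recipients whose delivery failed."""
    failures = []
    for part in message_from_string(raw_report).walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        for block in part.get_payload():  # one nested message per block of fields
            if block.get("Action", "").strip().lower() == "failed":
                address = block["Final-Recipient"].split(";", 1)[1].strip()
                failures.append((address, block["Status"].strip()))
    return failures


print(failed_recipients(report))
```

Mailing list software could use the returned addresses to remove invalid subscribers.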
Backscatter
While the communication between the client and the proxy is protected (indicated by the blue lines),
the communication between the proxy and the server is exposed in the company’s private network.
The malicious server (in red) has the same name as the legitimate server (in green).
An
attacker can impersonate the server by getting a certificate issued for their public key
or
by gaining access to the private key of the server and using the original server certificate.
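Figure: a hash function maps an input of any size to an output of fixed size. Computing the output from the input is efficient, whereas finding an input for a given output is infeasible.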
In these graphics, the given values are displayed in blue and the values to find in green.
Second-preimage resistance:
It’s infeasible to find a different input which hashes to the same output as a given input.
Figure: given input 1, find input 2 ≠ input 1 with the same output.
Knowing one input may not be useful to find another input which hashes to the same output.
Collision resistance:
It’s infeasible to find two different inputs which hash to the same output,
resulting in a collision.
Figure: find input 1 and input 2 ≠ input 1 with the same output.
Since hash functions map an infinite number of inputs to a finite number of outputs,
they have to produce an infinite number of collisions.
The
point of cryptographic hash functions is not that they don’t have any collisions
but that it has to be infeasible to find them.
In practice,
cryptographic hash functions should also satisfy the
avalanche criterion:
A small change in the input changes the output completely and
unpredictably.
A cryptographic hash function is said to be broken
if a preimage or a collision can be found more efficiently
than with a brute-
force search
or if the size of the output is so small that
a brute-force search becomes feasible with modern computers.
Regarding notation,
I will use : to assign
a value on the right to a variable on the left in the following boxes.
For example, a hash function is
then written as Output: hash(Input).
Sometimes, several values need to be combined into one before hashing.
I’ll use + to concatenate
several values in a secure way: Output: hash(Input1 + Input2).
When implementing this, you can use a special character
which must not
occur in any of the values as a delimiter
so that hash("a" + "bc") ≠ hash("ab" + "c").
The null character
is used for this purpose in the
case of PLAIN authentication.
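The difference between a naive and a delimited concatenation is easy to demonstrate. The sketch below uses SHA-256 from Python's `hashlib`; the helper `combined_hash` is a hypothetical name.

```python
import hashlib


def combined_hash(*values, delimiter=b""):
    """Hash several byte strings after joining them with the given delimiter."""
    return hashlib.sha256(delimiter.join(values)).hexdigest()


# Without a delimiter, different value pairs lead to the same hash input:
print(combined_hash(b"a", b"bc") == combined_hash(b"ab", b"c"))

# With the null character as delimiter, the inputs and thus the hashes differ:
print(combined_hash(b"a", b"bc", delimiter=b"\0") == combined_hash(b"ab", b"c", delimiter=b"\0"))
```

This works only as long as the delimiter cannot occur in any of the values.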
The three digits at the end of some algorithm names indicate the size of the output in bits.
SHA-256, for example, hashes inputs of arbitrary
size to 256 bits.
SHA-224, SHA-256, SHA-384, and SHA-512 belong to the SHA-2 family of hash functions.
Collisions have been found for MD5
and SHA-1,
which hash to 128 and 160 bits respectively.
These algorithms should no longer be used.
Figure: the content producer provides a File and its Hash; the consumer checks whether hash(File) = Hash.
Password protection:
In order to verify whether a user provided the correct password,
a server doesn’t have to store the password of the
user.
The server can simply store a salted hash of the password
and then check whether the user provided the same password as before
by computing and comparing its hash.
The advantage of this approach is that an attacker who compromised the database
cannot log in as
the user as they don’t know the preimage of the salted hash.
A server can reduce the damage of a leaked database by storing individually salted hashes instead of passwords.
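Storing and checking a salted hash can be sketched as follows. The function names and the salt length of 16 bytes are assumptions of this example, and a single round of SHA-256 is used only to keep the illustration short.

```python
import hashlib
import hmac
import os


def create_record(password):
    """Return what the server stores: a random salt and the salted hash."""
    salt = os.urandom(16)  # different for every account
    return salt, hashlib.sha256(salt + password.encode()).digest()


def verify(password, record):
    """Recompute the salted hash and compare it with the stored one."""
    salt, stored_hash = record
    candidate = hashlib.sha256(salt + password.encode()).digest()
    return hmac.compare_digest(candidate, stored_hash)


alice = create_record("correct horse")
bob = create_record("correct horse")
print(verify("correct horse", alice), verify("wrong", alice))
print(alice[1] == bob[1])  # same password, yet different stored hashes
```

Because the salt has a fixed length, it can be concatenated with the password without a delimiter.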
Key derivation:
Cryptographic hash functions are designed to run as fast as possible.
While good performance is desirable for many
applications,
it’s not desirable when hashing passwords.
Even if the hash of a password is salted,
an attacker can still perform a brute-force
attack
to find an input which hashes to the given output with the given salt.
In order to make such attacks costlier,
passwords are often
hashed thousands of times instead of just once.
Repeated hashing means that you take the output of one round as the input to the next
round.
This also makes the computation costlier for the legitimate parties
but unlike an attacker, they have to compute the derivation only
once per session.
Making a weak key
more secure against brute-force attacks by increasing the cost
is called key stretching.
One
algorithm for doing so is the
Password-Based Key Derivation Function 2 (PBKDF2),
which is specified in RFC 8018.
Additionally,
cryptographic keys typically have a desired length,
which is another reason for using a key derivation function (KDF).
Figure: a PBKDF turns a password into a cryptographic key by repeated hashing.
By hashing an input repeatedly, you can turn an efficient hash function into an inefficient one.
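Python's standard library exposes PBKDF2 as `hashlib.pbkdf2_hmac`. In the following sketch, the password, the iteration count of 100,000, and the key length of 32 bytes are arbitrary example values, not recommendations.

```python
import hashlib
import os

password = b"correct horse"
salt = os.urandom(16)

# More iterations make each guess costlier for the attacker and the user alike.
key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=32)
same_key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=32)
other_key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_001, dklen=32)

print(key.hex())
print(key == same_key, key == other_key)
```

The `dklen` parameter covers the second use of a key derivation function mentioned above: It produces a key of the desired length.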
Independent values:
Another use case of cryptographic hash functions is
to generate a sequence of unrelatable values from a single
source value.
Such a source value is called a seed
because a tree of values can grow from it.
The seed is then hashed with a counter or a
timestamp.
As long as the seed remains secret,
others cannot compute the next value from the previous one and vice versa.
Hash
functions are used for this purpose in
contact tracing apps,
cryptocurrency wallets,
and one-time passwords (OTP).
If a hash function
fulfills the strict avalanche criterion,
it can even be used as a pseudo-random number generator (PRNG)
or as a block cipher for
encryption.
For all these use cases, the seed has to be chosen randomly,
which means it has to have enough entropy.
If you don’t like
password managers,
you can use hash functions to generate site-specific passwords as SitePassword: hash(LoginDomain +
MasterPassword).
Unless you know exactly what you’re doing,
I advise you not to use this technique as there are many pitfalls,
such as
leaving your password in the command history
or accidentally including newline characters, but it’s certainly a neat idea.
(The order of
LoginDomain and MasterPassword is important as you might be vulnerable to a
length-extension attack otherwise, see below.)
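For the sake of illustration, the site-specific password formula could be implemented like this. The null character as delimiter, the Base64 encoding, and the truncation to 20 characters are choices of this sketch, and the warning above about the pitfalls of this technique applies to it as well.

```python
import base64
import hashlib


def site_password(login_domain, master_password, length=20):
    """Compute hash(LoginDomain + MasterPassword) and encode it as text."""
    data = login_domain.encode() + b"\0" + master_password.encode()
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).decode()[:length]


print(site_password("example.com", "my master password"))
print(site_password("example.org", "my master password"))
```

The same inputs always lead to the same password, while a different domain leads to an unrelated one.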
Commitment schemes:
A commitment scheme allows you to commit yourself to a value
while keeping the value secret until you reveal it
later.
You can think of it as giving a locked box to a recipient
while providing the key to open the box only later.
A commitment scheme has
to be both
binding and hiding:
The committer must not be able to change the committed value
and the recipient must not be able to figure
out the committed value.
In order to understand why this is useful,
let’s look at an example from Wikipedia.
Suppose Alice and Bob need
Figure: Alice sends hash(CoinFlipAlice + Nonce) to Bob, Bob answers with CoinFlipBob, and Alice then reveals CoinFlipAlice and Nonce so that Bob can check the commitment.
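A hash-based commitment to a coin flip can be sketched as follows. The function names and the encoding of the flip as a single byte are assumptions of this example.

```python
import hashlib
import secrets


def commit(value):
    """Commit to a value of 0 or 1; return the commitment and the secret nonce."""
    nonce = secrets.token_bytes(16)  # makes the commitment hiding
    return hashlib.sha256(bytes([value]) + nonce).hexdigest(), nonce


def opens_to(commitment, value, nonce):
    """Check whether the revealed value and nonce match the commitment."""
    return hashlib.sha256(bytes([value]) + nonce).hexdigest() == commitment


coin_flip_alice = secrets.randbelow(2)
commitment, nonce = commit(coin_flip_alice)  # Alice sends only the commitment
coin_flip_bob = secrets.randbelow(2)         # Bob answers with his own flip
print(opens_to(commitment, coin_flip_alice, nonce))      # Alice reveals honestly
print(opens_to(commitment, 1 - coin_flip_alice, nonce))  # Alice tries to cheat
```

Without the random nonce, Bob could simply hash both possible values and compare the results with the commitment.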
Message authentication:
How can two parties be sure that no one tampered with their communication?
They can achieve this by
extending each message with a value
which depends on the message and which only they can generate.
Such a value is called a
message
authentication code (MAC).
If an attacker modifies a message, the original MAC no longer matches the message
and the attacker cannot
fix this because they cannot generate a valid MAC.
Both parties compute the MAC for each message they receive
and reject all messages
for which the transmitted MAC is different from the computed MAC.
One way of implementing message authentication codes is to hash
the message together with a value
which is known only to the legitimate parties.
This value is a shared secret
and it is used as a
cryptographic key.
For example, the MAC could be computed as hash(Key + Message).
Unfortunately, this isn’t secure when used with
any of the
hash functions listed above as they are all vulnerable
to length-extension attacks.
The problem is that these algorithms leak
their internal state as the result,
so an attacker can simply continue where the legitimate party left off
without having to know the shared
key.
This means that, given a Message and the corresponding MAC,
an attacker can generate a valid MAC for the message Message +
MaliciousAddition.
While swapping the key and the message solves this problem,
hash(Message + Key) makes the MAC immediately
vulnerable
as soon as the hash function becomes vulnerable to
collision attacks.
In order to avoid such issues, cryptographers came up
with the
Hash-based Message Authentication Code (HMAC) in 1996,
which is defined as follows:
hmac(Key, Message) = hash([Key' ⊕
OuterPadding] + hash([Key' ⊕ InnerPadding] + Message)),
where the ⊕ denotes the bitwise exclusive-or operation
and the square
brackets are used only to make the parenthesis matching easier.
The paddings are the same for everyone and their purpose is to make the
key in the inner hash
different from the key in the outer hash.
If the key is longer than the block size
of the used hash function, it needs to
be hashed: Key' = hash(Key).
Not always hashing the key leads to trivial collisions,
which should have been avoided when specifying the
algorithm.
As long as you understand what HMAC is good for, the details don’t matter here.
The SHA-3 algorithms aren’t susceptible to
length-extension attacks
and my understanding is that the much simpler hash(Key + Message) construction works as intended with
them.
What is important to note is that hash-based message authentication codes are symmetric:
Whoever can verify them can also
generate them.
Unlike digital signatures,
message authentication codes allow a party to repudiate
messages which it authenticated
because the other party could have generated the corresponding MAC as well.
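The HMAC construction can be written out and compared with the `hmac` module of Python's standard library. Two details of this sketch come from RFC 2104 rather than from the description above: The key is padded with zero bytes to the block size, and the inner and outer paddings consist of the repeated bytes 0x36 and 0x5C.

```python
import hashlib
import hmac


def hmac_sha256(key, message):
    """Compute HMAC with SHA-256 step by step."""
    block_size = 64  # block size of SHA-256 in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # Key' = hash(Key)
    key = key.ljust(block_size, b"\0")
    inner_key = bytes(byte ^ 0x36 for byte in key)
    outer_key = bytes(byte ^ 0x5C for byte in key)
    inner_hash = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner_hash).digest()


key = b"shared secret"
message = b"Hello Bob"
print(hmac_sha256(key, message) == hmac.new(key, message, hashlib.sha256).digest())
```

In real code, you would call `hmac.new` directly and compare MACs with `hmac.compare_digest`.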
Proof of inclusion:
When collaborating online,
it’s sometimes useful to be able to prove to others that a
record
has been incorporated into
the current state of a system
without having to share or even disclose all the other records.
This can be accomplished by repeatedly
hashing two hashes into one
until you’re left with a single hash which captures the state of the whole system.
The resulting structure is
called a Merkle tree.
Records cannot be added to, removed from, or modified in this structure
without affecting the so-called root of the
tree.
If someone accepts that a specific root represents the state of the system,
you can prove to this person that a particular record is
included in this state
by revealing the hash of the branches with which this record has to be hashed in order to arrive at this root.
This
method is interesting for two reasons:
The proof grows logarithmically
with the number of records, which makes it scale very well, and the
other records have to be
neither revealed nor transmitted for the verification to succeed.
Such proofs of inclusion are used in Bitcoin
for
Simplified Payment Verification (SPV),
in decentralized timestamping
for document aggregation,
and in Certificate Transparency for
auditing.
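A proof of inclusion can be sketched in a few lines. This toy version makes simplifying assumptions which real systems avoid or handle differently: It expects at least one record, duplicates the last hash on levels with an odd number of hashes, and doesn’t distinguish between the hashes of records and the hashes of branches. All function names are hypothetical.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def root_and_proof(records, index):
    """Return the Merkle root and the proof for the record at the given index."""
    level = [h(record) for record in records]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # simplification: duplicate the last hash
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(record, proof, root):
    """Hash the record with the revealed branches and compare with the root."""
    node = h(record)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


records = [f"record {i}".encode() for i in range(8)]
root, proof = root_and_proof(records, 5)
print(len(proof), verify(records[5], proof, root), verify(b"forged", proof, root))
```

For eight records, the proof consists of just three hashes, which reflects the logarithmic growth mentioned above.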
Proof of work:
In publicly accessible systems,
you want to discourage participants from using a shared and limited resource beyond their
fair share.
One way of doing so is by imposing a cost on using the resource,
which deters anyone who doesn’t value the resource higher
than its associated cost.
The resource owner can either charge a fee for using the resource
or require its users to waste a limited resource
of their own.
While the former approach is less wasteful,
the latter approach doesn’t require a global infrastructure for
micropayments.
For example, your mailbox is a publicly accessible system and your time is a limited resource.
What if you could require every unknown
sender to waste one minute of computing power
before they can deliver an email to your inbox?
This would prevent spammers from
sending millions of emails a day
– or at least make this antisocial behavior much costlier.
It turns out that there’s a simple way to achieve
this:
You could require that the hash of incoming messages falls into a tiny range.
Since one cannot influence the output of a hash function,
senders have to keep appending different nonces to their message
until its hash finally falls into the desired range.
As long as the hash
function isn’t broken,
there’s no better way than to keep trying until you’re lucky.
It’s like trying to hit the bull’s eye on a target
when you
have zero control over the trajectory of your darts.
While finding an appropriate nonce requires many computations,
the recipient has to
hash the message just once
in order to verify whether the required work has been done.
The average difficulty of the problem can be
adjusted by making the target range bigger or smaller.
This technique was invented in 1992
as a digital postage stamp
but saw widespread
usage only with the rise of
cryptocurrency mining.
Figure: the hashes hash(Content + 1), hash(Content + 2), hash(Content + 3), and so on up to hash(Content + X).
Finding a nonce which makes the hash of the content fall into a certain range requires many attempts.
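The search for a suitable nonce can be sketched as follows. The target range is modeled as all hashes which start with a certain number of zero bits; this choice, the appended decimal nonce, and the function names are assumptions of this example.

```python
import hashlib
from itertools import count


def in_range(content, nonce, difficulty_bits):
    """Check whether hash(Content + Nonce) falls into the target range."""
    digest = hashlib.sha256(content + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < 1 << (256 - difficulty_bits)


def find_nonce(content, difficulty_bits):
    """Try nonces until the hash falls into the target range."""
    for nonce in count():
        if in_range(content, nonce, difficulty_bits):
            return nonce


content = b"Hello Bob"
nonce = find_nonce(content, 12)  # around 2^12 = 4096 attempts on average
print(nonce, in_range(content, nonce, 12))
```

Each additional bit of difficulty halves the target range and thus doubles the expected work of the sender, while the recipient still verifies the result with a single hash.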
Exclusive or is a binary
truth function,
which means that it combines two inputs into a single output,
where all values are either true or false.
Exclusive or returns true if one of the inputs is true but not both.
Instead of true and false, we will use the symbols 1 and 0.
The operator is
often written as a plus in a circle
because it corresponds to binary addition
without the carry.
Functions which map a finite combination of
inputs to some output
can be specified simply by listing all possible mappings in a table:
A ⊕ B = C
0 0 0
0 1 1
1 0 1
1 1 0
Since the above table is exhaustive, you can convince yourself that the following properties hold
simply by studying all cases:
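Such an exhaustive check is easy to automate. The following sketch verifies some well-known properties of exclusive or by trying all combinations of inputs; which properties to check is a selection made for this example. Python’s `^` operator computes the exclusive or of two integers.

```python
from itertools import product

BITS = (0, 1)


def holds(prop, arity):
    """Check a property for all combinations of the given number of bits."""
    return all(prop(*values) for values in product(BITS, repeat=arity))


properties = {
    "commutative": holds(lambda a, b: a ^ b == b ^ a, 2),
    "associative": holds(lambda a, b, c: (a ^ b) ^ c == a ^ (b ^ c), 3),
    "0 is the identity element": holds(lambda a: a ^ 0 == a, 1),
    "every value is its own inverse": holds(lambda a: a ^ a == 0, 1),
    "applying B twice restores A": holds(lambda a, b: (a ^ b) ^ b == a, 2),
}
for name, result in properties.items():
    print(name, result)
```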
Eve has access to the ciphertext and knows the algorithms in blue,
while the information in green is known only to Alice and Bob.
Now that we’ve covered the cryptographic concepts that we will need
(namely hash functions,
salts,
nonces,
key derivation functions,
message authentication codes,
and exclusive or),
we can turn our attention to password-based authentication mechanisms.
What makes
them interesting is that we want to arrive at strong security from relatively weak passwords.
Unfortunately, I couldn’t find any good literature on desirable properties of password-based authentication mechanisms,
which is why I made
up the following criteria myself.
Since this isn’t my area of expertise,
let me know if I missed an important aspect.
(Section 5 of RFC 7616
is
the best source that I could find, covering security considerations of
Digest Access Authentication.)
7. Wrong server: The client detects when it’s connected to the wrong server
(see the box on our dangerous reliance on TLS).
8. Compromised certification authority:
The client detects when the used certificate doesn’t belong to the actual server.
9. Compromised server key: The client detects when the server is impersonated
even if the same certificate is being used.
10. Comparison attacks: A compromised server cannot learn
whether two different accounts are protected with the same password,
neither
when creating an account nor during ordinary authentication.
Such knowledge can be used to infer that the accounts belong to the same
person
– or to contact and bribe one user to compromise the password of the other user.
The goal of defense in depth is to limit the potential harm as much as possible.
Given that you can reset the password of many of your online
accounts through your email account,
you don’t want to send one of your most valuable passwords
directly to a potential attacker when
checking your inbox.
Let’s look at how the three authentication mechanisms perform in this regard:
Database compromise
Replay attacks
Denial-of-service attacks
Server impersonation
Wrong server
Comparison attacks
1. Bugs in implementation:
Even if an authentication mechanism is resistant to an attack in theory,
it can be vulnerable to it in practice
because of software bugs.
All you can do is actively look for them,
encourage their disclosure, and fix them promptly.
2. Account theft:
Authentication mechanisms usually don’t specify how users set and change their password.
An attacker who intercepts
the daily communication between a client and a server shouldn’t be
able to change the user’s password,
thereby stealing their account.
5. Server compromise:
Once the server is compromised, there’s nothing left for an authentication mechanism to protect.
6. Client compromise:
An authentication mechanism cannot prevent users from entering their password into a compromised client.
The
harm can be limited only by using app-specific passwords or OAuth
(see the second point about account theft).
[Figure: The client connects to the server, the server sends a challenge, and the client answers with a response.]
2. Replay attacks: As long as the server never issues the same challenge twice,
the response from the client is valid only in the current
session.
If an attacker replays an old response to a new challenge,
the server rejects the received value as invalid.
9. Compromised server key: The client cannot detect when the server is impersonated
if the same certificate is being used.
10. Comparison attacks: Since the server stores the passwords,
it can easily determine if two accounts use the same password.
A man-in-the-middle attack on a challenge-response authentication mechanism.
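The replay resistance of a challenge-response mechanism can be sketched in a few lines of Python. This is a toy model under stated assumptions: the response is an HMAC-SHA-256 of the challenge keyed with the password, and the class `Server` and the function `respond` are hypothetical names. Real mechanisms differ in the details.

```python
import hashlib
import hmac
import secrets


class Server:
    """Toy server that stores the password and issues one-time challenges."""

    def __init__(self, password: bytes) -> None:
        self._password = password
        self._open_challenges: set[bytes] = set()

    def issue_challenge(self) -> bytes:
        challenge = secrets.token_bytes(16)  # fresh random value per session
        self._open_challenges.add(challenge)
        return challenge

    def verify(self, challenge: bytes, response: bytes) -> bool:
        if challenge not in self._open_challenges:
            return False  # unknown or already used challenge
        self._open_challenges.remove(challenge)
        expected = hmac.new(self._password, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def respond(password: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password, challenge, hashlib.sha256).digest()


server = Server(b"correct horse")
first = server.issue_challenge()
old_response = respond(b"correct horse", first)
accepted = server.verify(first, old_response)

second = server.issue_challenge()
replayed = server.verify(second, old_response)  # old response, new challenge
```

Note that this sketch does nothing against the relaying attacker from the figure: a man-in-the-middle simply forwards the fresh challenge to the real client.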
Key derivation:
Instead of using the password directly, SCRAM uses
PBKDF2 to derive a
cryptographic key.
By salting the password and
hashing it thousands of times,
a brute-force search
for the password given the key becomes very costly.
Message authentication:
The derived key is used to authenticate a message from the client to the server
and a message from the server
to the client with an HMAC.
The server can authenticate its message only if it knows the derived key.
We thus have mutual authentication:
The server is certain that the user is who they claim to be
and the client is certain that the message came from the right server.
Exclusive-or encryption:
The problem is that the server shouldn’t store this key.
Otherwise, anyone who compromised its database can
impersonate the user
by authenticating the appropriate message with the stolen key.
This can be solved by storing a hash of the derived
key instead.
Note that the derived key doesn’t need to be salted and stretched here
because the best way to find the preimage of the
hashed key is to guess the low-entropy password,
which is itself already salted and stretched to arrive at the derived key.
The client then
uses the hashed key to authenticate its message.
So far we have only moved the problem, though,
because the hashed key now has the
same role as the derived key before.
The trick is that the client proves to the server
that it knows the preimage of the stored key
by
encrypting the preimage with the HMAC.
Since only the legitimate parties can compute the HMAC,
the server can decrypt the preimage
but the attacker cannot.
If this is confusing, then re-read this paragraph after you’ve seen the protocol flow below.
Optional channel binding:
The authenticated message includes everything
which the client and the server have to agree on.
One useful
thing to agree on is that they are connected to the same secure channel.
Binding the channel on the application layer
to the channel on the
security layer
prevents man-in-the-middle attacks.
Channel binding is optional in SCRAM.
There are different ways to bind the inner
channel to the outer channel with different tradeoffs.
We’ll cover them in the next box.
Mutual authentication guarantees only that the inner channel (in green) reaches the counterparty.
Channel binding can be used to ensure that the outer channel (in blue) isn’t interrupted by an attacker.
The client and the server compute the following values based on the above values:
Key: pbkdf2(Password, Salt, IterationCount)
HashedKey: hash(Key)
Message: Username + ClientNonce + ServerNonce + ChannelBinding
HashedKeyMac: hmac(HashedKey, Message)
KeyXorHashedKeyMac: Key ⊕ HashedKeyMac
KeyMac: hmac(Key, Message)
For each user, the server stores Username, Salt, IterationCount, and HashedKey.
The following messages are exchanged:
Client → Server: Username, ClientNonce
Server → Client: Salt, IterationCount, ServerNonce
Client → Server: KeyXorHashedKeyMac
Server → Client: KeyMac
Since a user has to be able to authenticate themself on a new client with just their Username and Password,
the server has to store the Salt
and the IterationCount and provide it to the client on request.
Since the user is not yet authenticated at this stage,
anyone can request the
Salt and the IterationCount of any user.
(The IterationCount determines how many times the salted password is hashed.)
After the first
two messages, both the client and the server can compose the Message
and compute the HashedKeyMac as the HMAC with the HashedKey.
The client then sends the Key encrypted with the HashedKeyMac to the server,
which decrypts the Key as KeyXorHashedKeyMac ⊕
HashedKeyMac.
In the next step, the server verifies whether hash(Key) = HashedKey.
If this is the case, it has successfully authenticated the
client.
If not, the server aborts the session.
Finally, the server uses the Key to authenticate the same Message to the client.
By also computing
KeyMac, the client can verify that the last message was indeed sent by the server.
Since both parties can compose the Message,
the message
authentication codes (MAC) can be sent without the Message.
The Username is included in the Message because it wasn’t authenticated in
the first message.
Without this, a man-in-the-middle could replace it to authenticate the user for another account
where they use the same
password.
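The computations described above can be traced in a short Python sketch. It is a sketch under assumptions, not the actual SCRAM wire format: SHA-256 is used for both `hash` and `hmac`, the Message is a naive concatenation of byte strings, and the password, the iteration count of 4096, and all helper names are illustrative.

```python
import hashlib
import hmac
import secrets


def xor(left: bytes, right: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(left, right))


def derive_key(password: str, salt: bytes, iteration_count: int) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iteration_count)


def mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


# Account creation: besides the Username, the server stores only these values.
salt, iteration_count = secrets.token_bytes(16), 4096
stored_hashed_key = hashlib.sha256(derive_key("hunter2", salt, iteration_count)).digest()

# After the first two messages, both sides can compose the Message.
username = b"alice"
client_nonce, server_nonce = secrets.token_bytes(16), secrets.token_bytes(16)
channel_binding = b""  # left empty here because channel binding is optional
message = username + client_nonce + server_nonce + channel_binding

# Client: encrypt the Key with the HashedKeyMac.
key = derive_key("hunter2", salt, iteration_count)
hashed_key_mac = mac(hashlib.sha256(key).digest(), message)
key_xor_hashed_key_mac = xor(key, hashed_key_mac)

# Server: decrypt the Key and compare its hash with the stored HashedKey.
recovered_key = xor(key_xor_hashed_key_mac, mac(stored_hashed_key, message))
client_authenticated = hmac.compare_digest(
    hashlib.sha256(recovered_key).digest(), stored_hashed_key
)

# Server: authenticate the same Message with the recovered Key (KeyMac).
key_mac = mac(recovered_key, message)

# Client: verify the KeyMac by computing it as well.
server_authenticated = hmac.compare_digest(key_mac, mac(key, message))
```

The XOR works out because PBKDF2 with SHA-256 and HMAC-SHA-256 both produce 32 bytes by default. A real protocol would also need an unambiguous encoding of the Message instead of plain concatenation.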
Let’s analyze how Simplified-SCRAM satisfies or can be made to satisfy
all but one of the above properties:
tls-server-end-point
uses the hash of the server’s certificate: hash(ServerCertificate).
The advantage of this binding is that
it can
easily be used with a reverse proxy.
Its disadvantage is that it doesn’t protect against compromised server keys.
tls-unique
uses the first TLS Finished message of the latest
TLS handshake.
Since the Finished message contains a hash over all
previous handshake messages,
it uniquely identifies a particular TLS connection.
For full TLS handshakes, the first Finished message is
sent by the client.
For abbreviated TLS handshakes, the first Finished message is sent by the server.
Depending on which type of
handshake has been performed and which of the two endpoints you implement,
you have to call either getFinished()
or
getPeerFinished()
to access the right message for channel binding.
In theory, tls-unique is the preferred option for channel binding
because it also prevents attacks with compromised server keys.
In practice, however, tls-unique requires
proxy servers to forward the
first Finished message to the application server
so that it can compose the SCRAM Message correctly,
which makes this option more
difficult to deploy.
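As a rough illustration of tls-server-end-point, the following Python sketch hashes a DER-encoded certificate. It assumes SHA-256; as far as I know, RFC 5929 ties the hash function to the certificate's signature algorithm, so treat the default as an assumption. The function name is hypothetical.

```python
import hashlib


def tls_server_end_point(der_certificate: bytes, hash_name: str = "sha256") -> bytes:
    """Return hash(ServerCertificate) over the DER-encoded certificate."""
    return hashlib.new(hash_name, der_certificate).digest()


# On a live connection with Python's ssl module, the certificate bytes come from
# SSLSocket.getpeercert(binary_form=True), and the tls-unique value is exposed
# via SSLSocket.get_channel_binding("tls-unique").
placeholder_certificate = b"not a real certificate, just example bytes"
binding = tls_server_end_point(placeholder_certificate)
```

Since the value depends only on the certificate, a reverse proxy and the application server behind it can compute it independently, which is the deployment advantage mentioned above.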
Access protocols
Besides proprietary protocols,
most incoming mail servers allow mail clients to access the user’s mailbox
with POP3 or IMAP.
If your mail client
and your mail server support both protocols,
you should choose the latter as it’s much more powerful.
The main reason for including POP3 in this
article
is that it’s much easier to use from the command-line interface.
Apple Mail allows you to inspect its communication with your mail servers
by clicking on “Connection Doctor” in the “Window” menu and
then on “Show Detail”.
You can also enable “Log Connection Activity” there to persist the log of its communication
in the folder
~/Library/Containers/com.apple.mail/Data/Library/Logs/Mail/.
Since the log files include the content of all your messages,
including deleted ones and those of removed accounts,
you should enable this option only if you really need it.
You can inspect how Thunderbird interacts with your mail servers
by logging its communication with the following commands:
$ export MOZ_LOG=timestamp,append,SMTP:3,POP3:3,IMAP:3
$ export MOZ_LOG_FILE=~/Downloads/thunderbird.moz_log
$ /Applications/Thunderbird.app/Contents/MacOS/thunderbird-bin
The following POP3 tool works in the same way as the ESMTP tool above.
Most of the remarks I made earlier therefore still apply.
In particular, I
advise you to use it only with accounts created for this purpose.
The tool uses Thunderbird’s configuration database
and Google’s DNS API
to
resolve the server you want to connect to.
Copy the commands in bold to your command-line interface by clicking on them.
The text in gray
mimics what the responses from the server look like.
The actual responses will be different.
Each response starts with either +OK or -ERR.
The
former indicates that your command was successful,
the latter indicates that an error occurred.
If necessary, you can always kill the current
process
and thereby the connection by pressing ^C (control + c).
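Since every POP3 response starts with one of the two status indicators, a client can split the first line of a response as in the following Python sketch. The function name `parse_pop3_status` and the sample lines are made up for illustration.

```python
def parse_pop3_status(line: str) -> tuple[bool, str]:
    """Split the first line of a POP3 response into success flag and text."""
    status, _, text = line.rstrip("\r\n").partition(" ")
    if status not in ("+OK", "-ERR"):
        raise ValueError(f"not a POP3 status line: {line!r}")
    return status == "+OK", text


success, text = parse_pop3_status("+OK 2 320\r\n")
failure, reason = parse_pop3_status("-ERR no such message\r\n")
```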
If you use Gmail, you have to enable POP3 access
in your account settings
and
allow access from insecure apps.
Port: 995
POP3 commands
Command | Arguments | Response | Description
STAT | – | Count Size | Return the count and size of all messages.
LIST | [Number] | Number Size | List the size of all messages [or of the specified one].
RETR | Number | Message | Retrieve the message with the given number.
DELE | Number | – | Mark the message with the given number as deleted.
POP3 extensions
Command | Arguments | Response | Description
STLS | – | – | Upgrade the connection from TCP to TLS just like STARTTLS.
TOP | Number X | Message | Return the header and the top X body lines of the specified message.
UIDL | [Number] | Number ID | List the permanent ID of all messages [or just the specified one].
Capability | Arguments | Description
PIPELINING | – | Indicates that the server can handle multiple commands at a time.
RESP-CODES | – | Indicates that the server supports extended response codes in square brackets.
AUTH-RESP-CODE | – | Indicates that the server tells the client why an authentication attempt failed.
IMPLEMENTATION | Name | Indicates the name of the server’s POP3 implementation for troubleshooting.
SASL | Mechanisms | Indicates the SASL mechanisms which can be used with the AUTH command.
LOGIN-DELAY | Seconds | Indicates how many seconds the client has to wait before connecting again.
EXPIRE | Days | Indicates after how many days the server deletes (retrieved) messages.
The server can indicate additional behavior in its response to the CAPA command.
LOGIN-DELAY and EXPIRE allow the server to conserve its resources.
APOP authentication
* 2 RECENT
* OK [UNSEEN 3]
* OK [UIDNEXT 5]
* OK [UIDVALIDITY 1]
* OK [PERMANENTFLAGS ()]
{Data}
F OK FETCH completed
O LOGOUT
* BYE Logging out
O OK LOGOUT completed
Protocol states
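[Figure: IMAP protocol states. LOGIN and AUTHENTICATE lead from Not authenticated to Authenticated, UNAUTHENTICATE leads back. SELECT and EXAMINE lead from Authenticated to Selected, CLOSE and UNSELECT lead back. LOGOUT leads to Logout.]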
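The protocol states can be modeled as a small transition table. The following Python sketch reflects my reading of the state and command labels above and covers only the state-changing commands. The names `TRANSITIONS` and `next_state` are hypothetical.

```python
# (current state, command) -> next state, for successful commands only.
TRANSITIONS = {
    ("Not authenticated", "LOGIN"): "Authenticated",
    ("Not authenticated", "AUTHENTICATE"): "Authenticated",
    ("Authenticated", "UNAUTHENTICATE"): "Not authenticated",
    ("Authenticated", "SELECT"): "Selected",
    ("Authenticated", "EXAMINE"): "Selected",
    ("Selected", "CLOSE"): "Authenticated",
    ("Selected", "UNSELECT"): "Authenticated",
}


def next_state(state: str, command: str) -> str:
    """Return the state after a successful command; LOGOUT always ends the session."""
    if command == "LOGOUT":
        return "Logout"
    # Commands without an entry leave the state unchanged in this sketch.
    return TRANSITIONS.get((state, command), state)


state = "Not authenticated"
visited = []
for command in ("LOGIN", "EXAMINE", "CLOSE", "LOGOUT"):
    state = next_state(state, command)
    visited.append(state)
```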
A word on terminology:
The standard and some mail clients such as Apple Mail speak of mailboxes rather than folders.
When I speak of
mailboxes, I usually refer to the mail account as a whole.
Thunderbird, on the other hand, avoids the term completely.
I mostly ignore IMAP
folders and how to
CREATE,
DELETE, and
RENAME them.
The only important aspect for us is that INBOX is a
special name
and always refers to
the primary folder of the user.
Data formats
E EXAMINE INBOX
Unquoted string: Since INBOX contains no spaces, it doesn’t have to be quoted (but it can be).
Subject: Example
F OK FETCH completed
Lists:
Lists
are used when a variable number of items are to be transmitted.
A single space is used to separate adjacent items
and the list is
enclosed by parentheses.
Lists can be nested in other lists and lists can be empty.
Let’s look at two examples:
F FETCH 1 (FLAGS)
* 1 FETCH (FLAGS (\Seen))
F OK FETCH completed
F FETCH 1 (FLAGS)
* 1 FETCH (FLAGS ())
F OK FETCH completed
Empty list: When the message hasn’t been seen yet, the nested list is empty.
Nil:
NIL indicates that an item doesn’t exist.
You have to consult the formal syntax to see where NIL is allowed.
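To show how lists, strings, and NIL fit together, here is a small recursive parser in Python. It is a simplified sketch for well-formed input on a single line: it handles unquoted strings, quoted strings, NIL, and nested lists, but not literal strings with a length prefix. The names `TOKEN` and `parse_imap` are my own.

```python
import re

# Parentheses, quoted strings (with backslash escapes), or runs of other characters.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse_imap(text: str) -> list:
    """Parse unquoted strings, quoted strings, NIL, and nested lists."""
    tokens = TOKEN.findall(text)
    position = 0

    def parse_item():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            items = []
            while tokens[position] != ")":
                items.append(parse_item())
            position += 1  # skip the closing parenthesis
            return items
        if token.upper() == "NIL":
            return None
        if token.startswith('"'):
            return re.sub(r"\\(.)", r"\1", token[1:-1])  # remove the escapes
        return token

    result = []
    while position < len(tokens):
        result.append(parse_item())
    return result


seen = parse_imap(r"* 1 FETCH (FLAGS (\Seen))")
unseen = parse_imap("* 1 FETCH (FLAGS ())")
```

Nested lists become nested Python lists, the empty list stays an empty list, and NIL becomes `None`, which keeps "no flags" and "no such item" distinguishable.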
Message numbers
Similar to POP3,
messages in IMAP can be referenced either by their position in a folder or by their unique identifier (UID):
Position:
If the response to the SELECT or EXAMINE command says with 8 EXISTS that 8 messages exist,
then 1 refers to the oldest
message and 8 to the newest message.
All numbers in between are guaranteed to refer to messages as well.
When a message is removed
from the folder,
the position of all subsequent messages is decremented by one.
Messages are always added at the end of the list:
When a
new message is added to the 8 existing ones,
it can be referenced by the number 9.
UID:
UIDs are numbers which are assigned in ascending order to messages.
Unlike the position of a message, which can change within and
across sessions,
its UID is meant to stay the same.
When a message is deleted, the UIDs of subsequent messages don’t change.
As a
consequence, UIDs are not necessarily contiguous.
Mail clients use UIDs to synchronize flags and deletions
of the messages they’ve
already retrieved from the server.
IMAP has a special UID command,
which allows the client to use SEARCH,
FETCH,
STORE, and
COPY with
UIDs instead of positions.
For example, clients issue the command TAG UID FETCH 1:{LastSeenUIDNEXT-1} FLAGS
to discover changes
to old messages according to the informational
RFC 4549.
In other words, clients find out which messages have been deleted while they
were offline
by fetching the flags for all locally stored messages from the server every time they reconnect.
All messages whose UID is no
longer in the response are then removed.
If the UIDNEXT value in the response to EXAMINE or SELECT
is bigger than the last time the client
connected,
the client knows that new messages arrived in the meantime.
If the UIDVALIDITY value in the same response is bigger than the
last time it connected,
the client has to invalidate its UIDs and rebuild its database.
Due to the overhead this causes, servers should avoid
invalidating UIDs.
However, since folders can be renamed and clients reference them by name,
the content of a folder can change
completely.
By using the current timestamp as the UIDVALIDITY value
whenever a folder is created or renamed,
servers can force clients
to refetch all messages in such a folder.
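The synchronization logic described above can be condensed into a short Python sketch. It assumes that the client has already sent `UID FETCH 1:{LastSeenUIDNEXT-1} FLAGS` and collected the UIDs from the response. The class `FolderState` and the function `plan_sync` are hypothetical names, and the sketch treats any change of UIDVALIDITY as an invalidation.

```python
from dataclasses import dataclass, field


@dataclass
class FolderState:
    """What a client remembers about a folder between sessions."""
    uid_validity: int
    uid_next: int
    uids: set[int] = field(default_factory=set)


def plan_sync(local: FolderState, uid_validity: int, uid_next: int,
              server_uids: set[int]) -> dict:
    """Decide what to do based on the response to SELECT or EXAMINE."""
    if uid_validity != local.uid_validity:
        # All cached UIDs are meaningless now: drop everything and refetch.
        return {"rebuild": True, "deleted": set(local.uids), "fetch_from": 1}
    return {
        "rebuild": False,
        # Locally stored messages whose UID the server no longer reports.
        "deleted": local.uids - server_uids,
        # New messages have UIDs starting at the previously seen UIDNEXT.
        "fetch_from": local.uid_next if uid_next > local.uid_next else None,
    }


local = FolderState(uid_validity=1, uid_next=5, uids={1, 2, 4})
plan = plan_sync(local, uid_validity=1, uid_next=7, server_uids={1, 4})
```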
Message sets
FETCH,
STORE, and
COPY operate on a
set of messages.
You can specify a single number, such as 4,
a range of numbers, such as 6:8,
or a
combination thereof, such as 4,6:8.
When referencing messages by their position,
6:8 is guaranteed to select three messages as long as
there are at least eight messages in the folder.
When using the UID command,
6:8 selects between zero and three messages,
depending on
whether messages with UIDs in this range have been deleted.
* represents the largest number in use.
When referencing messages by their
position,
* corresponds to the number of messages in the folder.
If the folder is empty, you get an error when using *.
If you want to fetch the
flags of all messages,
you can use F UID FETCH 1:* (FLAGS).
If you want to fetch all new messages,
you can use F UID FETCH
{LastSeenUIDNEXT}:* (FLAGS BODY.PEEK[]).
{LastSeenUIDNEXT} needs to be replaced with an actual number, of course.
(You have to
replace the curly brackets with an actual value in all my examples
except when the curly brackets are used as the length prefix of a literal
string.)
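The difference between positions and UIDs in message sets becomes clearer with a small Python sketch. The function names are hypothetical, and error handling for malformed sets is omitted.

```python
def expand_message_set(message_set: str, largest: int) -> list[int]:
    """Expand a set such as "4,6:8" into numbers; "*" stands for `largest`."""

    def number(token: str) -> int:
        return largest if token == "*" else int(token)

    result: set[int] = set()
    for part in message_set.split(","):
        first, _, last = part.partition(":")
        low = number(first)
        high = number(last) if last else low
        if low > high:
            low, high = high, low  # "8:6" selects the same messages as "6:8"
        result.update(range(low, high + 1))
    return sorted(result)


def select_uids(message_set: str, existing_uids: set[int]) -> list[int]:
    """With the UID command, only UIDs which still exist are selected."""
    if not existing_uids:
        return []
    wanted = expand_message_set(message_set, max(existing_uids))
    return sorted(set(wanted) & existing_uids)


by_position = expand_message_set("4,6:8", largest=10)
by_uid = select_uids("6:8", {1, 2, 7, 9})
```

With positions, every number in a range refers to a message. With UIDs, the range is only an upper bound on what gets selected.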
Message flags
S OK STORE completed
How a custom flag can be created and set if the IMAP server supports it.
Internal date
Besides flags,
messages have other attributes as well.
One of them is the internal date,
which records when the message was received.
Mail
clients can display messages with this date
instead of the sender-chosen origination date.
Since Apple Mail also displays the received date
instead of the sent date
when fetching messages via POP3,
it seems to rely on the Received header field indeed.
Other attributes which can
be fetched
are the message size
and the body structure
of multipart messages.
F FETCH 1 (INTERNALDATE)
* 1 FETCH (INTERNALDATE "24-Nov-2020 15:43:32 +0000")
F OK FETCH completed
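The internal date from the response above can be parsed with Python's standard library. This is a minimal sketch which assumes an English locale for the month abbreviation; the function name is my own.

```python
from datetime import datetime, timezone


def parse_internal_date(value: str) -> datetime:
    """Parse an IMAP date-time such as "24-Nov-2020 15:43:32 +0000"."""
    # Single-digit days can be padded with a space, hence the strip().
    return datetime.strptime(value.strip(), "%d-%b-%Y %H:%M:%S %z")


received = parse_internal_date("24-Nov-2020 15:43:32 +0000")
```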
IMAP commands
Some of the commands used in the above tool benefit from additional information.
This is what you should know about them:
EXAMINE vs.
SELECT:
Both commands open a folder in order to search and fetch the messages in it.
The difference is that EXAMINE opens
the folder in read-only mode,
while SELECT also allows the client to change and delete messages.
This is made visible in the response line
which starts with the tag:
It contains either [READ-ONLY] or [READ-WRITE].
SEARCH:
Saving a search result for later operations requires the SEARCHRES extension.
If your IMAP server doesn’t support it,
you have to
search without the RETURN (SAVE) part: S SEARCH {Criterion}.
The server then returns the positions
of all the messages that match
the criterion: * SEARCH 2 5 8.
Search criteria can also be combined: S SEARCH {Criterion1} {Criterion2} {etc.}.
IMAP also
supports the logical operators NOT and OR
besides the implicit “and”: NOT {Criterion} and OR {Criterion1} {Criterion2}.
As you can
see, the query language of IMAP is quite powerful.
FETCH:
The first argument to the FETCH command is a set of messages.
If the server supports saving the search result with RETURN (SAVE),
you can alternatively reference the search result with the dollar sign.
The second argument is a list of the data attributes you want to
fetch.
The difference between BODY[{Section}] and BODY.PEEK[{Section}] is that
the former sets the \Seen flag while the latter does
not.
You can use either one to fetch the desired section
of the specified messages.
STORE:
The STORE command allows the client to alter the flags of a message.
Similar to FETCH, the first argument is either
a message set or
$ for a search result.
After that, you can replace the flags of the messages with FLAGS ({NewFlags}),
add additional flags to the existing
flags with +FLAGS ({FlagsToAdd}),
or remove some flags from the existing flags with -FLAGS ({FlagsToRemove}).
Messages are
deleted by flagging them as \Deleted and then using the
CLOSE or
EXPUNGE command.
The former also closes the folder and takes you
back to the authenticated state,
whereas the latter doesn’t do that.
APPEND:
Mail clients use this command to store sent messages in the user’s mailbox.
Since the target folder is specified in the first
argument,
this command can be used from the authenticated state.
Besides the flags you want to set on the appended message,
you can
also specify the internal date in an optional third argument.
The fourth argument is the message that you want to append,
which has to be
IMAP extensions
The most important extensions to IMAP are (ignoring the ones for
internationalization,
such as support for UTF-8):
ID (RFC 2971):
For improving bug reports and assembling usage statistics,
it’s useful to know which implementation of the protocol the
other party uses.
The ID command allows the client to send a list of key-value pairs to the server
and receive a list of key-value pairs in
return.
Some keys are specified in the RFC
but any string of at most 30 bytes can be used as a key.
For example, a client can send TAG ID
("name" "ef1p") to the server
and receive * ID ("name" "Dovecot") in return.
Permanent identifiers:
JMAP servers assign permanent identifiers to all objects.
In the case of messages, these identifiers can no longer be
invalidated
and they no longer change when a message is moved from one folder to another.
In the case of folders, JMAP clients can detect
when a folder has been renamed
and no longer need to fetch all the messages in it again.
Efficient synchronization:
JMAP provides a simple method
for getting the identifiers of created, updated, and destroyed messages and folders.
As we have seen above, synchronizing a mailbox with IMAP is easy only
if you stay connected to the server, which isn’t an option for mobile
clients.
Push mechanism:
In order to be informed immediately about changes to a folder, such as newly arrived messages,
IMAP clients use the IDLE
command.
If they want to be informed about changes to several folders,
they have to open a separate connection for each folder.
JMAP, on the
other hand, allows clients
to subscribe to all changes on the server at once.
Clients which can keep a connection to the server open can
subscribe via the
EventSource interface.
Other clients, such as those on mobile phones, can register a
callback URL,
which allows them to use
their platform-specific
push technology.
Batching of chained commands:
When the IMAP server doesn’t support certain extensions such as SEARCHRES,
IMAP clients often need to
wait for the response to one command
before they can construct the followup command.
JMAP allows clients to batch several commands
and
to reference the results
from earlier commands in the same request.
Doing so avoids round trips
and makes updates more atomic
(i.e. it
becomes less likely that only some of the issued commands are being executed).
Widespread data format:
JMAP data doesn’t have to be encoded as JSON
and future standards can specify other data formats.
The same is
true for the transport protocol:
While JMAP currently uses HTTPS as its transport protocol,
other protocols can be added in the future.
The
choice of JSON and HTTPS is mostly due to their widespread adoption:
There are suitable libraries for all relevant programming languages
and
software engineers know how to use those.
It’s worth mentioning that JMAP doesn’t wrap binary data in JSON.
Binary data is exchanged in
separate connections.
Complexity on server:
JMAP moves the complexity of handling email’s message format from the client to the server.
While clients can still
fetch the raw message if needed,
for example when implementing end-to-end security,
the server has to deal with multipart messages,
content
encodings, line-length limits, etc.
Clients can download and upload messages as a
simple JSON object.
Please note that this affects neither how
Email filtering
It can be useful to filter incoming messages
according to custom rules.
For example, you may want to move certain messages to a certain folder,
mark certain messages as read, or delete certain messages automatically.
Most mail clients allow their users to configure such rules,
which are
executed when the mail client receives a new message.
There are several advantages of filtering incoming mail on the server rather than on the
client, though:
Synchronization:
If the filtering rules are stored on the incoming mail server,
they can be inspected and edited through any of the user’s mail
clients.
Otherwise, users have to remember on which client they’ve created the rule that they want to modify now.
No race conditions:
If the filtering rules are stored on a mail client,
then the rules are not applied when this mail client is offline.
In this
situation, other mail clients see unfiltered messages.
If these mail clients apply rules of their own,
you might run into race conditions,
where the
order in which clients see incoming messages determines the outcome of the filtering.
How a message is delivered from the mail client of the sender to the mail clients of the recipient.
Messages can be filtered by the incoming mail server (in green) or by an online mail client (in blue).
You can generate simple filtering rules with the following tool.
Make sure that the Argument makes sense for the chosen Action.
Move requires the
name of a folder, Forward an email address,
Flag the name of a flag, and Reply the text of the reply.
require "imap4flags";
if header :contains "Subject" "Test" {
    addflag "\\Seen";
}
Out-of-office replies
Prior to JMAP,
where servers can support
the configuration of vacation responses,
Sieve and ManageSieve
with the vacation extension
were
the only standardized way to configure such responses.
According to RFC 3834,
the same response should be sent to the same sender only
once within a period of several days
even when the sender sends additional messages.
require ["date", "relational", "vacation"];
if allof (currentdate :value "ge" "date" "2021-10-22", currentdate :value "le" "date" "2021-10-29") {
    vacation "Hi, I had to take a couple of days off to read ef1p.com/email. I will be back soon.";
}
A simple Sieve script for an automatic vacation response, which I’ve adapted
from Gandi.
The following tool shows you how to use the ManageSieve commands
from your command-line interface.
Unlike the previous tools,
you have to
configure the address and the port number of the server manually
as this information is not included in Thunderbird’s configuration files.
The
standard describes how to locate the ManageSieve server
with SRV records
and the autoconfiguration tool above does query the _sieve._tcp
subdomain.
However, since virtually no one configures such SRV records (at least not for the ManageSieve protocol),
I didn’t bother to implement
this discovery mechanism here.
ManageSieve servers listen on port 4190 by default.
The Thunderbird plugin, which I mentioned earlier,
simply
probes this port
on the IMAP server
in order to configure itself.
"SASL" "PLAIN"
"VERSION" "1.0"
OK "STARTTLS completed."
AUTHENTICATE "PLAIN" "AGFsaWNlQGV4YW1wbGUub3JnAA=="
OK "AUTHENTICATE completed."
CHECKSCRIPT {62+}
require "body";
discard;
OK "CHECKSCRIPT completed."
PUTSCRIPT "MyScript" {62+}
require "body";
discard;
OK "PUTSCRIPT completed."
SETACTIVE "MyScript"
OK "SETACTIVE completed."
LOGOUT
OK "LOGOUT completed."
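The argument of the AUTHENTICATE command in the transcript above is a Base64-encoded SASL PLAIN message, which consists of an optional authorization identity, the username, and the password, separated by null characters. The following Python sketch encodes and decodes such messages; the function names are my own. Decoding the value from the transcript yields the username alice@example.org with an empty password.

```python
import base64


def sasl_plain(username: str, password: str, authorization_id: str = "") -> str:
    """Encode the credentials as the initial response of SASL PLAIN."""
    raw = f"{authorization_id}\0{username}\0{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_sasl_plain(encoded: str) -> tuple[str, str, str]:
    """Return (authorization identity, username, password)."""
    parts = base64.b64decode(encoded).decode("utf-8").split("\0")
    if len(parts) != 3:
        raise ValueError("expected three fields separated by null characters")
    return parts[0], parts[1], parts[2]


decoded = decode_sasl_plain("AGFsaWNlQGV4YW1wbGUub3JnAA==")
encoded = sasl_plain("alice@example.org", "")
```

Base64 is an encoding, not encryption, which is why PLAIN should be used only after the connection has been upgraded to TLS.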
IMAP:
LibreSSL first issues . CAPABILITY to check whether the server supports STARTTLS.
ManageSieve servers ignore this as an invalid
command.
LibreSSL then tries to initiate TLS anyway and sends . STARTTLS.
Since the ManageSieve protocol doesn’t use tags,
this line
fails to achieve what we want.
POP3:
Using -starttls pop3 doesn’t work because POP3 clients use
STLS instead of STARTTLS to upgrade the connection.
SMTP:
Using -starttls smtp looks like it should work, but it fails as well.
LibreSSL first sends the EHLO command,
which is
ignored by ManageSieve servers as an invalid command.
Continuing anyway, LibreSSL sends STARTTLS to the server and doesn’t check the
response,
which is exactly what we were looking for.
Unfortunately, this still fails.
If you know why, please let me know.
$ brew --version
By default, OpenSSL is installed in the following location without replacing the preinstalled LibreSSL:
$ /usr/local/opt/openssl/bin/openssl version
Format
The format of an email message is specified in RFC 5322.
The goal of this chapter is to make you comfortable reading raw messages.
Gmail:
Open a message, click on ⋮ in the upper right corner, then on “Show original”.
Yahoo:
Open a message, click on ⋯ in the bottom middle, then on “View raw message”.
Outlook:
Web: Click on ⋯ in the upper right corner, then on “View” and “View message source”.
Desktop: Double-click a message, click on the “File” menu and then select “Properties”.
Thunderbird:
Raw message: Select a message, click on the “More” button and then “View Source” (or use ⌘U).
All header fields: Click on the “View” menu, then on “Headers” and “All” (or on “Normal” to go back).
Apple Mail:
Raw message: Click on the “View” menu, then on “Message” and “Raw Source” (or use the shortcut ⌥⌘U).
All header fields: Click on the “View” menu, then on “Message” and “All Headers” (or use the shortcut ⇧⌘H).
Change preferences:
In the “Viewing” tab of the preferences, you can configure which header fields are displayed.
File format
Since messages, including attachments, are just text,
they can be stored as simple text files.
A common filename extension
for emails is .eml.
Such
files can be viewed with any text editor.
Desktop clients usually have an option to save a message as a file,
and among Web clients, at least Gmail
allows you to download a message in the “⋮” menu,
which is located in the upper right corner.
Storage format
For their own purposes, mail clients can store messages in whatever format they want.
The two formats which are used by several mail
clients and servers to store messages
are Mbox and Maildir.
By default, Thunderbird uses the former but it can also be configured to use the
latter.
The Mbox format is specified in RFC 4155.
All messages are appended in their raw format to a single file.
Mbox is a text-based format,
which means that a given string, namely From …, is used to delimit the messages
and that occurrences of this string in messages have to be
escaped.
Storing all the messages in a single file is not ideal
as it might easily get corrupted if it’s not properly locked
while reading from and
writing to it.
Additionally, this format is inappropriate for backup systems
that copy the complete file and not just the differences
when the
content of a file has changed.
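The delimiting and escaping can be sketched in Python as follows. This follows one common quoting convention, in which a `>` is put in front of message lines starting with `From `. Mbox variants differ in the details, and the function names as well as the simplified delimiter line are assumptions for illustration.

```python
def append_to_mbox(mbox: str, sender: str, date: str, raw_message: str) -> str:
    """Append a message to the content of an Mbox file and return the new content."""
    escaped = "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in raw_message.rstrip("\n").split("\n")
    )
    # Each message starts with a "From " line and ends with an empty line.
    return mbox + f"From {sender} {date}\n{escaped}\n\n"


def count_messages(mbox: str) -> int:
    """Count the delimiter lines, which works only because of the escaping."""
    return sum(1 for line in mbox.split("\n") if line.startswith("From "))


mbox = ""
mbox = append_to_mbox(mbox, "alice@example.org", "Thu Oct 21 12:00:00 2021",
                      "Subject: Hi\n\nFrom here on, it gets tricky.\n")
mbox = append_to_mbox(mbox, "bob@example.org", "Thu Oct 21 12:05:00 2021",
                      "Subject: Re: Hi\n\nIndeed.\n")
```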
Thunderbird stores the messages at
~/Library/Thunderbird/Profiles/{RandomString}.default/ImapMail/{MailServer} on macOS.
If you use another operating system,
you can find the storage location on
this page.
This directory contains two files for each of your mailbox folders.
For example, you should have a
large INBOX file and a much smaller INBOX.msf file,
which is used to index the messages in the former file.
(MSF stands for mail summary file.)
You can use the tail command
to display the specified number of lines of the last message that you’ve received:
tail -n 100 INBOX.
Unless
you want to transfer all your messages to a new computer,
you shouldn’t move or modify such files as this likely causes problems for your
mail client.
Similar to Maildir,
Apple Mail stores each message in a separate file at ~/Library/Mail/.
The used format is proprietary and there’s no
official documentation about it
but it’s fairly easy to reverse engineer.
After a folder with the version number of the format, V7 in my case,
Apple Mail generates a folder for each of the added email accounts
with a Universally Unique Identifier (UUID) as its name.
Inside these
accounts folders, Apple Mail generates a folder ending with .mbox
for each of the IMAP folders,
such as INBOX.mbox, Sent Messages.mbox,
and so on.
These mailbox folders contain another folder with a UUID,
which finally contains the Data folder with the actual messages in
further folders.
Put together, the folder nesting is as follows: ~/Library/Mail/V7/{UUID}/INBOX.mbox/{UUID}/Data.
Apple Mail enumerates the messages with a single counter across all your accounts.
It uses the filename extensions
.emlx for messages
without attachments and .partial.emlx for messages with attachments.
In these emlx files, Apple Mail prepends the length of the message
in bytes to the raw message
and appends a property list with additional information.
It’s a text-based format that you can open with any text
editor.
The messages are stored in a Messages folder inside the Data folder with their number used as their name.
For example, you might
Line-length limit
According to RFC 5322,
each line of a message may consist of at most 1’000 ASCII characters,
including CR + LF.
Implementations are free to
accept longer lines,
but since some implementations cannot handle longer lines,
you shouldn’t send them.
The RFC even recommends limiting
lines to 80 characters
to accommodate clients that truncate longer lines in violation of the standard.
In order to leave the line wrapping to the mail
client of the recipient,
the mail client of the sender has to encode the body
if the body contains lines which are too long.
If a header field is too
long,
it must be broken into several lines with
folding whitespace:
{CR}{LF} followed by at least one space or tab.
If a line in the header section of
a message starts with whitespace,
its content belongs to the header field on the previous line.
The procedure of breaking lines as done by the
sender is called folding;
the procedure of joining lines as done by the recipient is called unfolding.
When unfolding, runs of whitespace characters
are replaced with a single
space character.
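Folding and unfolding can be sketched in a few lines of Python. This is a simplified illustration, not a complete RFC 5322 implementation: the function names fold and unfold are made up for this example, the 78-character limit is the recommended one, and the sketch breaks lines only at single spaces.

```python
import re

def fold(header: str, limit: int = 78) -> str:
    """Break a header field at spaces so that no line exceeds `limit` characters."""
    lines, current = [], ""
    for word in header.split(" "):
        candidate = word if not current else current + " " + word
        if current and len(candidate) > limit:
            lines.append(current)
            current = " " + word  # Continuation lines start with whitespace.
        else:
            current = candidate
    lines.append(current)
    return "\r\n".join(lines)

def unfold(header: str) -> str:
    """Replace each line break plus the following whitespace run with one space."""
    return re.sub(r"\r\n[ \t]+", " ", header)

original = "Subject: " + " ".join(["word"] * 40)
folded = fold(original)
print(folded)
print(unfold(folded) == original)
```

A word that is longer than the limit is left on a line of its own, which is why real mail clients combine folding with the header encoding described later.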
Message identification
There are three header fields
to identify the current message and the previous messages
in the same thread:
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
An example of what the three message identification header fields look like.
The References field contains the message ID of the In-Reply-To field.
When using the RFC format, the algorithm A, which is used to determine the remaining bits, is one of the following:
1: The remaining bits consist of the current timestamp
and the MAC address of the device which generated the UUID.
2: A variant of algorithm 1 used in the
Distributed Computing Environment (DCE)
by the Open Software Foundation (OSF).
3: The MD5 hash of a namespace identifier
and a name within that namespace.
A and F overwrite six bits of the MD5 hash.
4: The remaining 122 bits are chosen randomly.
The message IDs in the example above were chosen with this algorithm.
5: Algorithm 5 is the same as algorithm 3 but it uses SHA-1
instead of MD5 as the cryptographic hash function.
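Python's uuid module implements these algorithms, which makes it easy to see the version number inside a UUID. The helper make_message_id and the domain example.com are assumptions for this sketch; mail clients are free to generate message IDs differently.

```python
import uuid

def make_message_id(domain: str) -> str:
    """Build a message ID from a random (version 4) UUID and the given domain."""
    return f"<{uuid.uuid4()}@{domain}>"

random_uuid = uuid.uuid4()                                  # Algorithm 4: random bits
md5_uuid = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")    # Algorithm 3: MD5 hash
sha1_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")   # Algorithm 5: SHA-1 hash

print(random_uuid.version, md5_uuid.version, sha1_uuid.version)
print(make_message_id("example.com"))
```

Name-based UUIDs are deterministic: the same namespace and name always result in the same UUID, whereas two calls to uuid4 practically never do.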
Trace information
According to RFC 5321,
whenever a mail server receives a message,
it must add a Received header field at the beginning of the message
without
changing or deleting already existing Received header fields.
Received header fields have the following format:
with {Protocol}
id {SessionId}
for {AddressOfRecipient};
Return-Path: <[email protected]>
Since the Bcc recipients are usually removed from the message
even for the Bcc recipients themselves,
mail clients don’t know whether a
message has been forwarded
or whether the user was a hidden recipient
if the user’s address is not listed among the recipients.
By
inspecting the Received header fields,
mail clients could easily distinguish between the two scenarios in most cases:
If the message has been
forwarded,
there should be a Received header field with one of the recipient addresses in the for clause.
Recovering the address through
which a message has been forwarded to your mailbox
could be useful for filtering incoming messages into different folders automatically.
And as we discussed earlier,
mail clients shouldn’t offer a reply-to-all option for messages
where the user was a Bcc recipient
as this would
leak what the sender tried to hide by using the Bcc field.
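The heuristic described above could look as follows. This is only a sketch under simplifying assumptions: the function name classify and the three return values are invented for this example, and the regular expression handles only the common for <address> form of the for clause.

```python
import re
from email import message_from_string
from email.utils import getaddresses

def classify(raw_message: str, my_address: str) -> str:
    """Guess whether the user was listed, reached by forwarding, or a Bcc recipient."""
    message = message_from_string(raw_message)
    fields = message.get_all("To", []) + message.get_all("Cc", [])
    listed = {address.lower() for _, address in getaddresses(fields)}
    if my_address.lower() in listed:
        return "listed"
    for received in message.get_all("Received", []):
        match = re.search(r"\bfor\s+<?([^\s<>;]+)>?", received)
        # A listed recipient in a for clause suggests that the message was forwarded.
        if match and match.group(1).lower() in listed:
            return "forwarded"
    return "bcc"

example = (
    "Received: from a.example by b.example with ESMTPS id 123\n"
    " for <alice@example.org>; Thu, 21 Oct 2021 10:00:00 +0000\n"
    "From: carol@example.com\n"
    "To: alice@example.org\n"
    "Subject: Hi\n"
    "\n"
    "Hello\n"
)
print(classify(example, "bob@example.net"))
```

Since not all mail servers include the for clause, a result of "bcc" is a guess rather than a proof.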
LMTP is specified in RFC 2033 and may be used only in a local network.
LMTP uses LHLO instead of EHLO to greet the server.
I’ve included
LMTP only because you might encounter it as LMTP[S][A]
in the with clause of a Received header field.
LMTP also pops up in other places,
for example in the code
to which I linked earlier.
Content encoding
RFC 5322 specifies a format for text messages,
whose lines may consist of at most 1’000 ASCII characters.
Whenever the content of a message
doesn’t fulfill this requirement,
it must be encoded according to the
Multipurpose Internet Mail Extensions (MIME)
as specified in RFC 2045.
When mail clients encode messages according to MIME,
they indicate this with the following header field:
MIME-Version: 1.0
The header field used to indicate that a message is formatted using MIME.
In theory, the version number allows the Internet community to make changes to the standard.
In practice, however, the standard didn’t specify
how mail clients are supposed to handle messages with an unknown MIME version.
As a consequence, you cannot change the version number
without breaking email communications,
which makes this header field completely useless.
The version 1.0 survived the last 30 years and will
likely survive the next 30 years.
MIME also introduced additional message header fields,
which we’ll cover in this and the following subsections.
Base64:
Binary data and non-Western-European languages are best encoded with Base64.
While hexadecimal digits encode 4 bits each,
Base64 digits encode 6 bits each.
6 bits can represent 2^6 = 64 different values.
Base64 uses the characters A – Z, a – z, 0 – 9, +, and / to encode
these 64 values.
What makes the Base64 encoding special is that bytes and digits don’t align:
Three bytes are encoded with four Base64 digits.
If you shift the input by one or two bytes, the Base64 encoding looks completely different.
If the size of the input is not a multiple of three,
one
or two equality signs are appended to the output
in order to make the output a multiple of four.
This procedure is known as padding.
In order to
respect the line-length limit,
a line break is inserted after at most 76 Base64 characters.
Base64 encoding increases the size of the content by
33%
and the line breaks add another 2.6% on top of that.
You can encode and decode Base64 with the following tool:
The mail client of the sender informs the mail client of the recipient with the following header field
that the content is encoded:
Content-Transfer-Encoding: {Value}
This header field indicates with which method the content of the message has to be decoded.
The Value can be quoted-printable, base64, or 7bit if no content encoding has been used.
(If the 8BITMIME or BINARYMIME extensions are supported, the value can also be 8bit or binary.)
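The properties of Base64 mentioned above can be checked with Python's base64 module. The sample payload is arbitrary. Note that encodebytes separates lines with LF only, so the result would still have to be converted to CR + LF before it is used in a message.

```python
import base64

# Three input bytes become four digits; shorter inputs are padded with "=".
for data in (b"abc", b"ab", b"a"):
    print(data, "->", base64.b64encode(data).decode("ascii"))

payload = bytes(range(256)) * 12        # 3072 bytes of sample binary data
unwrapped = base64.b64encode(payload)   # Without line breaks
wrapped = base64.encodebytes(payload)   # Line break after at most 76 characters

overhead = len(unwrapped) / len(payload) - 1
longest_line = max(len(line) for line in wrapped.splitlines())
print(f"{overhead:.1%} larger, longest line: {longest_line} characters")
```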
$ qprint -e
$ qprint -d
$ openssl base64 -e
$ openssl base64 -d
How to encode and decode Quoted-Printable, Base64, and Percent with Perl,
which is likely preinstalled on your computer.
You can use explainshell.com to
learn more about the used options.
The code uses the MIME::QuotedPrint,
MIME::Base64, and
URI::Escape modules.
Header encoding
RFC 2047 specifies how one can use non-ASCII characters
in certain header field values,
such as the subject and the display names.
Instead of
introducing new header fields to specify the encoding of existing header fields,
encodings in header fields indicate which character encoding
and
which content encoding has been used.
This results in the so-called Encoded-Word encoding.
Its format is as follows: =?{CharacterEncoding}?{ContentEncoding}?{EncodedText}?=,
where CharacterEncoding is usually either ISO-8859-1 or UTF-8,
ContentEncoding is either Q for
Quoted-Printable or B for Base64,
and EncodedText is the field value encoded according to the previous parameters.
The Quoted-Printable
encoding is slightly modified when used to encode header field values:
Question marks, tabs, and underlines are escaped with their hexadecimal
representation
and spaces are encoded with underlines.
In order to adhere to the line-length limit,
whitespace between adjacent Encoded Words
is removed completely,
which allows the encoder to break long words with a newline
(and also to mix different character encodings).
The
following tool does all of that for you.
It uses Quoted-Printable or Base64 depending on which encoding is shorter,
and it supports only ISO-8859-1 and UTF-8.
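To make the format concrete, here is a minimal Q encoder in Python together with a decoder based on the standard library. The names encode_word_q and decode_words are made up for this sketch. The encoder escapes every byte that is not an ASCII letter or digit, which is stricter than necessary, and it ignores the limit of 75 characters per Encoded Word.

```python
from email.header import decode_header, make_header

def encode_word_q(text: str, charset: str = "UTF-8") -> str:
    """Encode text as a single Encoded Word with the Q content encoding."""
    encoded = []
    for byte in text.encode(charset):
        char = chr(byte)
        if char == " ":
            encoded.append("_")              # Spaces are encoded as underlines.
        elif char.isascii() and char.isalnum():
            encoded.append(char)
        else:
            encoded.append(f"={byte:02X}")   # Everything else is escaped.
    return f"=?{charset}?Q?{''.join(encoded)}?="

def decode_words(value: str) -> str:
    """Decode all Encoded Words in a header field value."""
    return str(make_header(decode_header(value)))

print(encode_word_q("José"))
print(decode_words("=?UTF-8?Q?Jos?= =?UTF-8?Q?=C3=A9?="))
```

The second call shows that the whitespace between adjacent Encoded Words is dropped by the decoder.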
Punycode encoding
Punycode encodes non-ASCII symbols like ¡ and ≠ with letters, digits, and hyphens,
but it doesn’t escape the remaining printable ASCII
characters, such as !, =, and &.
Punycode would be more flexible if the initial state started with a code point of 0 instead of 128.
As we will
see soon, this doesn’t matter for internationalized domain names, though.
After a potentially large initial delta,
the subsequent deltas are small if all the characters come from the same language.
This is what makes
Punycode so efficient.
For example, Ελληνικά is encoded as twa0c6aifdar,
which consists of just four more characters.
Even more
astonishingly, the UTF-8 encoding of Ελληνικά takes 16 bytes,
whereas the UTF-8/ASCII encoding of twa0c6aifdar takes just 12 bytes.
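Python ships with a punycode codec, which can be used to reproduce these numbers. The codec encodes a single string without the xn-- prefix that is used in domain names.

```python
word = "Ελληνικά"
encoded = word.encode("punycode")

print(encoded.decode("ascii"))
print(len(word), "characters,", len(word.encode("utf-8")), "bytes in UTF-8,",
      len(encoded), "bytes in Punycode")
print(encoded.decode("punycode") == word)

# ASCII characters are copied to the front, followed by a hyphen and the deltas.
print("gießen".encode("punycode").decode("ascii"))
```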
Unicode normalization
Unicode normalization
distinguishes between encodings that are syntactically identical
and encodings that are semantically similar but not
identical.
The former is called canonical equivalence, the latter compatibility equivalence.
Additionally, some characters can be represented
by a single code point or by several code points.
The former is the composed representation, the latter the decomposed representation.
Based on these options, Unicode defines the following four
normalization forms (NF):
                           Composition  Decomposition
Canonical equivalence      NFC          NFD
Compatibility equivalence  NFKC         NFKD
Replacing characters by compatibility equivalence also replaces characters that are canonically equivalent.
There are no normalization forms
for the latter without the former.
The relationship between canonical equivalence and compatibility equivalence can thus be visualized as
follows:
[Figure: canonical equivalence shown as a subset of compatibility equivalence.]
Superscripts and subscripts:
¹ → 1,
₁ → 1
Number forms:
⅔ → 2⁄3,
Ⅳ → IV
Ligatures:
ﬀ → ff,
ﬁ → fi
(The ligature on the right-hand side of the second example is created by the font on this website.)
Digraphs:
ĳ → ij,
ǆ → dž
Letter-like symbols:
℃ → °C,
℅ → c/o,
™ → TM
Line-breaking behavior:
non-breaking space → space,
non-breaking hyphen → hyphen
(≠ hyphen-minus)
Substring:
If a string is normalized, then so are all its substrings.
Concatenation:
Even if two strings are normalized, their concatenation
might not be normalized.
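The normalization forms and the concatenation property can be explored with Python's unicodedata module. The strings are written with escape sequences so that the composed and decomposed representations cannot be confused.

```python
import unicodedata

composed = "caf\u00e9"      # é as a single code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print([f"{ord(char):X}" for char in nfc])
print([f"{ord(char):X}" for char in nfd])

# Compatibility equivalence: the ligature fi, the Roman numeral four, and TM.
print(unicodedata.normalize("NFKC", "\ufb01\u2163\u2122"))

# Both parts are normalized on their own, but their concatenation is not.
first, second = "e", "\u0301"
joined = first + second
print(unicodedata.normalize("NFC", joined) == joined)
```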
Artistic use: Unicode can also be used to change the appearance of ASCII text.
For example, you can flip text upside down
or overuse
diacritics, which results in so-called
Zalgo text.
Localization: For some characters, the case mapping still depends on the language.
This is why JavaScript has a toLocaleLowerCase
and a
toLocaleUpperCase method.
For example, in the Turkish language,
'I'.toLocaleLowerCase('tr') === 'ı' and
'i'.toLocaleUpperCase('tr') === 'İ'.
Titlecase: Digraphs,
such as the dž used in Eastern European alphabets,
usually exist in lowercase, uppercase, and
titlecase.
For example,
Unicode defines ǆ, Ǆ, and ǅ.
The Dutch digraph ij, on the other hand,
is capitalized together, such as in IJsselmeer,
which is why only ĳ and Ĳ
exist.
Since digraphs are usually written as two separate characters in practice,
titlecase algorithms which simply capitalize the first
letter get this wrong.
What makes internationalized domain names even more complicated is that there are two versions: IDNA2003 & IDNA2008.
(IDNA stands
for Internationalized Domain Names for Applications.)
IDNA2008 supersedes IDNA2003, which means that IDNA2003 should no longer be
used.
Since a lot of the confusion comes from the differences between them, we’ll look at both:
So how does IDNA2008 differ from IDNA2003? Let’s look at a few examples:
Symbols:
P≠NP.org was valid under IDNA2003
but is no longer valid under IDNA2008 since symbols are no longer allowed.
(Due to the
limitations of Punycode,
P=NP.org, on the other hand, was never valid.)
Disallowing symbols also prevents attackers from faking URL
separators
in domain names, which is a special variant of a homograph attack.
For example,
ef1p.com∕email.article.example,
which
uses a division slash in the domain label com∕email
under the top-level domain .example,
was also valid under IDNA2003 but is no longer
valid under IDNA2008.
Emojis:
Being a kind of symbol, emojis were allowed in IDNA2003 but are no longer allowed in IDNA2008.
Since IDNA2003 was limited
to Unicode version 3.2, only a tiny subset of emojis could be used,
namely those which were originally added as text characters
(mostly in
Unicode version 1.1 in 1993)
and given an emoji presentation in 2010.
The variation selector 16
was added to Unicode in version 3.2 to
render text symbols as emojis;
just in time for IDNA2003.
As a consequence, ❤️.com
was once valid while 💙.com never was.
Emojis were
intentionally disallowed in IDNA2008 because humans likely confuse different emojis
even without combining characters,
such as skin
tones and hair styles.
For example, ❤
and ♥️
are two different hearts, where both of them were
valid under IDNA2003.
German eszett ß:
In IDNA2003, ß was case-folded to ss.
For example, Gießen.de was transformed to giessen.de before making the DNS
lookup.
Since ß is allowed in IDNA2008, Gießen.de is now transformed to xn--gieen-nqa.de.
Greek sigma ς:
Similarly, ς was case-folded to σ in IDNA2003 but is now allowed in IDNA2008.
For example, ἑλλάς.gr was transformed
to xn--hxa3aa7a0420a.gr in IDNA2003
and is now transformed to xn--hxa3aa3a0982a.gr in IDNA2008.
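The eszett example can be reproduced with Python, whose built-in idna codec implements IDNA2003. For the IDNA2008 result, the following sketch simply applies Punycode to the lowercase label, which skips the validation that a real IDNA2008 implementation would perform.

```python
domain = "Gießen.de"

# IDNA2003: the built-in codec lowercases the label and maps ß to ss.
idna2003 = domain.encode("idna")

# IDNA2008 keeps ß; this shortcut only works for labels which are already valid.
label = "gießen"
idna2008 = b"xn--" + label.encode("punycode") + b".de"

print(idna2003.decode("ascii"))
print(idna2008.decode("ascii"))

# Unicode case folding performs the same mappings as IDNA2003 for ß and ς.
print("ß".casefold(), "ς".casefold())
```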
IDNA2008 validation
Homograph attack
Domain names which look identical but resolve to different addresses are a serious security issue.
For example, the lowercase letter l, the
uppercase letter I, and the number 1
can easily be mistaken for one another depending on the font,
and so can the capital letter O and the
number 0.
While the problem already existed with ASCII-only domain names,
internationalized domain names made the situation
considerably worse.
For example, the Latin B,
the Greek Β,
and the Cyrillic В all look the same.
While BBC.com takes you to the website of the
British Broadcasting Corporation (BBC),
ВВС.com takes you to a completely different website.
Deceiving users with optically similar
characters in order to obtain sensitive information
is known as a homograph attack.
While phishing cannot be fully eliminated,
such attacks
can be mitigated by the client, the registry, and the user:
Client:
Browsers and mail clients should warn the user about suspicious domain names
and display such domain names in
Punycode/ASCII rather than Unicode.
Domain names are suspicious when they use characters which don’t belong to the user’s preferred
language
or when they mix characters from different scripts.
Additionally, it’s a good idea to lowercase and normalize domain names
before displaying them
in a font which clearly distinguishes between visually similar characters.
Registry:
Domain name registries should develop registration policies
for their top-level domains.
Registries are free to permit characters
only from certain
scripts
or not to support internationalized domain names at all.
For example, the Russian top-level domain .рф
permits
only subdomains in the Cyrillic script.
Registries which allow the use of different scripts should ensure
that the different scripts cannot be
mixed in a single label.
The Unicode Technical Standard 39
with its data set
contains more information about confusable characters.
On
top of this, registries should bundle or block variants of the same word
as outlined in RFC 4290.
Wikipedia lists which top-level domains
support IDNs
and which top-level domains are internationalized themselves.
User:
Users should be trained to recognize phishing attempts
and to always enter the address of important online services themselves
instead of following a link.
In the above example, the fact that ВВС.com looks just like BBC.com is not a problem
if users enter the perceived
address into the address field rather than copying it there.
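A client-side check for mixed scripts could be approximated as follows. This sketch derives the script from the first word of the Unicode character name, which is a crude stand-in for the script property and the confusables data of the Unicode Technical Standard 39. The function names scripts and is_suspicious are made up for this example.

```python
import unicodedata

def scripts(label: str) -> set:
    """Approximate the scripts of a label by the first word of each character name."""
    return {
        unicodedata.name(char, "UNKNOWN").split(" ")[0]
        for char in label
        if char.isalpha()
    }

def is_suspicious(label: str) -> bool:
    """Flag labels which mix letters from different scripts."""
    return len(scripts(label)) > 1

latin = "BBC"
cyrillic = "\u0412\u0412\u0421"   # Looks like BBC but consists of Cyrillic letters
mixed = "B\u0412C"

for label in (latin, cyrillic, mixed):
    print(scripts(label), is_suspicious(label), (label + ".com").encode("idna"))
```

The purely Cyrillic label is not flagged by this check, which is why clients should also consider the user's preferred languages and display unexpected labels in Punycode.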
So far, we have seen how non-ASCII characters can be encoded in the message body,
in header fields and in domain names.
The only thing
that is missing is the internationalization
of the local part of email addresses.
This is achieved by the following RFCs, which extend the email
protocols
and the message format to allow Unicode characters encoded in UTF-8 everywhere:
RFC 6530
introduces the framework for internationalized email.
It explains the problem and
defines the used terminology.
Unlike earlier
proposals,
internationalized messages
are no longer downgraded in transit
because the local part of an address is to be interpreted
only
by the host specified in the domain part of the address.
If an intermediary mail server doesn’t support UTF-8,
the message has to be
returned to the sender.
If an internationalized message shall be delivered to legacy mail servers,
it has to be downgraded before or during
RFC 6531
defines an SMTP extension with the keyword SMTPUTF8.
If the SMTP server indicates this capability,
the SMTP client can
transfer a UTF-8 message with UTF-8 envelope addresses
by using the MAIL FROM command with the SMTPUTF8 parameter.
This RFC also
defines additional protocol types,
which can be used in the with clause of Received header fields.
RFC 6532
extends the syntax rules of RFC 5322
to allow the use of UTF-8 characters everywhere.
It also introduces an additional content
type
with the identifier message/global to describe internationalized messages encoded in UTF-8.
RFC 6533
brings UTF-8 to delivery status notifications (DSN), such as non-delivery reports (NDR).
RFC 6855
specifies an IMAP extension
which allows mail clients to access internationalized messages
(and to use Unicode characters in
folder names).
The UTF8=ACCEPT capability
indicates that the IMAP server supports UTF-8 in strings.
The UTF8=ONLY capability
indicates
that the IMAP server requires UTF-8 support from clients
because it won’t downgrade internationalized messages for them.
The
UTF8=ONLY capability implies the UTF8=ACCEPT capability
and clients have to indicate that they can handle UTF-8
by sending . ENABLE
UTF8=ACCEPT to the server.
RFC 6856
specifies a POP3 extension
to upgrade an ASCII-only session to a UTF-8 session.
The POP3 server indicates that it supports
UTF-8 with the
UTF8 capability.
A POP3 client can then enable the UTF-8 mode with the
UTF8 command.
This RFC also introduces a LANG
capability and command,
which allows the client to configure a different language for the response texts.
This can be useful when the
client presents error messages from the server directly to the user.
RFC 6857
specifies an advanced downgrading mechanism for internationalized messages.
POP3 and IMAP servers can use it to convert
UTF-8 messages to ASCII-only messages
before delivering them to mail clients which don’t support UTF-8.
The conversion is relatively
straightforward:
Everywhere where the Encoded-Word encoding is allowed,
this encoding is used to encode UTF-8 strings as ASCII
strings.
The Encoded-Word encoding is also used if necessary for
unknown header fields.
Internationalized domain names are
downgraded
with the Punycode encoding.
Email addresses with non-ASCII characters in the local part
are rewritten by encoding the
whole address as an Encoded Word
and replacing the address with an empty group construct.
For example, From: José
<josé@example.com> is converted to
From: =?UTF-8?Q?Jos=C3=A9_?= =?UTF-8?Q?jos=C3=A9=40example=2Ecom?= :;
thanks to RFC
6854.
Since this string encodes an empty group instead of an address,
the recipient cannot reply to such a message without manual
intervention.
RFC 6857 requires the use of UTF-8 as the character encoding
and RFC 2047 requires that
the @ symbol and the period are
also encoded when the Encoded Word precedes an address.
If the internationalized email address is part of an address group,
the whole
group is encoded with this technique because groups cannot be nested.
Header fields in which addresses are used but the group syntax is
not allowed
need to be encapsulated:
A header field such as Message-Id is replaced with Downgraded-Message-Id
so that its value can be
encoded as an Encoded Word.
The Received header fields are an exception to this rule:
Any clauses with non-ASCII characters are simply
removed.
Lastly, the message body is left as is,
even if the content transfer encoding is 8bit.
RFC 6858
specifies a simpler downgrading mechanism for internationalized messages,
which accepts the loss of information in favor of an
easier implementation.
Internationalized email addresses are replaced with an
invalid address,
such as
[email protected].
The original address can optionally be encoded in the display name of the invalid address.
The
subject field is encoded as an Encoded Word,
and all other header fields with non-ASCII characters are simply removed.
This RFC also
extends IMAP
so that the server can indicate to the client which messages were downgraded.
In order to prevent permanent loss of
information,
mail clients shouldn’t remove the internationalized message on the server.
Automatically removing retrieved messages on
the server is especially common
among POP3 clients.
Another problem is that clients often cache messages indefinitely.
Even if the client
is upgraded to support internationalized messages,
it likely still accesses the downgraded messages from the local message store.
Last but
not least, downgrading message header fields invalidates DKIM signatures.
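Whether an address requires the SMTPUTF8 extension depends only on its local part, because the domain part can always be converted to ASCII with Punycode. The following sketch illustrates this distinction. Both function names are hypothetical, and the domain conversion relies on Python's idna codec, which implements IDNA2003 rather than IDNA2008.

```python
def needs_smtputf8(address: str) -> bool:
    """An address needs SMTPUTF8 if its local part contains non-ASCII characters."""
    local_part, _, _ = address.rpartition("@")
    return not local_part.isascii()

def to_ascii_address(address: str) -> str:
    """Convert the domain part to ASCII; fails if the local part is internationalized."""
    local_part, _, domain = address.rpartition("@")
    if not local_part.isascii():
        raise ValueError("The local part cannot be downgraded to ASCII.")
    return local_part + "@" + domain.encode("idna").decode("ascii")

print(needs_smtputf8("josé@example.com"))
print(needs_smtputf8("jose@bücher.example"))
print(to_ascii_address("jose@bücher.example"))
```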
Content type
Now that we can encode arbitrary content,
we need a way to inform the client how to interpret the decoded content.
This is done with the
Content-Type header field,
which has the following format:
Type: The primary content type describes the general type of data.
If the client doesn’t recognize the subtype,
it can use this information to
decide what to do with the content.
If the type is text,
for example, it can display the raw data to the user,
which wouldn’t make sense for
binary files.
The other top-level media types are image,
audio,
video,
font,
model for three-dimensional models,
application for application-specific formats,
message for email messages,
multipart for multipart messages,
and example for use in documentation.
The type, the subtype, and the parameter names are case-insensitive.
RFC 6838 doesn’t specify whether the tree and the suffix are also case-insensitive
but I assume that this is the case.
Whether a parameter value is case sensitive depends on the parameter.
The default content type for
emails
is text/plain; charset=us-ascii.
As specified in RFC 1945,
HTTP uses the same header field with the same media types.
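Python's email package shows how a client can read this header field. The two messages are made up for this example. The type, the subtype, and the parameter name are matched case-insensitively, and a message without a Content-Type header field is treated as text/plain.

```python
from email import message_from_string

message = message_from_string("Content-Type: Text/HTML; Charset=UTF-8\n\n<p>Hi</p>\n")
print(message.get_content_type())       # Type and subtype in lowercase
print(message.get_content_maintype())
print(message.get_content_subtype())
print(message.get_param("charset"))     # Parameter names are case-insensitive

bare = message_from_string("Subject: No content type\n\nHi\n")
print(bare.get_content_type())          # Falls back to the default content type
```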
Enriched Text
MIME-Version: 1.0
<bold>Roses</bold> <italic>are</italic>
<color><param>red</param>red</color>.
This data format has mostly been superseded by HTML and is not widely supported.
Apple Mail strips all the tags and displays the text
without formatting.
Gmail doesn’t recognize the format and offers the option to download the content instead.
Only Thunderbird displays
the text with formatting, but it doesn’t support the <color> tag.
HTML emails
Nowadays, most messages are formatted with the Hypertext Markup Language (HTML).
The text/html media type is specified in RFC 2854.
The message from the previous box looks as follows when it is formatted with HTML:
MIME-Version: 1.0
<html>
<body>
<b>Roses</b> <i>are</i>
<span style="color:red;">red</span>.
</body>
</html>
Email styling
<html>
<head>
</head>
<body>
<span>Hello,</span>
<span>World!</span>
</body>
</html>
<html>
<head>
<style type="text/css">
span {
color: red;
}
</style>
</head>
<body>
<span>Hello,</span>
<span>World!</span>
</body>
</html>
<html>
<body>
</body>
</html>
The problem with HTML and CSS in emails is that the support for them varies a lot among mail clients.
As a sender, you want to make sure
that your message is displayed as intended for most of your recipients.
This forces you to use only features which are supported by most mail
clients.
While some mail clients support external CSS,
many do not.
And while many mail clients support internal CSS
by now, some do not.
For this reason, most HTML emails are still sent with inline CSS,
which is supported by all mail clients that can display HTML emails.
By the
way, you don’t have to inline the CSS manually,
there are tools for that.
<html>
<head>
<style type="text/css">
a { color: red; }
</style>
</head>
<body>
</body>
</html>
When you send the above message to your Gmail account and view the message in your browser,
you’ll see that the link is colored in Gmail’s
default blue.
If you inspect the <a> element with the
developer tools
of your browser, you’ll see why:
color: #15c;
.msg443033220466499195 a {
color: red;
The styles that Gmail applies to the link in the above message.
<html>
<head>
<style type="text/css">
</style>
</head>
<body>
<div id="body">
</div>
</body>
</html>
Email markup
Gmail,
Mail.ru, and
Yahoo Mail
support interactive messages based on
Accelerated Mobile Pages (AMP).
AMP was initiated by Google in
order to make websites load faster on mobile.
It achieves this by providing built-in components
and making websites cacheable so that they
can be served
through a content delivery network (CDN).
While AMP for websites supports
custom JavaScript,
you can use only the default
library
when using AMP for emails.
It’s basically a whitelisted web framework.
As with email markup, you have to
register yourself as a
sender
with the mailbox providers before they display your dynamic content to their users.
AMP messages must be authenticated and
must
contain an ordinary HTML or plaintext version of the same content,
which is displayed when the mail client is offline or
30 days after
receiving the message.
The sending mail client can either insert soft line breaks
after existing spaces or insert the preceding space as well.
It indicates the former
behavior by adding the content type parameter delsp=no.
If the mail client also inserted the preceding space, it adds delsp=yes.
In the
former case, the receiving mail client replaces SP+CR+LF with SP.
In the latter case, the receiving mail client simply removes all occurrences
of SP+CR+LF.
This is useful for content which doesn’t use the ASCII space character.
MIME-Version:␣1.0↵
Content-Transfer-Encoding:␣7bit↵
Content-Type:␣text/plain;␣charset=us-ascii;␣format=flowed;␣delsp=no↵
2␣+␣2␣↵
␣>␣3
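The receiving side of format=flowed can be sketched as follows. The function name unflow is made up, and the sketch deliberately ignores quoted lines and the signature separator, which the standard treats specially. It does remove the leading space of space-stuffed lines, which is needed to decode the example above.

```python
def unflow(body: str, delsp: bool = False) -> str:
    """Join lines which end with a soft line break (a trailing space)."""
    paragraphs = []
    current = ""
    for line in body.split("\r\n"):
        if line.startswith(" "):
            line = line[1:]                  # Undo space-stuffing.
        if line.endswith(" "):               # Soft line break
            current += line[:-1] if delsp else line
        else:                                # Hard line break
            paragraphs.append(current + line)
            current = ""
    if current:
        paragraphs.append(current)
    return "\r\n".join(paragraphs)

print(unflow("2 + 2 \r\n > 3"))
print(unflow("foo \r\nbar", delsp=True))
```

Without the space-stuffing, the second line of the example would start with > and would be mistaken for a quote by a complete implementation.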
The standard for format=flowed is a bit more complicated than how I explained it.
On the one hand, a space can be inserted at the beginning
of any line,
which means that lines which already start with a space have to be protected with an additional space.
(For historical reasons,
lines which start with From also have to be protected by inserting a leading space.)
On the other hand, it also specifies how to handle
consecutive lines
with different quote levels,
which always lead to hard line breaks.
As far as I can tell, most messages are encoded with either Quoted-Printable or Base64,
which have their own ways of handling client-inserted newlines.
Most users never encounter the line-length limitation of email:
They see neither unwanted newlines nor a horizontal
scrollbar
because a message is displayed as format=fixed when it should be flowed.
There are exceptions, though.
For example, Thunderbird
breaks lines in the compose window by default,
even though it uses format=flowed correctly and displays such messages correctly
(i.e.
flowed according to the width of your screen).
The only fix I found for this annoyance is to set mailnews.wraplength to 0
in Thunderbird’s
config editor.
If a line becomes longer than 1’000 characters,
Thunderbird then decides to encode the message with Base64
rather than to
apply format=flowed as it would otherwise.
Message compression
HTTP/1.1 200 OK
Content-Encoding: gzip
The only standardized way to compress emails during relay is with S/MIME,
which we’ll discuss later.
Section 3.6 of RFC 8551
provides the
following example for how to use S/MIME for compression only:
Content-Transfer-Encoding: base64
eNoLycgsVgCi4vzcVIXixNyCnFSF5Py8ktS8Ej0AlCkKVA==
You can decompress this example message with the following commands.
pigz stands for Parallel Implementation of GZip,
and it can be
installed on macOS with brew install pigz if you have Homebrew.
Using gzip -dc instead of pigz -d doesn’t work because
gzip doesn’t
recognize the compression format
if the file doesn’t start with specific bytes.
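As an alternative to the command line, the example can also be decompressed with Python's zlib module, which expects exactly this format. The first two bytes show why gzip rejects the data: they are the zlib header 78da instead of the gzip header 1f8b.

```python
import base64
import zlib

encoded = "eNoLycgsVgCi4vzcVIXixNyCnFSF5Py8ktS8Ej0AlCkKVA=="
compressed = base64.b64decode(encoded)

print(compressed[:2].hex())          # zlib header instead of gzip's 1f8b
content = zlib.decompress(compressed)
print(content.decode("ascii"))
```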
Since mail clients have no (standardized) way to advertise their capabilities to other mail clients,
we won’t see compression of messages from
the mail client of the sender
to the mail client of the recipient anytime soon.
This doesn’t prevent mail clients and mail servers from
compressing messages between them, though,
as they can advertise their capabilities as part of the used protocol.
RFC 4978 specifies the
COMPRESS extension
for IMAP,
which allows the client and the server to agree on compressing their communication.
Since mail clients can
store messages in whatever format they want,
compressing locally stored emails is by far the lowest hanging fruit.
However, neither
Thunderbird nor Apple Mail
compresses the messages which it stores locally.
Multipart messages
Now that we can send arbitrary files via email,
we can design file formats
to include several files in a single message body.
RFC 2046 defines
various
content types to split a message into multiple parts.
What all the multipart formats have in common is that they are
text-based.
This
means that the various parts have to be separated with a character sequence
which may not appear in any of the parts themselves.
The character
sequence is chosen by the sending mail client for each message
and provided to the recipient in a content-type parameter called boundary.
Let’s
look at the two most common multipart types
and leave the rest for the boxes below.
multipart/mixed
bundles independent parts into a single message.
This content type is used to attach files to a message.
If a client doesn’t
recognize a multipart subtype,
it should treat the content as multipart/mixed
and show the recognized parts.
MIME-Version: 1.0
--UniqueBoundary
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
--UniqueBoundary
Content-Type: image/png
Content-Transfer-Encoding: base64
iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAAACXBIWXMAAAsSAAAL
EgHS3X78AAAB4klEQVQ4y5VVwU7CQBDdKl5IVz8Fg95ZQH+BhBP/UEAxKv/AN3jG
/1ATPeqlJr169WRpi/NmZ8uCEGqTl53OzrzO7MxOVdiKFB6s2gwDkeuEAeGRkBAW
I/xgBRmtBSFHpPS+0O0rRzzz/JWf5kxf3CKSQneusea8whGyjbIQ4kI+lMHHkTou
TpMjA5kZpoQpoUnoEeJ1UiKzdiDKmdRG2ldeAWI+HxvZvfIeem9wihKhO09EKWuG
D4KDC4WKITpJAUbnQnREugOR3+RjWVkgkMkR8AdtlAMQzj1C4ExIHNkh4XkbIaff
YlJHOAdhos3IpbCN8IDw4hHmshZeoXLpjERJG7jNfYSpd9ZloUIj52mixT8IqdId
rvYX4bXsxVVLlYRVUn46vryD0wPhRPSnrqVC+Hkp7ytKwFU2w29CKK1Wk70e0seN
8otSpW0+CO9MZqIa9kSP5s85QssxqNrYLvrGxt7UXs/xqrGrXb2xmzqx6Jpik6IX
Jbq+8i90heGQs+zvwXZzOFQdX+7ess5EmZ2Nk7/ja/+AHXkDdiQDduLObPeA3fEL
mMvYTwWJ6Hb+An4Bgrjq/fe5+zgAAAAASUVORK5CYII=
--UniqueBoundary--
multipart/alternative
bundles alternative versions of the same content into a single message.
This content type is used to provide a
fallback version of the content
for mail clients that don’t support the preferred content type.
The versions are to be listed in increasing order of
preference,
which means that the preferred format comes last.
This has the advantage that users of mail clients
which don’t support multipart
messages see the simplest version of the message first.
Mail clients usually display the last part which has a content type that they support
unless the user configured a different preference.
multipart/alternative is most commonly used to provide a plaintext version of HTML
messages
for users of text-based mail clients,
such as Elm,
Pine,
and Mutt,
which cannot render HTML.
To give you another example,
I could
have included a plaintext version of the Enriched-Text message
so that Gmail could display that instead of offering me to download the
unrecognized content.
MIME-Version: 1.0
--UniqueBoundary
Content-Type: text/plain
--UniqueBoundary
Content-Type: text/enriched
<bold>Roses</bold> <italic>are</italic>
<color><param>red</param>red</color>.
--UniqueBoundary
Content-Type: text/html
<html>
<body>
<b>Roses</b> <i>are</i>
<span style="color:red;">red</span>.
</body>
</html>
--UniqueBoundary--
Since multipart/mixed and multipart/alternative are content types like any other, they can be nested,
which results in a tree of message
parts.
The content encoding of multipart parts
has to be 7bit, 8bit, or binary,
and the boundary between the inner parts
has to be different
from the boundary between the outer parts.
Boundary delimiter
--UniqueBoundary
--UniqueBoundary
--UniqueBoundary--
Each part starts with zero or more Content-* header fields followed by an empty line and the content of that part.
The sending mail client
has to make sure that the boundary delimiter line doesn’t appear in the embedded content.
Due to the leading two hyphens, the delimiter
cannot appear in Base64-encoded content.
By including =_ in the boundary value,
the delimiter also cannot appear in Quoted-Printable-
encoded content.
The rest of the boundary value is usually chosen randomly.
Apple Mail, for example, chooses Apple-Mail=_ followed by a
universally unique identifier (UUID) as the boundary value.
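The following sketch shows how a sending mail client might choose such a boundary value. The prefix Example-Mail and the function names are my own inventions. The check simply looks for lines which start with the delimiter and draws a new random value in the unlikely case of a collision.

```python
import uuid


def make_boundary(prefix: str = "Example-Mail") -> str:
    """Returns a boundary value with '=_' followed by a random UUID."""
    # 'Example-Mail' is a made-up prefix; '=_' cannot occur in
    # Quoted-Printable-encoded content.
    return f"{prefix}=_{str(uuid.uuid4()).upper()}"


def choose_boundary(parts: list[str]) -> str:
    """Picks a boundary whose delimiter line appears in none of the parts."""
    while True:
        boundary = make_boundary()
        delimiter = "--" + boundary
        collision = any(
            line.startswith(delimiter)
            for part in parts
            for line in part.splitlines()
        )
        if not collision:
            return boundary


boundary = choose_boundary(["Roses are red.", "--UniqueBoundary"])
print(boundary)
```

With a random UUID in the boundary value, the loop practically never runs more than once. It is there only to make the guarantee explicit.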
Content disposition
If the example from the previous box is not blocked by your spam filter,
your mail client likely displays the content of the second part below
the content of the first part.
If you want some parts of a message to be displayed as a file,
which the user has to open to see its content,
you
can indicate this with Content-Disposition: attachment.
The Content-Disposition header field is specified in RFC 2183.
Besides asking
mail clients to display a MIME part as an attachment,
you can also ask them to display its content inline,
i.e. visible between the other parts.
With the filename parameter,
the sender can suggest a filename for when the recipient wants to store the part in a separate file.
If the
filename includes non-ASCII characters,
it has to be encoded with the Extended-Parameter encoding.
The receiving mail client should make
sure that the filename conforms to local filesystem conventions
and that no file is overwritten without user consent when saving the
attachment.
The receiving mail client should also ignore any
path delimiters in the filename.
Let’s look at an example:
MIME-Version: 1.0
--UniqueBoundary
--UniqueBoundary
#include <stdio.h>
int main() {
    printf("Hello, World!");
    return 0;
}
--UniqueBoundary--
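How a receiving mail client could treat the suggested filename is sketched below. This is only an illustration under my own assumptions: the set of replaced characters and the numbering scheme for existing files are choices I made for this example, not requirements of RFC 2183. A real client would additionally ask the user before saving anything.

```python
import os
import re
import tempfile


def safe_filename(suggested: str, directory: str) -> str:
    """Derives a filename that stays in `directory` and overwrites nothing."""
    # Ignore path delimiters by keeping only the last path component.
    name = re.split(r"[/\\]", suggested)[-1]
    # Replace characters that are problematic on common filesystems
    # (an assumption for this sketch) and trim leading/trailing dots and spaces.
    name = re.sub(r'[<>:"|?*\x00-\x1f]', "_", name).strip(" .")
    if not name:
        name = "attachment"
    # Never overwrite an existing file: append a counter instead.
    stem, extension = os.path.splitext(name)
    candidate, counter = name, 1
    while os.path.exists(os.path.join(directory, candidate)):
        candidate = f"{stem} ({counter}){extension}"
        counter += 1
    return candidate


with tempfile.TemporaryDirectory() as directory:
    first = safe_filename("../../etc/hello.c", directory)
    open(os.path.join(directory, first), "w").close()
    second = safe_filename("..\\hello.c", directory)
    third = safe_filename("..", directory)

print(first, second, third)
```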
Aggregate documents
MIME-Version: 1.0
--UniqueBoundary
--InnerBoundary
https://ef1p.com
--InnerBoundary
<html>
<body>
</a>
</body>
</html>
--InnerBoundary--
--UniqueBoundary
Content-Type: image/png
Content-ID: <[email protected]>
Content-Transfer-Encoding: base64
iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAAACXBIWXMAAAsSAAAL
/1ATPeqlJr169WRpi/NmZ8uCEGqTl53OzrzO7MxOVdiKFB6s2gwDkeuEAeGRkBAW
gkR02KvDFj4+hxMCpUq5T4h1d7LU3Zulbo+X5GQBGToC2XzC1vML1lmtPGMiciQ5
I/xgBRmtBSFHpPS+0O0rRzzz/JWf5kxf3CKSQneusea8whGyjbIQ4kI+lMHHkTou
TpMjA5kZpoQpoUnoEeJ1UiKzdiDKmdRG2ldeAWI+HxvZvfIeem9wihKhO09EKWuG
D4KDC4WKITpJAUbnQnREugOR3+RjWVkgkMkR8AdtlAMQzj1C4ExIHNkh4XkbIaff
YlJHOAdhos3IpbCN8IDw4hHmshZeoXLpjERJG7jNfYSpd9ZloUIj52mixT8IqdId
rvYX4bXsxVVLlYRVUn46vryD0wPhRPSnrqVC+Hkp7ytKwFU2w29CKK1Wk70e0seN
8otSpW0+CO9MZqIa9kSP5s85QssxqNrYLvrGxt7UXs/xqrGrXb2xmzqx6Jpik6IX
Jbq+8i90heGQs+zvwXZzOFQdX+7ess5EmZ2Nk7/ja/+AHXkDdiQDduLObPeA3fEL
mMvYTwWJ6Hb+An4Bgrjq/fe5+zgAAAAASUVORK5CYII=
--UniqueBoundary--
Relative Content-Location:
Use <img src="logo.png"> in the HTML part and Content-Location: logo.png in the image part.
Apple
Mail, Outlook.com, and Yahoo Mail fail to display the message correctly.
Only Gmail and Thunderbird implement this part of the RFC.
Absolute Content-Location:
Use <img src="https://ef1p.com/logo.png"> in the HTML part
and Content-Location:
https://ef1p.com/logo.png in the image part.
Only the same mail clients as before display the message correctly.
One-click unsubscribe
If you are subscribed to a mailing list,
you may want to unsubscribe from the list after having received a message you no longer want to receive.
Most mailing lists include a link at the bottom of each sent message,
which you can click to unsubscribe from the mailing list.
Since this is a link like
any other in the message, a browser window is opened
and you might have to click on additional buttons there to finally unsubscribe from the list.
This can be a bit of a hassle, especially on mobile phones.
Fortunately, RFC 2369 specifies an easier way to achieve the same.
Mailing lists should
include a List-Unsubscribe header field
so that mail clients can provide a uniform unsubscribe experience across mailing lists:
You simply click
on “Unsubscribe” and your mail client takes care of the rest.
List-Unsubscribe: <https://example.com/unsubscribe?token=XYZ>,
<mailto:[email protected]?subject=Unsubscribe>
The List-Unsubscribe header field provides a standardized way to unsubscribe from a mailing list.
If there are several options in angle brackets, the mail client should use the first one that it supports.
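RFC 8058 adds a List-Unsubscribe-Post header field with the value List-Unsubscribe=One-Click,
which tells the mail client that it can unsubscribe the user with a single HTTPS POST request to the URL in the List-Unsubscribe header field.
The body of the POST request is encoded with the content type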
multipart/form-data as specified in RFC 7578
or application/x-www-form-
urlencoded as specified by the
Web Hypertext Application Technology Working Group (WHATWG)
in their URL spec.
The request has to be sent
without context information such as cookies.
The user has to be authenticated with a token in the URL.
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 26
List-Unsubscribe=One-Click
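Here is a rough sketch of how a mail client could turn the header fields into such a request. It assumes that the message also carries the List-Unsubscribe-Post: List-Unsubscribe=One-Click header field from RFC 8058. The function names and the mailto address are hypothetical. The request is only constructed, not sent, and no cookies are attached to it.

```python
import re
import urllib.request
from typing import Optional

ONE_CLICK = "List-Unsubscribe=One-Click"


def parse_list_unsubscribe(value: str) -> list[str]:
    """Extracts the URIs in angle brackets in the order in which they appear."""
    return re.findall(r"<([^<>]+)>", value)


def one_click_request(
    list_unsubscribe: str, list_unsubscribe_post: str
) -> Optional[urllib.request.Request]:
    """Builds the POST request for the first HTTPS URI, if one-click is offered."""
    if list_unsubscribe_post.strip() != ONE_CLICK:
        return None
    for uri in parse_list_unsubscribe(list_unsubscribe):
        if uri.lower().startswith("https://"):
            return urllib.request.Request(
                uri,
                data=ONE_CLICK.encode("ascii"),
                headers={"Content-Type": "application/x-www-form-urlencoded"},
                method="POST",
            )
    return None


request = one_click_request(
    "<https://example.com/unsubscribe?token=XYZ>, "
    "<mailto:unsubscribe@example.com?subject=Unsubscribe>",  # Made-up address.
    "List-Unsubscribe=One-Click",
)
print(request.get_method(), request.full_url, len(request.data))
```

A client would pass the request to urllib.request.urlopen only after the user clicked on "Unsubscribe".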
These two header fields are not only convenient for users,
they also make unsubscribing more secure since mail clients don’t include them when
forwarding a message.
If you want to prevent others from unsubscribing you from a mailing list,
you have to remove the unsubscribe link at the
bottom of a message yourself before forwarding the message.
Issues
Email is both a blessing and a curse.
On the one hand, email is by far the most important decentralized messaging service that we have,
which
should be reason enough to cherish it.
The only other decentralized messaging service which comes close to email in terms of ubiquity
is the Short
Message Service (SMS).
On the other hand, email has become so dysfunctional that many of us would like to leave it behind.
In this section, we’ll
look at the issues that plague modern email.
In the last chapter,
we’ll discuss how some of the security-related issues are being addressed.
Spam
Unsolicited messages which are sent in large quantities are called
spam or
junk mail.
Spam is a brand of canned pork, which was introduced in
1937.
Spam is likely an abbreviation for spiced ham.
It became ubiquitous during and after World War II when food was rationed.
The British
comedy group Monty Python
made fun of this fact in a famous sketch in 1970.
The term got adopted to refer to undesirable things which come in
excessive quantities – including junk mail.
Any messaging service which is popular, open, and free will have spam sooner or later.
Thus, spam isn’t a result of the shortcomings of email but
rather a consequence of its desirable properties.
Since unsolicited messages are annoying,
people try to eliminate junk mail from their inboxes
with
heuristics, blacklists, and challenges.
While such techniques make spam bearable,
they don’t solve the underlying problem of unsolicited
mail:
Anyone in the world can add tasks to the to-do list which is your inbox.
In my opinion, mail clients should separate messages from
unapproved senders from your inbox
so that the messages you actually want to receive don’t drown in the noise.
This is similar to how I almost
never accept calls from numbers that I haven’t stored in my phone.
Even though this feature has to be tremendously useful for anyone who
doesn’t want to be bothered
by random sales people and their never-ending followups,
HEY is the only mail client I know of
which lets you screen
your email senders.
And just like I block call centers, I also block email senders, of course.
However, the default shouldn’t be “allowed unless
blocked”
but rather “blocked unless allowed”.
Additionally, messages are typically blocked on the client-side
because most mail clients still don’t
support server-side filtering.
Blacklists
As long as spammers send their unsolicited messages from the same sources,
you can get rid of all their junk simply by blocking all traffic
from these sources.
Historically, lists of blocked addresses are known as blacklists
and lists of allowed addresses as whitelists.
Since some
consider the positive and negative connotations of white and black to be racially charged,
the IT industry is moving to replace these terms
with block or deny list and allow or pass list,
even though the traditional terms likely predate attribution to race.
In the spirit of making the IT
industry and our societies more inclusive, I welcome these changes.
The main reason why I stuck with the old terms is the next box:
The anti-
spam technique is known only as graylisting.
If I were to speak of temporarily-reject listing, many would have no idea what I’m talking about.
While block lists are already useful when every provider maintains their own list,
they are much more powerful when they are shared among
mailbox providers.
The best-known maintainer of block lists is The Spamhaus Project.
Before you try to relay email directly with the ESMTP
tool,
you can use the Spamhaus IP and Domain Reputation Checker
to determine whether your IP address is blocked.
If you use and misuse
the ESMTP tool a lot, your address may get listed there.
Once your address is on their block list, your chances of relaying emails successfully
dwindle.
Block lists are fed by spam filters, which in turn are trained by users, who mark unwanted messages as spam.
Another way to
identify spammers is to set up a honeypot:
An email server or email address which is positioned to attract spammers
but unlikely to be
contacted by legitimate parties.
Anyone who takes the bait can be blocked.
Graylisting
Patience
Challenges
Spamming is an economic activity, and economic activities are worthwhile only if the benefits are higher than the costs.
The reason why
there is so much spam is that
the marginal cost of sending an email is almost zero.
If we increased the cost of sending emails by just a little
bit, most spammers would go out of business.
There are two ways to increase the cost of sending emails in large quantities:
You can require
manual intervention from unknown senders
or force them to waste a costly resource such as electricity, bandwidth, or memory.
In both
cases, your incoming mail server or your mail client would send a puzzle to the sender,
who needs to solve it in order for their email to be
delivered to your inbox.
To require attention from a human,
you can send them a CAPTCHA.
To require attention from a machine,
you can ask
it for proof of its work.
Since your challenge is an automatic response,
care needs to be taken to avoid mail loops.
The main disadvantage is
that you confirm the existence of your email address to anyone
(unless your mail server sends a challenge for existent and non-existent
recipients).
Given that you now have an effective system against spam and that many email addresses
are public anyway, this shouldn’t be a
problem in practice.
To be honest, I’m surprised that this old and simple idea isn’t being used more often.
The difficulty of the puzzle could
depend on the likelihood of the message being spam.
Companies such as online platforms would have to ask their users to whitelist their
domain,
let them recover the first message from the spam folder – or employ people to solve the puzzles.
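A proof-of-work puzzle for machines could look like the following sketch, which is loosely modeled after Hashcash. The encoding of the challenge, the use of SHA-256, and the function names are assumptions for illustration only. The sender has to find a nonce so that the hash of the challenge and the nonce starts with a certain number of zero bits, which is expensive to find but cheap to verify.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Counts the zero bits at the beginning of the digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """The recipient checks the solution with a single hash computation."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """The sender tries nonces until one satisfies the difficulty."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


# The challenge string is hypothetical; it should be unique per message.
challenge = "recipient@example.com/random-challenge-123"
nonce = solve(challenge, difficulty=12)  # Around 2^12 attempts on average.
print(nonce, verify(challenge, nonce, 12))
```

Each additional bit of difficulty doubles the expected work of the sender, which is how the difficulty could be tied to the likelihood of a message being spam.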
Reputation
The origin of a message plays a crucial role when assessing its trustworthiness.
In the absence of domain authentication,
it’s mostly the
reputation of the sender’s IP address,
which determines whether a message gets delivered.
Once emails can no longer be spoofed,
the
reputation of the sender’s domain will also become important.
When you deliver emails from a new IP address,
they will likely land in the
spam folder of their recipients
even if you follow all best practices.
Your IP address just isn’t known yet to deliver emails that users want to
receive.
A good reputation takes time to build,
which makes it quite difficult to run your outgoing mail server yourself.
Especially as a
company, you want all your customers to receive all your emails.
In order to achieve a high delivery rate (also called good deliverability), you
typically buy
into the reputation capital of another company.
A whole industry evolved around just this value proposition.
Such companies
are known as email service providers
or email delivery vendors,
and they offer a transactional email service.
The downside of this reputation
system is that email is no longer really an open service
if you have to purchase the qualification to send messages from another company.
The
upside of this system is that companies are incentivized to protect their reputation:
They would rather make it as easy as possible for
readers to
unsubscribe from their newsletter
than risk being flagged as spam.
Address munging
One of the best ways to avoid getting spammed is to keep your email address as private as possible.
Unfortunately, a single hack of one of
your service providers is enough to expose your address permanently.
Spammers harvest email addresses in various ways.
Besides
purchasing lists of email addresses on the black market
from hackers and other spammers, they use programs,
so-called crawlers,
which
search the Web for email addresses.
In an attempt to prevent their address from being collected,
people often disguise their address when
publishing it online.
For example, instead of writing user@example.com, they write user[at]example.com and the like.
This practice is called
address munging
and it is most effective if the particular technique is rarely used or difficult to revert.
Unless the address obfuscation is
reverted in the browser with JavaScript,
readers who want to contact the person cannot simply click on a mailto link.
Other approaches, such
as encoding email addresses as images instead of text,
reduce the accessibility for people
who rely on screen readers.
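The following sketch illustrates why popular munging techniques offer little protection. The patterns are just guesses of what a crawler might look for. Two regular expressions are enough to revert the most common variants.

```python
import re


def munge(address: str) -> str:
    """Disguises an address in the style of user[at]example.com."""
    return address.replace("@", "[at]")


# Patterns a crawler might use (assumed for this sketch) to revert
# variants such as 'user[at]example.com' or 'user (at) example [dot] com'.
AT = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.IGNORECASE)
DOT = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)


def unmunge(text: str) -> str:
    """Reverts the obfuscation like an address-harvesting crawler would."""
    return DOT.sub(".", AT.sub("@", text))


print(munge("user@example.com"))
print(unmunge("user[at]example.com"))
print(unmunge("user (at) example [dot] com"))
```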
Legal requirements
Privacy
If you send an email to someone, you want to share certain information with that person.
Mail clients and mail servers, however, share a lot more
information than what the users intended to share.
In this subsection, I list all the subtle information disclosures that users likely aren’t aware of.
If
you know of other privacy leaks, please let me know.
If you don’t want your mailbox provider to leak your own IP address,
you can use a Virtual Private Network (VPN)
or an overlay network for
anonymous communication,
such as Tor.
Alternatively, you can use a mailbox provider which values your privacy,
such as ProtonMail
or
Tutanota.
Sending messages from the web interface of a mailbox provider usually also helps.
For example, if you compose an email on
gmail.com,
your IP address is not included in the outgoing message.
If you submit a message from your desktop client to Gmail using SMTP,
on
the other hand, your IP address is added by smtp.gmail.com in a Received header field.
While RFC 5321 says that the IP address of the source
should be included in the Received header field during message relay,
mailbox providers should ignore this instruction during message
submission, in my opinion.
I understand that mailbox providers may want to record the IP address of the sender
to prevent abuse of their
service,
but I see no reason to share this information with the recipients of a message.
In fact, it might even be illegal to do so.
Many privacy
acts, such as the European
General Data Protection Regulation (GDPR),
forbid service providers from sharing personal data without the user’s
explicit consent.
Since the third party with whom the personal data is being shared can be different for every email,
the user’s consent would
be required every time they send an email.
If you’re a lawyer and you think that this reasoning has some merit,
let me know so that we can file a
class-action lawsuit
to bring this industry practice to an end.
Device name: Mail servers also include the client’s argument to the EHLO command in the Received header field.
RFC 5321 requires that the
client uses its
fully qualified domain name
if it has one or its IP address otherwise.
In spite of this, Thunderbird and maybe other clients
use the
name of your device in the local network as the argument.
On macOS, you find the name of your device in the “Sharing” tab of your “System
Preferences”.
By default, it starts with the first name of your user account.
In my case, my computer is reachable under Kaspars-MacBook-
Pro.local in the local network.
As a whistleblower, I might create a new email address
and even use an anonymization service, such as Tor, just
to have my mail client and mail server leak my real name.
RFC 5321 even warns about exactly this problem.
I reported this privacy bug
to
Mozilla Thunderbird on 2 December 2020.
Until a fix is available, you can set the mail.smtpserver.default.hello_argument option in the
config editor to [192.168.1.1].
Such a value is typical for the vast majority of people
due to network address translation (NAT).
Timezone: The sent date is usually encoded in the timezone of the sender.
By looking at the offset from Greenwich Mean Time (GMT),
the
recipient learns roughly at which longitude a message was sent.
In my opinion, mail clients should always encode the Date field in Greenwich Mean
Time.
Mail client: Many mail clients put their name with their current version
into a User-Agent or X-Mailer header field.
Some mail clients even
include the name and the version of the operating system on which they run.
While such data is usually harmless,
it can provide valuable
information to someone who wants to attack you.
Given the intricacies of email, mail clients can also be identified by
how they delimit parts,
how they label files,
how they style messages,
how they quote messages, and so on.
This is known as fingerprinting,
and it allows a recipient to
determine how likely it is that separate messages were sent from the same client.
Display names: Your mail client not only adds your name
as a display name in the From address,
it also adds a display name for each recipient it
knows.
This can leak how you’ve stored a recipient in your address book
(i.e. be careful under what name you store the colleague you’re having
Remote content
HTML emails can include remote content,
which is fetched by the mail client when it renders the message.
Images are by far the most common
type of remote content.
They are usually included with the <img> element
or with the background-image property.
Some mail clients support
external style sheets
through the <link> element,
but internal CSS can also have
@import statements
to load Web fonts
and other styles with the
url() function.
There are other elements, such as <audio>,
<video>, and
<iframe>,
which can also be used to include remote content, but not all
mail clients support them.
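To get a feeling for what a mail client has to look for before rendering a message, here is a simplified sketch which collects remote URLs from an HTML message with Python's standard library. The list of elements and the regular expressions for CSS are my own approximations. A real sanitizer has to cover many more cases.

```python
import re
from html.parser import HTMLParser

# Element/attribute pairs which can reference remote content (not exhaustive).
REMOTE_ATTRIBUTES = {
    ("img", "src"), ("link", "href"), ("audio", "src"),
    ("video", "src"), ("iframe", "src"),
}
CSS_IMPORT = re.compile(r"@import\s+['\"]([^'\"]+)", re.IGNORECASE)
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")\s]+)", re.IGNORECASE)


class RemoteContentFinder(HTMLParser):
    """Collects the URLs that a client would fetch when rendering a message."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if (tag, name) in REMOTE_ATTRIBUTES:
                self._add(value)
            elif name == "style":
                self._add_css(value)
        if tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self._add_css(data)

    def _add_css(self, css):
        for pattern in (CSS_IMPORT, CSS_URL):
            for match in pattern.finditer(css):
                self._add(match.group(1))

    def _add(self, url):
        # Only URLs that trigger a network request count as remote here.
        if url.lower().startswith(("http://", "https://", "//")):
            self.urls.append(url)


html = (
    '<html><head><style>@import "https://fonts.example.com/font.css"; '
    "body { background-image: url('https://example.com/bg.png'); }"
    '</style></head><body><img src="https://example.com/pixel.png?id=123">'
    '<img src="cid:logo"></body></html>'
)
finder = RemoteContentFinder()
finder.feed(html)
print(finder.urls)
```

The image with the cid: address is not reported because it refers to another part of the same message rather than to a web server.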
[Figure: The mail client fetches the email from the mail server and the remote content from the web server.]
In my opinion, remote content should never have been supported by mail clients.
If people insist on incorporating related files into a message,
they
can use aggregate documents.
Now that remote content is used so widely,
we have to live with the above drawbacks.
Google and
Yahoo
proxy all remote content in their webmail clients.
Instead of letting your browser fetch the remote content directly,
these
companies fetch the remote content on your behalf.
The advantage of this approach is that your IP address no longer leaks to the sender of a
message.
Unfortunately, Google and Yahoo fetch the remote content only when you open the email.
Thus, you still let the sender know that
you’ve opened the email and when you’ve opened it.
Google seems to cache the external resources for some time.
Yahoo, on the other hand,
makes another request if you force your browser to reload all content.
In order to fully protect the privacy of their users,
these companies
would have to fetch and cache all remote content as soon as they receive the email.
In order not to confirm to the sender which addresses
exist,
they would have to do this for all incoming messages,
even the ones with nonexistent recipients
and the ones which are discarded as
spam.
Beyond fetching static content, Google also
proxies the requests
triggered by dynamic content.
Since webmail providers have access
to your emails anyway,
there is no privacy drawback when these companies fetch the remote content for you.
Desktop clients, on the other
hand, can fetch emails from any mailbox provider.
If a desktop client were to use a proxy server
which is operated by the publisher of the
client,
the publisher would learn from which server you fetch the remote content
even if the communication is encrypted end-to-end
like in a
virtual private network (VPN).
The best solution would be to fetch all remote content through the
Tor anonymity network.
Unfortunately, I
don’t know of any mail client which does this. 😞
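[Figure: 1. The mail client fetches the email from the mail server. 2. The mail client fetches the remote content from the proxy server. 3. The proxy server fetches the content from the web server.]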
If you care about your privacy, you should allow remote content only from trusted senders.
Here is how you disable remote content in
various mail clients
(where some of them have remote content disabled by default):
Gmail:
All settings > General > Images > Ask before displaying external images
Yahoo:
More settings > Viewing email > Show images in messages > Ask before showing external images
Thunderbird:
Preferences > Privacy & Security > Mail Content > Allow remote content in messages [disabled by default]
Apple Mail:
Mobile: Settings > Mail > Messages > Load Remote Images [toggle the switch]
Desktop:
Preferences > Viewing > Load remote content in messages [remove the tick]
Outlook:
Web:
View all Outlook settings > General > External images > Always use the Outlook service to load images
(As far as I can tell, you cannot disable remote content and the proxy service doesn’t work for me.)
Desktop:
File > Options > Trust Center > Trust Center Settings or Automatic Download > Don’t download pictures automatically in
HTML e-mail messages or RSS items [enabled by default]
Link tracking
Emails often contain links to websites.
Instead of linking to the target site directly,
the sender can rewrite the link in such a way
that your web
browser sends a request to their tracking server,
which in turn redirects your browser to the actual web server:
Security
Security and the lack thereof have been a topic throughout this article.
In this section, I shine a light on some additional aspects.
Spoofing
As we saw earlier,
the sender of an email can easily be spoofed
because emails, at least historically, haven’t been authenticated.
Somewhat frustratingly,
RFC 5321
and some companies
see forged sender addresses more as a feature than as a bug.
Criminals abuse this “feature” to trick unsuspecting
users into performing actions or disclosing information,
which they wouldn’t do otherwise.
Exploiting the credulity of people is known as
social
engineering.
Besides impersonating a trusted organization for phishing,
a common attack is to send a victim an email which seemingly comes from
their own address.
In the message, the attacker claims that they’ve compromised the victim’s computer
and that they’ve recorded the victim
masturbating to porn.
The attacker threatens to send the recording to all the victim’s contacts unless they receive a payment,
usually in Bitcoin,
within a couple of days.
This form of blackmailing
is known as sextortion.
If you receive such an email yourself, how do you know that the
attacker’s claim is wrong?
First of all, you know now that the sender address of emails can easily be forged
and that there is no reason to assume
that your account has been compromised.
But more importantly, if there were an easy way to increase the fraction of people who pay the ransom,
criminals would certainly make use of it.
In the case of sextortion, they would just have to include a screenshot of the recording
and the addresses
of some contacts to make presumably the large majority of people pay.
Given that this is (usually) not the case, there’s no reason to worry.
Do
people fall for this crap? The answer is yes, unfortunately.
The first time I received such a message was on 13 January 2019.
The fraudster
demanded 356 Euro in Bitcoin to remain silent
and was stupid enough to provide the
same Bitcoin address to several victims.
Since all Bitcoin
transactions are public, we know exactly how much money they made:
5.379 BTC, which was worth around 20’000 USD at the time.
This also
means that they had no way to know which of their victims actually paid,
which made their threat even less credible to anyone who has a basic
understanding of
blockchains.
Phishing
Always be suspicious: If an email prompts you to perform a certain action, your alarm bells should ring.
Have you been prompted for similar
actions before?
Is the time frame to perform the action unusually short?
Is there a reasonable default option if you don’t perform the action?
Does the action involve the disclosure of sensitive information or a payment?
Don’t click on links: Phishing attacks require that you take the bait.
Create a bookmark
for all the websites where you have an account.
Make it
a habit to navigate to these websites yourself instead of following links.
If an email says that a subscription is about to expire,
log in to the
website of the service provider with the bookmark and not the link.
Using a bookmark (or a search engine) to navigate to a website
is better
than relying on the address autocompletion of your browser.
If you clicked on a dubious link by mistake in the past,
the fraudulent URL is still in
your browser’s history
and you may not be able to recognize it as such.
Hover over links: If you can’t suppress your urge to click on a link,
move your mouse over the link first and verify whether the status bar
at the
bottom of the window indeed displays the address you want to visit.
You should always do this because the text of a link can be misleading.
For
example, www.google.com takes you to
Bing, not Google.
You should check the destination of a link before you click on it.
If you check the
destination of a link only in the opened browser window,
you have already confirmed to the attacker that you click on links,
and the visited
website might have already infected your computer
with malware.
Unfortunately, link tracking can make it quite difficult to recognize
whether
the destination of a link is legitimate.
Furthermore, not all companies prime their users to trust only a single domain.
For example, PayPal, of all
companies,
directs their users to paypal-communication.com instead of paypal.com
when informing them about changes to the general
terms and conditions.
Additionally, homograph attacks can make it difficult
or even impossible to recognize that the target domain is not the
legitimate one.
This is one more reason why you shouldn’t click on links in the first place.
The only exceptions to this rule are links to articles on
which you won’t perform any actions.
However, this means that you have to remember for each tab of your browser
whether the address came
from a trusted or an untrusted source.
Anything you open from an untrusted page cannot be trusted either.
Some mail clients, such as Apple Mail,
don’t have a status bar
and show the destination address in a tooltip instead.
And yes, Apple Mail is smart enough to override any tooltips that
a sender provided
with the title attribute.
I’ve tested this.
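A mail client could also perform the most basic version of this check for you. The following sketch flags links whose visible text looks like a domain name which differs from the host of the actual destination. The class name and the regular expression are my own, and the comparison is deliberately crude: It neither handles subdomains gracefully nor detects homograph attacks.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Matches link texts which look like a domain name or a Web address.
DOMAIN_LIKE = re.compile(
    r"^(?:https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)(?:[/:?#].*)?$",
    re.IGNORECASE,
)


class LinkChecker(HTMLParser):
    """Collects links whose text suggests a different host than their target."""

    def __init__(self):
        super().__init__()
        self.suspicious = []  # List of (text, href) tuples.
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            match = DOMAIN_LIKE.match(text)
            target = (urlsplit(self._href).hostname or "").lower()
            if match and match.group(1).lower() != target:
                self.suspicious.append((text, self._href))
            self._href = None


checker = LinkChecker()
checker.feed(
    '<a href="https://www.bing.com/">www.google.com</a> '
    '<a href="https://ef1p.com/email/">ef1p.com</a> '
    '<a href="https://example.com/">Click here</a>'
)
print(checker.suspicious)
```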
The very least that mail clients should do to prevent phishing attacks
is to show a warning if a known display name is used by an unknown
sender.
I have seen this only in the Gmail web interface so far.
For some reason, I can no longer replicate this, though.
How Gmail warns its users when a known display name is used by an unknown sender.
Gmail provides an easy way to see whether a received message has been authenticated and encrypted in transit,
which allows users to assess the
authenticity and, somewhat misleadingly,
the confidentiality of a message at least after it has been transmitted:
You can click on the little triangle to see more details in Gmail’s web interface.
mailed-by indicates a successful SPF check, signed-by a valid DKIM signature.
security indicates that the outgoing mail server of the sender used STARTTLS.
If you reply to or forward the following message with “No, I don’t.” in Thunderbird,
the recipient will see “Yes, I do.” instead.
If you have
already disabled the composition of messages in HTML,
you have to press “Shift” when you click on the “Reply” or the “Forward” button.
For
this particular attack to work, the message has to be composed in the “Paragraph” style.
<html>
<head>
<style type="text/css">
p { font-size: 0; }
p + p { display: none; }
</style>
</head>
<body>
</body>
</html>
How to exploit Thunderbird’s failure to scope the styles of the quoted message.
Click here
to use this example in the ESMTP tool above.
The attack uses the ::before pseudo-element
to inject the text, and p[_moz_dirty] to hide the injected text during composition.
<html>
<head>
<style type="text/css">
hr { border: 0; }
</style>
</head>
<body>
</body>
</html>
How to exploit Outlook.com’s failure to scope the styles of the quoted message.
Click here
to use this example in the ESMTP tool above.
Since Outlook.com doesn’t copy styles like div[style]:before and div:first-child:before to the reply,
I had to abuse the <hr>
element
to make the injected text appear only once.
Different appearances
Another issue with email is that the same message can appear differently to different recipients.
This is a problem whenever you refer to the
content of an earlier message,
no matter whether you quote the message
or reference it in the In-Reply-To header field.
Until mail clients
address this issue, you must repeat the content you refer to.
Emails can appear differently for three reasons:
multipart/alternative:
Multipart messages can include different versions of the same content
so that the mail client of the recipient can
display the last version
whose content type it supports.
However, nothing guarantees that the various parts contain the same content.
Spam
filters might flag messages whose alternative parts diverge too much from one another,
but determining whether different parts contain the
same content is more difficult than it seems.
Let’s look at an example:
<html>
<body>
</body>
</html>
If your boss uses an HTML-capable mail client, they will see USD 100 in the message.
When your boss replies to this message with “Yes, that’s
what we agreed.”,
all the mail clients I usually mention in this article
generate a Content-Type: text/plain version of the reply, which
includes USD 1000.
If you know that your accountant uses a plaintext-only mail client, this attack will work.
On most HTML-capable mail
clients,
you can see the plaintext version only by inspecting the raw message.
Thunderbird, however, allows you to change which part is being
<html>
<head>
<style type="text/css">
</style>
</head>
<body>
<p>
You have a
<span class="large">large</span>
<span class="small">small</span>
<span class="touch">touch</span>
screen.
</p>
</body>
</html>
Media queries are useful to design websites for various screen sizes,
which is known as responsive web design.
Since emails are read on a wide
variety of devices,
media queries are an important technique to make them look good on all devices.
Since media queries and selectors
aren’t
allowed in the style attribute,
conditional rendering is much easier in mail clients which support internal or external CSS,
which is the vast
majority by now.
In order to prevent this attack,
Thunderbird no longer supports media queries.
In my opinion, this is the wrong approach
and
the fix should rather be to force all content to render.
Styles should affect only how content is displayed, not which content is being displayed.
The supported media features vary greatly among clients.
For example, the screen width media queries are supported by
Gmail, Outlook.com,
Yahoo Mail, and Apple Mail (also on iOS).
The pointer media query,
which can be used to detect a touch screen,
is removed by the Gmail and
Yahoo Mail webclients.
Different implementations:
As long as different users use different mail clients which sanitize emails differently,
attackers can draft messages
which are displayed differently to different recipients.
Since it’s easy to learn which mail client someone uses,
it’s often not difficult to have
some part of a message be shown or hidden for a specific recipient.
I’ve drafted such a message for you:
<html>
<head>
<style type="text/css">
@media (pointer) {
</style>
</head>
<body>
<p>
</p>
</body>
</html>
As long as not all mail clients prevent senders from hiding content with CSS,
email styling can be abused.
Don’t we have the same problem with
websites?
In principle, yes, but the difference lies in the expectation of users.
On the Web, you know that pages are often customized and that
their content can change at any moment.
In the case of email, however, you expect that everyone sees the same content,
especially when you
quote another message.
If you reply to messages without quoting them,
an attacker can deliver a different message with the same
Message-ID to
each of the recipients.
As I wrote earlier:
Just because someone is listed as another recipient
doesn’t mean that they received the same message
as you.
The abuse of conditional CSS rules as a signing oracle was discovered and published by
Jens Müller and his colleagues in 2019.
The
problem with diverging multipart/alternative parts was discussed thereafter
in this Thunderbird issue.
There are plenty of ways to hide text and other content with CSS.
While this is useful on the Web, where you can have dynamic content,
there is no reason to allow hidden content in emails,
where you can’t unhide it with JavaScript.
(I know that one can accomplish amazing
things with only CSS,
such as tabbed areas,
but do we really need this in emails?)
Jens Müller and his co-authors included the following table
of content-hiding CSS properties
in their paper,
which I simplified and extended for you:
Property Value(s)
display none
font[-size] 0 [Helvetica] (also when combined with a distance unit or the percentage sign)
color transparent, rgba(0,0,0,0), hsla(0,0%,0%,0) (for all RGB and HSL values)
opacity 0
clip[-path] circle(0) (and other shapes that don’t overlap the content)
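As a rough illustration of how a mail client could flag such declarations, the following sketch checks an inline style against a few of the values from the table. The patterns are my own simplification and far from exhaustive (clip and clip-path are not covered at all), so treat this as a starting point rather than a real sanitizer.

```python
import re

# Simplified patterns for some of the values in the table above.
HIDING_VALUES = {
    "display": re.compile(r"none"),
    "font-size": re.compile(r"0[a-z%]*"),
    "opacity": re.compile(r"0(\.0+)?"),
    "color": re.compile(r"transparent|(rgba|hsla)\([^)]*,\s*0(\.0+)?\s*\)"),
}


def parse_style(style: str) -> dict:
    """Splits the value of a style attribute into its declarations."""
    declarations = {}
    for declaration in style.split(";"):
        name, separator, value = declaration.partition(":")
        if separator:
            declarations[name.strip().lower()] = value.strip().lower()
    return declarations


def hidden_by(style: str) -> list:
    """Returns the properties in the given style which hide the content."""
    return [
        name
        for name, value in parse_style(style).items()
        if name in HIDING_VALUES and HIDING_VALUES[name].fullmatch(value)
    ]


print(hidden_by("display: none"))
print(hidden_by("font-size: 0px; color: red"))
print(hidden_by("color: hsla(0, 0%, 0%, 0)"))
print(hidden_by("opacity: 0.5; font-size: 10px"))
```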
ASCII-only characters:
Email is older than ISO 8859
and Unicode.
To remain backward compatible,
non-ASCII characters have to be encoded
in the message body,
in header fields, in domain names,
in parameter values, and in URLs.
To make things even more complicated, all these
encodings are different.
When the involved servers support SMTPUTF8,
UTF-8 can be used in the local part of an email address,
but
internationalized messages have to be downgraded
for clients which don’t support UTF-8.
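The following sketch shows how different these encodings are, using only Python’s standard library. The sample text, domain, file name, and path are arbitrary examples, and quoted-printable is just one of several possible body encodings.

```python
import quopri
from email.header import Header, decode_header, make_header
from email.utils import encode_rfc2231
from urllib.parse import quote

text = "¡Buenos días!"

# Header fields use encoded words.
encoded_word = Header(text, "utf-8").encode()

# The message body can use quoted-printable (or Base64).
quoted_printable = quopri.encodestring(text.encode("utf-8"))

# Domain names use Punycode with the "xn--" prefix.
ascii_domain = "bücher.example".encode("idna")

# Parameter values use a charset prefix followed by percent-encoding.
parameter_value = encode_rfc2231("días.txt", "utf-8")

# URLs use percent-encoding.
encoded_path = quote("/días")

for value in (encoded_word, quoted_printable, ascii_domain,
              parameter_value, encoded_path):
    print(value)
```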
No submission protocol:
In the early years of email, mail clients could submit
outgoing messages to any mail server without authentication.
As
a consequence, mail submission and mail retrieval were handled completely differently.
In order to make the change for existing mail clients as
small as possible,
the mail submission protocol was forked from ESMTP
rather than being incorporated into access protocols,
such as POP3
and IMAP.
Unless their mailbox provider is in a configuration database,
users have to configure both their
incoming mail server and their
outgoing mail server to this day.
If they change their password, they usually have to enter it twice in their settings.
Furthermore, it can happen
that they can receive messages but cannot send them and vice versa.
For ordinary users, this is really confusing.
The distinction between
incoming mail server and outgoing mail server
is also the reason why messages have to be submitted twice
if you want to record the sent
messages in your mailbox.
After two decades of little progress with regard to access protocols,
JMAP finally addresses this and many other
issues.
No transport security:
Emails and account passwords couldn’t be secured in transit for more than a decade.
Once Transport Layer Security
(TLS) became popular,
existing protocols were retrofitted so that all communication could be encrypted and authenticated.
All the protocols
were extended to support Explicit TLS,
but all of them require different commands to activate TLS,
which makes it difficult to use some of them
from the command line.
The introduction of protocol variants which use Implicit TLS
required additional port numbers, which confuses
ordinary users even more.
Since mail servers don’t know whether other mail servers support TLS, the communication between
them is still
vulnerable to downgrade attacks.
I’ll cover in the last chapter of this article how such attacks can be prevented.
No sender authentication:
Since emails aren’t authenticated, it’s quite easy to spoof the sender of a message.
This aggravates problems such as
spam and phishing,
and it can lead to undesirable backscatter.
I’ll explain the mechanisms which are used to alleviate this issue in the last
chapter.
Benign inconsistencies
Thunderbird =?UTF-8?B?wqFCdWVub3MgZMOtYXMh?=
Gmail =?UTF-8?B?wqFCdWVub3MgZMOtYXMh?=
Outlook.com =?iso-8859-1?Q?=A1Buenos_d=EDas!?=
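Although the encoded words listed above look different, they decode to the same text. Here is a small sketch with Python’s standard library which confirms this for the two variants:

```python
from email.header import decode_header, make_header


def decode_encoded_word(value: str) -> str:
    """Decodes a header value which consists of encoded words."""
    return str(make_header(decode_header(value)))


# The values are copied from the list above.
encoded = {
    "Thunderbird": "=?UTF-8?B?wqFCdWVub3MgZMOtYXMh?=",
    "Outlook.com": "=?iso-8859-1?Q?=A1Buenos_d=EDas!?=",
}

decoded = {client: decode_encoded_word(value) for client, value in encoded.items()}
for client, value in decoded.items():
    print(client, "->", value)
```

The first variant uses UTF-8 with Base64, the second one ISO 8859-1 with a Q encoding similar to quoted-printable.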
Unreasonable decisions
If you want to have something added to or removed from this list, let me know.
Innovation
Besides JMAP, dynamic content,
and what we’ll discuss in the last chapter,
there was barely any innovation over the last two decades.
This is a
pity given that email is the only decentralized communication service with global adoption.
I can only speculate about the reasons for the lack of
innovation:
Complexity: The enormous complexity of email can deter software engineers from entering the field.
Patching a heavily patched system
further is also not appealing to many young talents.
I hope this article can motivate more people to shape the future of email in a positive way.
Fragmentation: The email ecosystem is so fragmented that
no single organization can push the industry forward.
The innovation that we see,
such as email markup and dynamic content,
often remains limited to just a few companies.
If you want to write a mail client for a general
Format innovation
Since Skype failed to innovate,
it was superseded by Zoom and other applications.
WhatsApp might share a similar fate:
Telegram is showing us
how much room for innovation there is for a messaging app.
There are plenty of features I would like to see in email.
For a start, we still have no No-Reply header field,
no Proof-Of-Work header field,
no header field to reference the previous message
by its hash
(ideally using a hash tree for
MIME parts
so that attachments can be removed from a message without invalidating its hash),
no header fields for the sender’s contact details
to
replace email signatures,
no content type to initiate and reply to surveys, etc.
Some features, such as message compression, exist in theory but not in practice.
Other features, which originated in the alternative email system
X.400,
were formally specified as IETF email header fields in order to increase compatibility between the two systems
but were never
recommended for general use.
Among these header fields are Supersedes
to replace a sent message with a revised version,
Expires to indicate
when a message loses its validity,
and Reply-By to request a response in the specified time period.
Client innovation
Given the decentralized nature of email, protocol and format innovations are difficult to achieve.
However, nothing hinders mail clients from
innovating at the edge of the network.
I’ve mentioned plenty of ideas throughout this article.
Among them are sender approval, automatic
challenges,
Bcc recovery, privacy features such as
proxying remote content via Tor
(and even submitting emails via Tor as long as mailbox
providers leak the IP addresses of their users),
and security features such as preventing malicious display names
and different appearances of
messages.
It would be great if my mail client displayed whether a received message was successfully
authenticated with SPF and DKIM
(just like
Gmail).
I would like to see native support for DNS-based autoconfiguration,
Sieve and ManageSieve,
as well as PGP.
I don’t understand why mail
clients separate the outbox from the inbox.
(I don’t know any other messaging app which does this,
and just because IMAP uses folders doesn’t
mean you have to display them.)
I think it would be great if my mail client could
timestamp all the emails that I send.
Whenever I submit a
responsible disclosure, I do this manually.
Fixes
The last chapter of this article is dedicated to more recent standards
which address some of the aforementioned security issues.
We’ll study how
spoofing is prevented with domain authentication
and how confidentiality and integrity are ensured in the presence of an
active attacker with strict
transport security.
Many of the approaches rely on the Domain Name System (DNS)
to provide additional information.
This is secure only if the
records are authenticated with
DNSSEC.
I will no longer mention this aspect in the remaining subsections.
Some of the steps have to be
performed by the owner of the domain rather than the mailbox provider.
If you use a custom domain for your emails, you should definitely read
the part about
domain authentication to make sure that your domain is configured properly.
Since email is a decentralized service, we can improve
its security only in a collective effort.
Domain authentication
Historically, the sender of an email was not authenticated.
Anyone could relay a message to anyone
using any From address they wanted.
Impersonating another sender is known as spoofing.
While the prevention of spoofing won’t eliminate spam and phishing on its own
because
spammers can implement the following standards as well and phishing remains possible
with similar domains and malicious display names,
it’s an
important prerequisite for other techniques, such as flagging unknown senders.
As we saw earlier,
email spoofing is addressed in two steps:
The
incoming mail server of the recipient verifies
that the other party is authorized to send emails on behalf of the sender’s domain
and the outgoing
mail server of this domain ensures that the local part of the From address
belongs to the user who submitted the message.
[Figure: the path from the mail client of the sender to the mail client of the recipient, labeled with the points at which user authentication takes place.]
The incoming mail server of the recipient authenticates the outgoing mail server of the sender
and the outgoing mail server of the sender authenticates the user who submits the message.
As the title suggests, this subsection covers only the first part of the problem, namely
how a domain owner can specify which mail servers are
authorized to send messages on behalf of the domain and
how receiving mail servers can verify whether the sending mail server is indeed
authorized for the claimed domain.
The second part is usually solved with
password-based authentication mechanisms.
The following techniques
don’t prevent spoofing if the outgoing mail server of the sender is compromised
or if the attacker can create an account at the same mailbox
provider
and impersonate another user during submission.
Before you continue, make sure that you understand the difference between
a message and its envelope.
Adoption benefits
At first glance, it seems as if configuring your domain with the above standards benefits mostly others.
Why should you invest some of your
valuable time in your email setup just to protect others?
In economics, benefitting unrelated third parties without being compensated for it is
known as a
positive externality.
Treating information security as a public good,
one might expect to see many free riders,
who benefit from
improved security without contributing to it.
Fortunately, this is not what is happening with the above standards
as they benefit the people
who deploy them on their domains as well:
Deliverability: Protecting your domain with SPF, DKIM, and DMARC records
makes it more difficult for others to abuse your domain for
spamming and phishing.
When fewer messages coming from your domain are marked as spam,
the reputation of your domain improves,
which increases the chance that your messages reach their recipients.
Flexibility: By specifying which servers are authorized to deliver email on behalf of your domain,
you make it possible for others to attach
reputation to your domain
rather than to the IP addresses of your outgoing mail servers.
This allows you to change your servers and
deploy additional ones
without losing the reputation that you’ve built so far.
Unfortunately, these benefits apply only to the domains which you use to send emails from.
From a security perspective, however, it’s just as
important to configure SPF and DMARC records on the domains
which you don’t use to send emails from.
Since the above incentives exist
only for the former category of domains but not the latter,
mail server authorization is widely deployed on primary domains but less so on
redirect domains.
Domain owner
Knowing with certainty that a message was sent from a specific domain is important for
algorithms such as reputation systems and email
filters,
but domain authentication can also give human users a false sense of security, which is dangerous.
On the one hand, it can be difficult
for us to tell different domain names apart,
i.e. we easily fall victim to homograph attacks.
On the other hand, it can be really hard to figure
out who owns the domain in question.
There are around 1’500 top-level domains,
and you likely don’t know which top-level domain each of
$ telnet whois.iana.org 43
com
[…]
whois: whois.verisign-grs.com
[…]
Connection closed by foreign host.
$ telnet whois.verisign-grs.com 43
ef1p.com
[…]
Registrar WHOIS Server: whois.gandi.net
[…]
Connection closed by foreign host.
$ telnet whois.gandi.net 43
ef1p.com
[…]
Registrant Name: REDACTED FOR PRIVACY
[…]
Connection closed by foreign host.
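The same lookups can be scripted. The sketch below sends a query to a WHOIS server on port 43 and extracts the referral to the next server from the response. The function names are my own, the referral pattern covers only the two field names visible in the transcript above, and the network functions are defined but not called here.

```python
import re
import socket

# Matches the two referral fields which appear in the transcript above.
REFERRAL = re.compile(
    r"^\s*(?:whois|Registrar WHOIS Server):\s*(\S+)",
    re.MULTILINE | re.IGNORECASE,
)


def whois_query(server: str, name: str, timeout: float = 10.0) -> str:
    """Sends the name to the server and reads until the server closes."""
    with socket.create_connection((server, 43), timeout=timeout) as connection:
        connection.sendall(name.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            chunk = connection.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_referral(response: str):
    """Returns the WHOIS server to query next or None if there is none."""
    match = REFERRAL.search(response)
    return match.group(1) if match else None


def follow_referrals(domain: str, max_hops: int = 3) -> str:
    """Starts at IANA with the top-level domain and follows the referrals."""
    server = find_referral(whois_query("whois.iana.org", domain.rsplit(".", 1)[-1]))
    response = ""
    for _ in range(max_hops):
        if server is None:
            break
        response = whois_query(server, domain)
        next_server = find_referral(response)
        server = next_server if next_server != server else None
    return response
```

Calling follow_referrals("ef1p.com") should mirror the three manual steps, assuming that the servers respond in the format shown above.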
If you perform the above steps, you’ll find that almost all information is
redacted for privacy.
This practice has become the norm rather than
the exception, and there is a
proposal to abolish the WHOIS system
as we know it altogether.
And even if you do get a useful answer,
the
provided information might not be trustworthy
since domain registrars don’t verify their customers.
Privacy implications
Includes:
An SPF record can include the IP addresses of another SPF record.
Search for the appropriate record from your mailbox provider.
For
example, put include:_spf.google.com (source)
into your SPF record if you use Google Workspace.
Since mailing list providers, such as
Mailchimp,
use an address of their own in the MAIL FROM command
so that they can handle bounce messages for you,
you don’t need to add the
addresses of their servers to your SPF record.
Default:
Provide an explicit default result for any sender
which didn’t match one of the previous mechanisms.
If you want incoming mail servers
to reject messages with a spoofed MAIL FROM domain, use -all.
If you want incoming mail servers to just flag such messages as potentially
fraudulent, use ~all.
In order not to disrupt email forwarding,
incoming mail servers are unlikely to enforce your SPF policy.
They are much
more likely to enforce the domain policy of your
DMARC record.
An SPF record created according to the above steps looks as follows: v=spf1 include:_spf.google.com -all.
On domains from which you
don’t send any emails,
you should use v=spf1 -all.
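The two steps can be sketched as a small helper which assembles such a record. The function is purely illustrative and supports only the include mechanism and the two defaults discussed above.

```python
def build_spf_record(includes=(), default="-all") -> str:
    """Assembles a simple SPF record from included domains and a default."""
    if default not in {"-all", "~all"}:
        raise ValueError("Use -all to reject or ~all to flag spoofed messages.")
    terms = ["v=spf1"]
    terms += [f"include:{domain}" for domain in includes]
    terms.append(default)
    return " ".join(terms)


print(build_spf_record(["_spf.google.com"]))
print(build_spf_record())  # For domains which don't send any emails.
```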
The full syntax of SPF records is much more powerful than this but rarely needed.
I will cover
SPF in more detail in the boxes below.
There are a lot of things
that can go wrong when configuring an SPF record.
For a start, a domain may have
at most one SPF record
and the number of additional DNS lookups an SPF record may trigger
is limited.
Instead of listing all the pitfalls here, I’ve
built a tool
which performs 30 different checks on your SPF record.
It uses Google’s DNS API to query the records.
Please note that this tool
warns you only about common mistakes;
it doesn’t verify whether your outgoing mail servers are included in the record.
You still have to test your
setup by sending emails
and checking the Received-SPF header field.
By not evaluating whether an IP address passes SPF validation,
the tool is
also limited in other regards.
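Two of the pitfalls mentioned above are easy to check yourself. The following sketch is not the tool’s implementation: it merely selects the SPF record among the TXT records of a domain and counts the terms which trigger an additional DNS lookup according to RFC 7208. It doesn’t follow include or redirect, so lookups caused by nested records aren’t counted.

```python
import re

LOOKUP_LIMIT = 10
# Mechanisms and modifiers which trigger an additional DNS lookup.
LOOKUP_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}


def select_spf_record(txt_records):
    """Returns the SPF record among the TXT records of a domain."""
    candidates = [
        record for record in txt_records
        if record.lower() == "v=spf1" or record.lower().startswith("v=spf1 ")
    ]
    if len(candidates) > 1:
        raise ValueError("A domain may have at most one SPF record.")
    return candidates[0] if candidates else None


def count_direct_lookups(record: str) -> int:
    """Counts the terms of the record which require a DNS lookup."""
    count = 0
    for term in record.split()[1:]:
        name = re.split(r"[:=/]", term.lstrip("+-?~"), maxsplit=1)[0].lower()
        if name in LOOKUP_TERMS:
            count += 1
    return count


record = select_spf_record(["some-verification=abc", "v=spf1 include:_spf.google.com -all"])
print(record, count_direct_lookups(record), "of", LOOKUP_LIMIT)
```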
Since DNS records can change over time and the SPF check has to succeed only at the point of delivery,
incoming mail servers should record
the result of their SPF evaluation in the header of accepted messages.
This allows mail filters and mail clients to process and display
messages differently
depending on whether the domain of the sender was successfully authenticated.
There are two fields for this purpose:
Received-SPF
and Authentication-Results, which we’ll discuss later.
Both of them are trace fields,
which means that they should be added
at the top of the header
and that they must appear above all fields with the same name.
The format of the Received-SPF header field is as
follows:
An example Received-SPF header field. The values are intended to make the result verifiable.
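A mail filter could read such a header field as sketched below. The header value uses made-up example values in the format described by RFC 7208 (a result, an optional comment in parentheses, and key-value pairs). The parser is deliberately simple and doesn’t handle nested parentheses in the comment.

```python
import re

PAIR = re.compile(r'([A-Za-z][A-Za-z0-9_.-]*)=("[^"]*"|[^;\s]*)')


def parse_received_spf(value: str) -> dict:
    """Parses the value of a Received-SPF header field into a dictionary."""
    value = " ".join(value.split())  # Undo the folding of the header field.
    result, _, rest = value.partition(" ")
    comment = ""
    if rest.startswith("("):
        comment, _, rest = rest[1:].partition(")")
    pairs = {key.lower(): val.strip('"') for key, val in PAIR.findall(rest)}
    return {"result": result.lower(), "comment": comment.strip(), **pairs}


# Hypothetical header value with example domains and addresses.
header_value = (
    "pass (mybox.example.org: domain of myname@example.com\r\n"
    " designates 192.0.2.1 as permitted sender)\r\n"
    " receiver=mybox.example.org; client-ip=192.0.2.1;\r\n"
    ' envelope-from="myname@example.com"; helo=foo.example.com;'
)

parsed = parse_received_spf(header_value)
for key, value in parsed.items():
    print(f"{key}: {value}")
```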
Protecting subdomains
Incoming mail servers query only the domain of the MAIL FROM address
(or the HELO identity as a fallback) for an SPF record.
If
support.example.com doesn’t have an SPF record,
SPF verifiers won’t continue the lookup with example.com.
Does this mean that you
should configure an SPF record for all your subdomains?
If you use a subdomain to send emails from, then the answer is yes.
If you don’t use
Email forwarding
Historically, emails to alias addresses were forwarded without changing the MAIL FROM address,
which had the advantage that bounce
messages were sent directly back to the original sender.
Nowadays, forwarding emails without changing the MAIL FROM address breaks SPF
validation.
The incoming mail server of the final recipient might reject the message
because the forwarding mail server isn’t authorized to
send messages on behalf of the sender’s domain.
In order to avoid this, forwarding mail servers have to use a
sender rewriting scheme,
which
makes use of variable envelope return paths (VERP).
The idea is to encode the original MAIL FROM address in a new MAIL FROM address
so that
bounce messages coming back to this address can be forwarded to the original sender.
In order not to forward spam to the original sender,
the forwarding mail server should use bounce address tag validation (BATV).
In practice, most mail servers also accept emails with failed SPF
checks
if the reputation of the forwarder is high enough.
As a consequence, forwarding usually works even without rewriting the MAIL FROM
address,
which defeats the purpose of SPF to some degree.
RFC 7208
also discusses some other solutions to the forwarding problem.
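The idea behind the rewriting can be sketched in a few lines. This is a simplified illustration and not compatible with deployed implementations, which also encode a timestamp so that rewritten addresses expire. The secret, the addresses, and the length of the tag are arbitrary assumptions.

```python
import hashlib
import hmac


def _tag(secret: bytes, domain: str, local: str) -> str:
    """Authenticates the original address so that it cannot be forged."""
    data = f"{domain}={local}".lower().encode("utf-8")
    return hmac.new(secret, data, hashlib.sha256).hexdigest()[:8]


def rewrite(mail_from: str, forwarder: str, secret: bytes) -> str:
    """Encodes the original MAIL FROM address in an address of the forwarder."""
    local, _, domain = mail_from.rpartition("@")
    return f"SRS0={_tag(secret, domain, local)}={domain}={local}@{forwarder}"


def reverse(rewritten: str, secret: bytes):
    """Recovers the original address or returns None if the tag is invalid."""
    local_part, _, _ = rewritten.rpartition("@")
    parts = local_part.split("=", 3)
    if len(parts) != 4 or parts[0] != "SRS0":
        return None
    _, tag, domain, local = parts
    if not hmac.compare_digest(tag, _tag(secret, domain, local)):
        return None
    return f"{local}@{domain}"


secret = b"hypothetical secret of the forwarding mail server"
rewritten = rewrite("alice@example.org", "forwarder.example", secret)
print(rewritten)
print(reverse(rewritten, secret))
```

Since the forwarder accepts bounce messages only for addresses with a valid tag, others cannot abuse it to relay messages to arbitrary senders.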
SPF qualifiers
Qualifier Result Description
+ pass The sender is authorized to send messages on behalf of the given domain.
- fail The sender is not authorized to send messages on behalf of the given domain.
? neutral The domain owner makes no assertion. This result has to be treated like none.
~ softfail Between fail and neutral. The message can be flagged but not rejected.
The four qualifiers and the evaluation results to which they lead.
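In code, the mapping from qualifiers to results is a simple lookup. The sketch below also reflects that a term without a qualifier is treated as if it started with +, which is how RFC 7208 defines the default.

```python
QUALIFIERS = {"+": "pass", "-": "fail", "?": "neutral", "~": "softfail"}


def split_term(term: str):
    """Returns the result to which a matching term leads and the mechanism."""
    if term[0] in QUALIFIERS:
        return QUALIFIERS[term[0]], term[1:]
    return QUALIFIERS["+"], term  # The qualifier defaults to +.


for term in "include:_spf.google.com ~all".split():
    print(term, "->", split_term(term))
```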
SPF mechanisms
SPF modifiers
exp
allows the domain owner to specify an explanation for a fail result.
If you include exp=_exp.example.com in your SPF record,
the
SPF verifier looks for a TXT record at _exp.example.com
and returns its data to the sender in the case of a fail result.
This DNS query
doesn’t count towards the lookup limit of 10.
Modifiers should appear at the end of an SPF record but can appear anywhere.
Unknown modifiers have to be ignored so that SPF can be
extended in the future.
You can use any domain name in these modifiers; _spf and _exp have no special meaning.
SPF macros
Instead of using a domain name after the a, mx, include, exists, and ptr mechanisms
or the redirect and exp modifiers, you can use an SPF
macro.
SPF macros are domains, where expressions of the form %{…} are replaced
with information from the SMTP session.
Since the above
tool doesn’t have this information, it cannot evaluate macros.
If you want to use SPF macros, you have to read the RFC.
Let me just give you
one example:
If you are worried about breaking email forwarding when using SPF,
you can specify a neutral result for trusted mail servers
by including
?exists:%{ir}.whitelist.example.org in your SPF record.
i stands for the sender’s IP address and r reverses the value,
splitting on dots by default.
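To illustrate the expansion, this sketch handles just the i macro with the optional r transformer for IPv4 addresses. All other macro letters, transformers, and IPv6 addresses are out of scope, and the IP address is an arbitrary example.

```python
import re

# Matches %{i} and %{ir} only.
IP_MACRO = re.compile(r"%\{i(r?)\}")


def expand_ip_macro(domain_spec: str, client_ip: str) -> str:
    """Replaces %{i} and %{ir} with the (reversed) IPv4 address."""
    def replace(match):
        labels = client_ip.split(".")
        if match.group(1):
            labels.reverse()
        return ".".join(labels)
    return IP_MACRO.sub(replace, domain_spec)


print(expand_ip_macro("%{ir}.whitelist.example.org", "192.0.2.1"))
```

The SPF verifier then checks whether an address record exists at the resulting domain name.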
Other use cases of macros are to provide
a different policy for each user or to rate-limit messages coming from
certain IP addresses.
Macros can cause problems with internationalized email addresses
as discussed in RFC 8616
and aggravate privacy
implications by
leaking sender addresses into the DNS.
Such a tracking example can be found at altavista.net.
DNS queries and replies use the User Datagram Protocol (UDP) when possible
and fall back on the Transmission Control Protocol (TCP)
on
the transport layer if necessary.
Since UDP is a connectionless communication protocol,
no additional messages have to be sent back and
forth to set up the connection, which makes it more efficient than
TCP with its handshake.
Historically, UDP messages were limited to 512
bytes.
This limit was raised in 1999 with the introduction
of the Extension Mechanisms for DNS (EDNS).
EDNS allows the sender to indicate
a higher UDP payload size in a so-called
pseudo resource record of the type OPT.
The command-line tool dig displays this OPT record as
follows:
As an ordinary domain administrator, you likely don’t have to worry about these size limits.
If you operate a large email service or run into
problems with certain providers,
it can make sense to split a large SPF record into several smaller ones, though.
v=DKIM1; p=
Non-repudiation
Unlike the sender’s IP address, which can be verified only by the first incoming mail server,
ordinary digital signatures can be verified by
anyone.
In comparison with SPF,
DKIM has the advantage that email forwarding due to
alias addresses no longer breaks domain
authentication.
Unless a message has been modified by the relaying mail server,
the final recipient can still verify the authenticity of the
message.
Another consequence of using digital signatures is that senders can
no longer repudiate their messages.
If a dispute arises from an
oral conversation, it’s one person’s word against another person’s word.
For modern email, this is no longer the case, and many people aren’t
aware of this.
I don’t know whether DKIM signatures would make much of a difference if an email dispute comes before a court.
In the world
of politics, however, the ability to cast or eliminate doubt can make a big difference.
For example, when WikiLeaks released personal emails
of John Podesta,
the chairman of Hillary Clinton’s 2016 presidential campaign,
DKIM signatures proved the authenticity of emails
which
leading Democrats claimed to be fabricated by Russian intelligence agencies.
On the other hand, signing all outgoing emails makes it more
difficult for others
to frame you for things that you’ve never written.
While nothing speaks against deploying SPF and
DMARC,
you should
think twice about introducing DKIM and consult your legal department before doing so.
If you use a large email service such as Gmail,
Outlook.com, or Yahoo Mail,
the provider has made the decision for you.
Since they can be forced to reveal evidence anyway,
you shouldn’t
be using them in the first place if you care about
plausible deniability.
Instead of hosting your emails yourself, you should rather use an
off-
the-record messaging protocol
like Signal,
which implements deniable authentication.
d=gmail.com; s=20161025;
h=mime-version:from:date:message-id:subject:to;
bh=HM9brgzIS3y+e9+sDmAHassF4IPPotkzbGUPemyxbZY=;
b=Lx1FORL1tK47ZvPK+cDr8BRAUp+fZe1RZaxlQ4hp4qfk7jswXVcSxvntyD3VeclxiK
XXV9BblWoFkyth3u0LTavPTr2wOHb3m2IrubMIJsTsttzaV9XNsECLI2kGOSiiOSsj19
+ZHC+Ne7+piXyzhgOGiWYDqqSfN9jmQeKKJuZLTXUOsL/UFNKzUfNdPABpcg8dlZOClR
s/4u++5Zbj8T4fjp1kma+X9q+fKv5oWuYI4BbhQ6ie8g58XRidRowLZTiydocfoRC93x
UT00JSZr4RAOEyDa5ViJTUwdzkNA6AlokRJ6JYAHoAIXtTSIFnymXjVZcBMUTMYOHozu
6SOA==
v required The used version of the DKIM specification. The current version is 1.
a required The signature algorithm. The value is rsa-sha256 or ed25519-sha256 as introduced in RFC 8463.
c optional The canonicalization of the header fields and the body separated by a slash. Either simple or relaxed.
h required A colon-separated list of the names of the header fields which are covered by the signature.
bh required The Base64-encoded hash of the canonicalized message body (see below).
b required The Base64-encoded signature (see the text below this table for how the signed hash is determined).
l optional How many bytes of the canonicalized message body are hashed (see below).
i optional The email address for which the signer takes responsibility (see below).
The various tags of the DKIM-Signature header field. The last four tags are less common.
The IANA registry lists even more tags.
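A verifier first has to split the header field into its tags. The sketch below parses an excerpt of the signature shown above and derives the domain name at which the public key is looked up, which consists of the selector (s tag), the _domainkey label, and the signing domain (d tag). Removing all whitespace from values is a simplification which is fine for the tags used here.

```python
def parse_tag_list(value: str) -> dict:
    """Parses a list of tag=value pairs which are separated by semicolons."""
    tags = {}
    for item in value.split(";"):
        name, separator, tag_value = item.partition("=")
        if separator:
            # Folding whitespace inside of values is dropped.
            tags[name.strip()] = "".join(tag_value.split())
    return tags


# Excerpt of the signature above (without the b tag).
signature = (
    "d=gmail.com; s=20161025;\r\n"
    " h=mime-version:from:date:message-id:subject:to;\r\n"
    " bh=HM9brgzIS3y+e9+sDmAHassF4IPPotkzbGUPemyxbZY=;"
)

tags = parse_tag_list(signature)
signed_header_fields = tags["h"].split(":")
key_domain = f"{tags['s']}._domainkey.{tags['d']}"
print(signed_header_fields)
print(key_domain)
```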
Besides adding additional header fields, some mail servers also modify emails in other ways.
If the verifier doesn’t feed exactly the same
content into the hash function as the signer,
the signature is seen as invalid.
Since DKIM signatures can break due to no fault of the signer,
messages with only invalid signatures should not be treated differently from messages with no signatures at all.
To make DKIM signatures
more robust,
RFC 6376 defines four algorithms to
canonicalize the inputs to the hash function:
simple body canonicalization:
Make sure that the body ends with a single {CR}{LF}.
If the body ends with several {CR}{LF}, convert them
to one.
If there is no {CR}{LF} at the end of the body, insert one.
relaxed body canonicalization:
Delete spaces and tabs before {CR}{LF},
reduce all sequences of whitespace within a line to a single
space,
and apply the simple body canonicalization with the exception that an empty body remains empty.
simple header canonicalization:
Don’t change header fields in any way.
The signer can decide how to canonicalize the header fields and the body.
The format of the c tag is
{HeaderCanonicalization}/{BodyCanonicalization},
and its default value is simple/simple.
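The canonicalization algorithms can be sketched as follows. Besides the three algorithms described above, the sketch includes the relaxed header canonicalization as I understand it from RFC 6376: lowercase the field name, unfold the value, collapse whitespace, and remove whitespace around the colon and at the end. The sample inputs are taken from the examples in the RFC.

```python
import re


def simple_body(body: bytes) -> bytes:
    """Ensures that the body ends with exactly one CRLF."""
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


def relaxed_body(body: bytes) -> bytes:
    """Also normalizes whitespace; an empty body remains empty."""
    lines = [
        re.sub(rb"[ \t]+", b" ", line).rstrip(b" ")
        for line in body.split(b"\r\n")
    ]
    while lines and lines[-1] == b"":
        lines.pop()
    return b"\r\n".join(lines) + b"\r\n" if lines else b""


def relaxed_header(field: str) -> str:
    """Canonicalizes a single header field, which may be folded."""
    name, _, value = field.partition(":")
    value = re.sub(r"\r\n(?=[ \t])", "", value).rstrip("\r\n")  # Unfold.
    value = re.sub(r"[ \t]+", " ", value).strip(" ")
    return f"{name.strip().lower()}:{value}\r\n"


body = b" C \r\nD \t E\r\n\r\n\r\n"
print(simple_body(body))
print(relaxed_body(body))
print(repr(relaxed_header("B : Y\t\r\n\tZ  \r\n")))
```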
The l tag limits how many bytes of the canonicalized body are included in the body hash,
which is stored in the bh tag of the DKIM-Signature
header field.
The idea is that the signer can allow others to extend the current message body
so that mailing lists which add an unsubscribe
footer don’t invalidate the signature.
Unfortunately, there are three problems with this idea:
Applicability: DKIM is not MIME-aware.
Anything you add to a multipart message
after the last boundary delimiter is part of the epilogue
and won’t be shown to the user.
Additionally, many mailing lists also modify the Subject line.
Usability: It’s not clear how partial authenticity can be conveyed to the user without causing confusion.
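The following sketch shows how the l tag affects the body hash. It assumes SHA-256 as the hash function and an already canonicalized body. The message body and the footer are made up.

```python
import base64
import hashlib


def body_hash(canonical_body: bytes, length=None) -> str:
    """Computes the value of the bh tag over the first `length` bytes."""
    if length is not None:
        canonical_body = canonical_body[:length]
    digest = hashlib.sha256(canonical_body).digest()
    return base64.b64encode(digest).decode("ascii")


original = b"Hello\r\n"
# A mailing list appends a footer to the body.
extended = original + b"-- \r\nTo unsubscribe, reply with STOP.\r\n"

print(body_hash(original))
print(body_hash(extended))                        # Differs from the original.
print(body_hash(extended, length=len(original)))  # Matches the original.
```

With l=7, the signature remains valid, but so would any other content which is appended to the body.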
By design, DKIM protects only the body and the selected header fields of a message
but not its envelope.
If DKIM also covered the RCPT TO
addresses of the envelope,
mail servers could no longer forward emails
without breaking domain authentication.
One of the reasons for
using outgoing mail servers is to allow mailbox providers
to limit the rate at which their users can send emails.
The problem with DKIM is that
any user can get a valid DKIM signature on a message of their choosing
by sending an email to themselves.
Once they have received the email, they can
relay the signed message to
an arbitrary number of recipients.
The recipients might not see their email address in the To or Cc field,
but this is
usually also the case for Bcc recipients.
The mailbox provider who signed the message can stop such a
replay attack
only by revoking the
corresponding signing key.
If the mailbox provider doesn’t use
a different key for each user,
which would increase the load on their name
servers considerably,
revoking the key immediately prevents delayed emails of other users from being delivered as well.
Unless a key is
compromised, it should be revoked only after around one week of no longer being in use.
Even if the mailbox provider did use a different key
for each user,
it would first have to learn that a user abuses their reputation for spamming.
A lot of damage might already have been done
before the key is revoked and the change is propagated through the DNS.
The best that public mailbox providers can do to counter this
attack is to reject outgoing messages
with a high spam score.
You can generate a DKIM signing key yourself with the following commands.
As far as I can tell, you can generate an Ed25519
key only with
OpenSSL but not with LibreSSL.
I covered how to install OpenSSL on macOS in an earlier box.
OpenSSL: openssl
Many people and companies use a custom domain without running their mail servers themselves.
Instead, they delegate the delivery and the
receipt of emails to mailbox providers.
SPF makes such a delegation very easy with the include mechanism,
which allows mailbox providers
to change their outgoing mail servers without involving their customers.
If you want to achieve the same with DKIM,
you have to delegate a
DNS zone
in the _domainkey subdomain to your mailbox provider.
The next best thing is to configure a CNAME record
in this subdomain which
points to the DKIM record of the mailbox provider.
Once again,
the standard doesn’t mention whether CNAME records can be used, but this
seems to be
a common practice.
With this approach, the mailbox provider cannot change the selector of their signing key
without involving
their customers, but if you set up several CNAME records,
the mailbox provider can alternate between them.
What makes ATPS unnecessarily complicated is that the DelegateeDomain can be hashed.
The argument for this is to force the subdomain to
a fixed length
so that arbitrarily long third-party domain names can be prepended to the DelegatorDomain
while remaining below 255
characters.
Yet another tag with the name atpsh is added to the DKIM-Signature header field to indicate the used hash function.
In the
previous example, which doesn’t use hashing, the tag is atpsh=none.
If the DelegateeDomain is hashed, which is indicated with atpsh=sha256,
the hash is encoded with Base32
according to RFC 4648.
Since Base32 uses only the capital letters from A to Z and the numbers from 2 to 7,
the encoding is better suited than Base64 for case-insensitive domain names.
You can find an example with hashing in Appendix A of RFC 6541.
The RFC itself is labeled as experimental
and I have no idea whether anyone actually uses ATPS.
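The relevant property of Base32 is easy to demonstrate. The sketch below hashes a hypothetical third-party domain with SHA-256 and shows that the Base32 encoding survives the case folding of domain names. It illustrates only the encoding, not the exact name which ATPS verifiers query, for which you have to consult the RFC.

```python
import base64
import hashlib

BASE32_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ234567")


def base32_sha256(domain: str) -> str:
    """Returns the Base32 encoding of the SHA-256 hash of the domain."""
    digest = hashlib.sha256(domain.encode("ascii")).digest()
    return base64.b32encode(digest).decode("ascii")


domain = "example.net"  # Hypothetical third-party domain.
encoded = base32_sha256(domain)
print(encoded)

# Lowercasing the encoding doesn't lose any information.
recovered = base64.b32decode(encoded.lower(), casefold=True)
print(recovered == hashlib.sha256(domain.encode("ascii")).digest())
```

Without the padding characters at the end, the encoded hash is 52 characters long regardless of the length of the domain.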
Reports: Domain owners can ask receiving mail servers to send them aggregate reports
at regular intervals and failure reports for messages
which failed authentication.
These DMARC reports allow domain owners to monitor their deployment of domain authentication,
to detect
unauthorized sources of legitimate messages, such as webshops
and continuous integration systems,
and to be informed immediately when
their domain is abused for phishing.
v=DMARC1; p=reject
Organizational domain
Subdomain policy
Incoming mail servers query the _dmarc subdomain of the domain in the From address first.
If a TXT record starting with v=DMARC1; is found,
the message is handled according to the domain policy found in the mandatory p tag.
If no such record is found, they have to query the
_dmarc subdomain
of the organizational domain next.
If a DMARC record is found, the message is handled according to the subdomain policy
found
in the optional sp tag or according to the domain policy if no subdomain policy is specified.
For example, if there is a TXT record of
v=DMARC1; p=none; sp=reject at _dmarc.example.com,
emails from addresses at example.com are handled according to the none policy
while
emails from addresses at support.example.com are handled according to the reject policy,
assuming that no DMARC record exists at
_dmarc.support.example.com.
If such a record does exist, emails from addresses at support.example.com are handled
according to the policy
specified in the p tag of this record.
This is why subdomain policies have an effect only
when specified in the DMARC record of an
organizational domain.
The alternative approach of removing one subdomain after another until a DMARC record is found
was considered
and rejected
because this would allow a malicious sender to trigger dozens of DNS requests on the incoming mail server.
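The lookup order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary txt_records stands in for real DNS queries, the organizational domain is passed in by the caller (determining it requires a public suffix list, which is not shown), and the function names are made up for this example.

```python
from typing import Dict, Optional


def parse_tags(record: str) -> Dict[str, str]:
    """Split a DMARC record such as 'v=DMARC1; p=none; sp=reject' into its tags."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip()] = value.strip()
    return tags


def applicable_policy(from_domain: str, organizational_domain: str,
                      txt_records: Dict[str, str]) -> Optional[str]:
    """Return the policy which applies to the domain in the From address."""
    # Step 1: query the _dmarc subdomain of the From domain itself.
    record = txt_records.get(f"_dmarc.{from_domain}")
    if record is not None and record.startswith("v=DMARC1;"):
        return parse_tags(record).get("p")
    # Step 2: fall back to the organizational domain, where sp takes precedence over p.
    if from_domain != organizational_domain:
        record = txt_records.get(f"_dmarc.{organizational_domain}")
        if record is not None and record.startswith("v=DMARC1;"):
            tags = parse_tags(record)
            return tags.get("sp", tags.get("p"))
    return None


if __name__ == "__main__":
    records = {"_dmarc.example.com": "v=DMARC1; p=none; sp=reject"}
    print(applicable_policy("example.com", "example.com", records))
    print(applicable_policy("support.example.com", "example.com", records))
```

At most two queries are needed per message, which is the point of not walking up the domain hierarchy one label at a time.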
Unix time
Aggregate reports
<feedback>
<report_metadata>
<org_name>google.com</org_name>
<email>[email protected]</email>
<extra_contact_info>https://support.google.com/a/answer/2466580</extra_contact_info>
<report_id>4027243601387366635</report_id>
<date_range>
<begin>1582588800</begin>
<end>1582675199</end>
</date_range>
</report_metadata>
<policy_published>
<domain>ef1p.com</domain>
<adkim>s</adkim>
<aspf>s</aspf>
<p>none</p>
<sp>reject</sp>
<pct>100</pct>
</policy_published>
<record>
<row>
<source_ip>198.2.140.132</source_ip>
<count>1</count>
<policy_evaluated>
<disposition>none</disposition>
<dkim>pass</dkim>
<spf>fail</spf>
</policy_evaluated>
</row>
<identifiers>
<header_from>ef1p.com</header_from>
</identifiers>
<auth_results>
<dkim>
<domain>gmail.mctxapp.net</domain>
<result>pass</result>
<selector>k1</selector>
</dkim>
<dkim>
<domain>ef1p.com</domain>
<result>pass</result>
<selector>k1</selector>
</dkim>
<spf>
<domain>mail12.mcsignup.com</domain>
<result>pass</result>
</spf>
</auth_results>
</record>
</feedback>
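If you want to inspect such a report yourself, the following Python sketch extracts the reporting period and the evaluated rows with the standard library. The embedded XML is a shortened copy of the report above, and the function name summarize is made up for this illustration; a real report has to be decompressed first and can contain many records.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Shortened copy of the aggregate report shown above.
REPORT = """<feedback>
  <report_metadata>
    <date_range><begin>1582588800</begin><end>1582675199</end></date_range>
  </report_metadata>
  <record>
    <row>
      <source_ip>198.2.140.132</source_ip>
      <count>1</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>pass</dkim><spf>fail</spf>
      </policy_evaluated>
    </row>
  </record>
</feedback>"""


def to_utc(unix_time: str) -> str:
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to ISO 8601."""
    return datetime.fromtimestamp(int(unix_time), tz=timezone.utc).isoformat()


def summarize(report_xml: str) -> dict:
    root = ET.fromstring(report_xml)
    rows = []
    for record in root.iter("record"):
        row = record.find("row")
        rows.append({
            "source_ip": row.findtext("source_ip"),
            "count": int(row.findtext("count")),
            "dkim": row.findtext("policy_evaluated/dkim"),
            "spf": row.findtext("policy_evaluated/spf"),
        })
    return {
        "begin": to_utc(root.findtext("report_metadata/date_range/begin")),
        "end": to_utc(root.findtext("report_metadata/date_range/end")),
        "rows": rows,
    }


if __name__ == "__main__":
    print(summarize(REPORT))
```

The begin and end values are Unix timestamps, so the report above covers a single day in UTC.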
Since having compressed XML files in your inbox isn’t very useful,
you want to use a service which aggregates the aggregate reports for you.
I
use the DMARC analyzer from Postmark for my domains.
Once configured, you get a weekly email with insights into the sources which send
emails on behalf of your domain
and the percentage of emails which passed DMARC authentication.
While the service provider learns
neither the local part of email addresses nor the content of emails,
it learns how many emails were sent to which domains.
This is sensitive
metadata,
which can disclose confidential business relations.
Failure reports
The owner of a domain can ask incoming mail servers to send them a
failure report
for each message that failed one or both authentication
mechanisms.
Unlike aggregate reports, which are delivered periodically,
failure reports are sent right after an authentication failure
to the
addresses in the ruf tag of the sender’s DMARC record.
This allows the domain owner to detect and address delivery problems quickly.
Since
failure reports include either the complete unauthentic message or at least its header,
the domain owner can also analyze phishing attempts
with a spoofed From address.
In order to avert denial-of-service attacks on the domain owner,
incoming mail servers are encouraged
to either
aggregate similar failures over short time periods or
rate limit failure reporting while discarding the remainder.
The Authentication Failure Reporting Format (AFRF), which is specified in RFC 6591,
is currently the only format
for DMARC failure reports.
It is a feedback type
of the Abuse Reporting Format (ARF),
which is specified in RFC 5965.
The format is sometimes also called the Messaging
Abuse Reporting Format (MARF),
which was the name of the IETF working group.
To make failure reports easy to read for machines, they are
structured as follows:
MIME-Version: 1.0
--UniqueBoundary
Content-Type: text/plain
--UniqueBoundary
Content-Type: message/feedback-report
Feedback-Type: auth-failure
Identity-Alignment: [none|spf|dkim]
Original-Mail-From: [email protected]
Source-IP: 192.0.2.1
[…]
--UniqueBoundary
Content-Type: [message/rfc822|text/rfc822-headers]
From: [email protected]
[…]
--UniqueBoundary--
Since DMARC records are publicly visible and DMARC reports can be large and frequent,
you should set up a dedicated mailbox or an alias
address to receive them.
In order to prevent bad actors from flooding a victim’s mailbox with unwanted reports
by listing his or her address
in the rua or ruf tag of their DMARC record,
a receiving domain has to approve each domain for which it is willing to receive DMARC reports
with a special DMARC record
unless they belong to the same organizational domain.
The content of such approval records is v=DMARC1;,
and
they are published as TXT records at {PolicyDomain}._report._dmarc.{ReceivingDomain}.
For example, if the DMARC record of
example.org lists a report address at example.com in its rua tag,
there has to be an approval record at
example.org._report._dmarc.example.com.
The above tool verifies this for you.
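The name of the approval record follows mechanically from the policy domain and the report address, as this small Python sketch shows. The address reports@example.com and the function name are hypothetical, and the check whether both domains belong to the same organizational domain is left out because it requires a public suffix list.

```python
def approval_record_name(policy_domain: str, report_uri: str) -> str:
    """Return the DNS name at which the receiving domain has to publish 'v=DMARC1;'."""
    address = report_uri
    if address.lower().startswith("mailto:"):
        address = address[len("mailto:"):]
    # The local part of the address plays no role in the approval.
    receiving_domain = address.rsplit("@", 1)[1].lower()
    return f"{policy_domain.lower()}._report._dmarc.{receiving_domain}"


if __name__ == "__main__":
    # Hypothetical report address, used only for illustration.
    print(approval_record_name("example.org", "mailto:reports@example.com"))
```

A sender of reports would query a TXT record at the returned name and send the report only if the record exists.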
If you run a DMARC report analyzer business and want to
allow anyone to direct reports to you, you can configure
a wildcard record at *._report._dmarc.{YourDomain}.
Since the local part of the
report address needs no approval, you should use a dedicated domain for this.
Otherwise, anyone can fill your personal mailbox with DMARC
reports.
Due to this approval mechanism, report emails don’t have an unsubscribe button.
To avoid processing fraudulent reports, all report
emails must pass DMARC authentication themselves.
{Method}={Result}[ {PropertyType}.{PropertyName}={Value}]*
]*
The Verifier is an identifier for the entity which performed the verification,
and the optional Version indicates which version of the field
format is in use,
where its current and default value is 1.
IANA maintains a long list of
authentication methods and their results.
The Method is
usually spf, dkim, or dmarc, and the Result is usually pass or fail.
The PropertyType is usually smtp or header, depending on whether the
verified property
is from the SMTP envelope or the message header.
For example:
Authentication-Results: mx.google.com;
The Authentication-Results header field added by Google when I send an email to Gmail.
[…] is the same comment as in the
Received-SPF header field.
Gmail doesn’t quite adhere to the standard.
To begin with, it adds the Authentication-Results
below the Received header field.
Since SPF doesn’t authenticate the local part of the MAIL FROM address,
it should not be
included in the smtp.mailfrom property.
Furthermore, I have no idea why Gmail includes the header.i property rather than
header.d=ef1p.com
in the DKIM result.
To be fair, the RFC has one example
using the d tag and
one example using the i tag.
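To illustrate the format, here is a rough Python sketch which splits the value of an Authentication-Results header field into the verifier, the version, and the individual results. It is a simplification: it removes only non-nested comments in parentheses and doesn't handle quoted strings. The sample value in the usage part is hypothetical and not taken from a real message.

```python
import re


def parse_authentication_results(value: str) -> dict:
    """Parse '{Verifier}[ {Version}]; {Method}={Result} {Type}.{Name}={Value} ...'."""
    # Drop comments such as "(sender is permitted)"; nested comments are not handled.
    value = re.sub(r"\([^()]*\)", "", value)
    parts = [part.strip() for part in value.split(";") if part.strip()]
    verifier_tokens = parts[0].split()
    parsed = {
        "verifier": verifier_tokens[0],
        # The version defaults to 1 when it is omitted.
        "version": verifier_tokens[1] if len(verifier_tokens) > 1 else "1",
        "results": [],
    }
    for part in parts[1:]:
        tokens = part.split()
        method, _, result = tokens[0].partition("=")
        properties = {}
        for token in tokens[1:]:
            key, separator, property_value = token.partition("=")
            if separator:
                properties[key] = property_value
        parsed["results"].append(
            {"method": method, "result": result, "properties": properties}
        )
    return parsed


if __name__ == "__main__":
    # Hypothetical header value for illustration.
    sample = ("mx.example.net; dkim=pass header.d=example.com; "
              "spf=pass (sender is permitted) smtp.mailfrom=example.com; "
              "dmarc=pass header.from=example.com")
    for entry in parse_authentication_results(sample)["results"]:
        print(entry)
```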
ARC-Message-Signature
is a DKIM signature over the potentially modified message,
which may not cover ARC-related and
Authentication-Results header fields.
The instance tag replaces DKIM’s i tag, and for some reason,
the version tag v is not defined for
ARC-Message-Signature.
d=google.com; s=arc-20160816;
b=MoNLEuRiwzIJ7FoSItrs3mzkBjiRhHfrADb6gVmEVHMyH1blgnpjxHqJNygEfYdVNo
/kMFAxLbM6FPAALqyK6VGsDJQAQpHzGzVx1UQ1URugg28cAo5Kp7gSntyJYYZ1Ni/BCp
czD915SdxwTtJ9rg0ynFUuZXfi8aAjCcZeVXGdTubDwjgs61v1KfxVf6aWMCLUkr9k9B
JDiTrr/gyJXD1nLNPHMzRgeveIEWgqWkE32BRSdJ42i9Nq0PAHaN3k5g3z579Li9UW1N
Oobd/OCvAXD6bEYkmWtUmIIuH4HniCmC9AG7aQ9Ewko35HWwezLP7MvjlCqSYRQHelNa
UeaQ==
h=to:date:message-id:subject:mime-version:content-transfer-encoding
:from:dkim-signature;
bh=gmLzBJCLmp99Kb/Rm3Sh+/9143Y1eDcNI0l8V6LhSLo=;
b=QTDYEklgqj2/0Vt7k6r2HK9Td7TVDrmnLxSs1de0ruFpjbWznIpKLV2iyFbMvo9GcO
qKPRZiO76vn0kbxnGiBp5FYOP4d9LES+yR04Nx0CIiJ2iMJfDCUxMicQrc//ZPmM7njk
iBjNKfxDraSuFq3zh65hlYHH0if41dzLy9cPPVHqSI6luefv8MjMO9tY3/5CBbg4wzIg
8XM3RLD7lssDrzM8fpUgXW/nKSQav7MKzvXmnTRa43FcGvP3Aq6GQWdVl5gt8tyZWS8f
zFH+Rn3gGG4/iGru0HGQKvaKZdrtXx43mvFl7LSv2tA+ubNqts/sV/esGQSk2rO6VIAi
wmKQ==
The ARC header fields that Gmail added to the message from which I took the
Authentication-Results header field in the previous box.
If you think that I missed the point of ARC, which might very well be the case,
please let me know.
# SPF:
$ dig gmail.com txt +short
# DKIM:
$ dig 20161025._domainkey.gmail.com txt +short
Mail clients should rely on these header fields only if they know that their incoming mail server
supports BIMI
and strips these header fields
from incoming messages.
Otherwise, a mail client is vulnerable to malicious header fields included by the sender.
This draft
specifies several requirements for the X.509 certificate
which is used to bind a brand indicator to a domain name:
Certificate encoding:
The certificate must be encoded in the Privacy-Enhanced Mail (PEM)
format as specified in RFC 7468.
The filename
specified in the a tag of the BIMI record should have a .pem extension.
The file has to include the certificates of all intermediate
certification authorities
up to the root certification authority, whose certificate doesn’t have to be included.
Logotype extension:
The brand indicator must be included in the logotype extension for X.509 certificates
as specified in RFC 3709.
The
object identifier (OID) of this field is 1.3.6.1.5.5.7.1.12.
SVG image:
The brand indicator must be a GZIP-compressed SVG image
as specified in the next box.
It has to be Base64-encoded
in a
data URL
as specified by RFC 2397
and RFC 6170.
Key usage:
BIMI introduces a new extended key usage
with an OID of 1.3.6.1.5.5.7.3.31.
This key usage must be listed in the verified
mark certificate
and in the certificate of the issuing certification authority.
Name matching:
The domain of the BIMI record with or without the {Selector}._bimi. subdomain must be
included in the Subject
Alternative Name (SAN) field
of the verified mark certificate.
Certificate validation:
The verified mark certificate must include a
certificate revocation list (CRL)
distribution point as specified in RFC
5280.
The verified mark certificate must also include a
signed certificate timestamp (SCT)
for Certificate Transparency
as specified in RFC
6962.
Transport security
As discussed earlier, ESMTP
uses the STARTTLS extension
to upgrade an insecure TCP connection
to a secure TLS connection:
220 server.example.com
EHLO client.example.org
250-server.example.com
250-PIPELINING
250 STARTTLS
STARTTLS
220 Go ahead
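Whether a client attempts the upgrade depends entirely on the extensions which the server lists in its response to EHLO. The following Python sketch, whose function name is made up for this illustration, extracts the advertised extensions from such a response. It also shows why an unauthenticated response is a weak basis for a security decision: if the STARTTLS line is missing, the client cannot tell whether the server doesn't support TLS or whether someone removed the line in transit.

```python
def advertised_extensions(ehlo_response: str) -> set:
    """Return the ESMTP extension keywords from a multiline 250 response."""
    extensions = set()
    lines = ehlo_response.strip().splitlines()
    # The first line contains the domain of the server, not an extension.
    for line in lines[1:]:
        code, rest = line[:3], line[4:]
        if code == "250" and rest:
            extensions.add(rest.split()[0].upper())
    return extensions


if __name__ == "__main__":
    original = "250-server.example.com\n250-PIPELINING\n250 STARTTLS"
    stripped = "250-server.example.com\n250 PIPELINING"
    print("STARTTLS" in advertised_extensions(original))  # True
    print("STARTTLS" in advertised_extensions(stripped))  # False
```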
[Figure: an attacker between the client and the server can remove the 250 STARTTLS line from the server's response to EHLO so that the client sees only 250-server.example.com and 250 PIPELINING.]
The attacker can drop the client’s TLS connection until it gives up and connects to the server with TCP.
3. User configuration: Let the user require that their messages may be delivered only with TLS.
REQUIRETLS makes this possible.
Server authentication
REQUIRETLS extension
As a user, you may prefer a delivery failure over an insecure delivery for certain messages.
The REQUIRETLS extension for ESMTP,
which is
specified in RFC 8689 and is not yet widely supported,
allows you to require transport security between all involved mail servers when
sending an email.
For both submission and relay,
the ESMTP server indicates its support for this extension
by listing REQUIRETLS in its
response to the EHLO command.
If your outgoing mail server supports REQUIRETLS,
your mail client can add REQUIRETLS as a parameter to
the MAIL FROM command as follows:
How a client asks an ESMTP server to forward the message only with TLS
to other servers which support REQUIRETLS as well.
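As a sketch of the client side, the following Python function builds the MAIL FROM command and refuses to proceed when the user requires TLS but the server doesn't advertise the extension. The function name and the address alice@example.org are hypothetical, and the sketch assumes that the connection to the server is already protected with TLS.

```python
def mail_from_command(sender: str, server_extensions: set, require_tls: bool) -> str:
    """Build the MAIL FROM command, adding REQUIRETLS if the user asks for it."""
    command = f"MAIL FROM:<{sender}>"
    if require_tls:
        if "REQUIRETLS" not in server_extensions:
            # The user prefers a delivery failure over an insecure delivery.
            raise RuntimeError("The server does not advertise REQUIRETLS.")
        command += " REQUIRETLS"
    return command


if __name__ == "__main__":
    # Hypothetical sender address and extension list for illustration.
    print(mail_from_command("alice@example.org", {"PIPELINING", "REQUIRETLS"}, True))
```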
TLS-Required: No
DNS authentication:
Domain names are used to reference services,
which are often provided by external service providers.
Since changes are
easier if the service providers can manage their address records themselves,
indirections with MX, SRV, and CNAME records are quite common in
the Domain Name System.
The same is true for security-related DNS records,
such as TLSA records, which are introduced by DANE.
(Officially,
TLSA is not an acronym but simply the name of the record type.
Personally, I like to think of TLSA as Transport Layer Security Anchor.)
Letting
the service providers configure the necessary TLSA records
at their domains has some advantages.
However, the TLSA records can be trusted
only if the DNS records are authenticated with DNSSEC
both in the zone of the customer and in the zone of the service provider.
If the reply to
the MX, SRV, or CNAME query can be spoofed by an attacker,
the attacker can pose as the legitimate service provider to unsuspecting clients.
Downgrade resistance:
By configuring TLSA records at the appropriate subdomain,
a service provider indicates that its server supports TLS.
Thanks to DNSSEC’s authenticated denial of existence,
an attacker cannot suppress the retrieval of the TLSA records,
which makes DANE
resistant to downgrade attacks.
Before you can deploy DANE, you have to deploy DNSSEC.
If a client encounters an unsigned domain,
it
continues with opportunistic encryption.
If a client learns from the superzone that the subzone is signed
but cannot retrieve the signed TLSA
records or a signed statement of their absence,
it aborts the connection.
Trust anchor:
In order to prevent a man-in-the-middle attack,
the client has to authenticate the server.
Instead of relying on the traditional
public-key infrastructure (PKI),
DANE requires service providers to put the public key of their server
or the public key of a trust anchor of their
choosing
into their TLSA records.
DANE clients then verify whether the server’s public key is confirmed
directly or indirectly by one of the
server’s TLSA records.
Relying on DNSSEC rather than on traditional
certification authorities (CAs)
has several advantages.
The tool below queries the MX records of the given domain and the TLSA records of each mail server.
It uses Google’s DNS API for the DNS queries
and performs only rudimentary checks on the format of the TLSA records.
It doesn’t validate whether DNSSEC and DANE are deployed correctly.
If you want to check this, you can use this validator.
I cover how you can generate
and verify TLSA records yourself below.
You can deploy DANE
only if your mailbox provider supports it.
If your mailbox provider has configured TLSA records for their servers,
all that you have to do is to enable
DNSSEC on your custom domain.
PKI comparison
It’s important to note that DANE changes only how TLS clients verify server certificates.
DANE also uses X.509 certificates,
and no
modifications are needed in TLS server software.
1. Certificate usage:
This field captures two separate pieces of information in a single number.
On the one hand, the number indicates
whether the certificate referenced by the TLSA record
belongs to a trust anchor, which certified the public key and the identity of the end
entity,
or to the end entity, i.e. the server to which the client wants to connect.
On the other hand, the number also indicates whether a
valid path
to a PKIX certification authority which is trusted by the TLS client has to exist
or whether the certificate is to be trusted just
because it is referenced in the TLSA record.
These certificate usages apply only to X.509 certificates
in DER encoding.
The four certificate usages with the acronyms as introduced in RFC 7218.
2. Selector:
This field specifies which part of the certificate is referenced by the TLSA record.
The possible values are 0 for the full certificate
and 1 for the
SubjectPublicKeyInfo (SPKI),
which consists of the public-key algorithm and the subject’s public key.
The latter does not
cover the names of the issuer and the subject, the validity period,
or any certification constraints.
3. Matching type:
This field specifies how the selected content is presented in the certificate association field.
0 means that the selected
content is included as is in the last field,
1 means that the certificate association data is the SHA-256 hash
of the selected content, and 2
means that the SHA-512 hash is used instead.
3 1 1 76bb66711da416433ca890a5b2e5a0533c6006478f7d10a4469a947acc8399e1
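The three numeric fields and the certificate association data can be decoded with a few lines of Python. This sketch uses the acronyms from RFC 7218 and checks only that a hash has the expected length; the function name parse_tlsa is made up for this illustration.

```python
USAGES = {0: "PKIX-TA", 1: "PKIX-EE", 2: "DANE-TA", 3: "DANE-EE"}
SELECTORS = {0: "Cert", 1: "SPKI"}
MATCHING_TYPES = {0: "Full", 1: "SHA2-256", 2: "SHA2-512"}
HASH_LENGTHS = {1: 32, 2: 64}  # Digest length in bytes per matching type.


def parse_tlsa(rdata: str) -> dict:
    """Decode the presentation format of a TLSA record such as '3 1 1 76bb…'."""
    usage, selector, matching_type, data = rdata.split(maxsplit=3)
    usage, selector, matching_type = int(usage), int(selector), int(matching_type)
    # Long values are sometimes split into several chunks separated by spaces.
    association = bytes.fromhex(data.replace(" ", ""))
    expected = HASH_LENGTHS.get(matching_type)
    if expected is not None and len(association) != expected:
        raise ValueError("The association data has the wrong length for this hash.")
    return {
        "usage": USAGES[usage],
        "selector": SELECTORS[selector],
        "matching_type": MATCHING_TYPES[matching_type],
        "data": association,
    }


if __name__ == "__main__":
    record = "3 1 1 76bb66711da416433ca890a5b2e5a0533c6006478f7d10a4469a947acc8399e1"
    parsed = parse_tlsa(record)
    print(parsed["usage"], parsed["selector"], parsed["matching_type"])
```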
Name matching
Expiration date
Certificate chain
Certification constraints
Given that DANE-EE uses almost none of the information in the certificate,
RFC 7250 specifies a TLS extension
to use just the raw public key
instead of the complete certificate of the server
during the handshake.
For the same reason, it is recommended to combine a certificate
usage of 3 (DANE-EE) with a selector of 1 (SPKI).
On the other hand, the selector should be 0 to match the full certificate
if the TLSA record
references the certificate of a trust anchor with 2 (DANE-TA)
so that any certification constraints are also covered by the TLSA record.
The
certificate usages PKIX-EE and PKIX-TA provide more security
only if an application layer protocol
forbids the certificate usages DANE-EE
and DANE-TA.
Otherwise, an attacker who compromised the DNS zone can simply change the certificate usage to DANE-EE or DANE-TA
in
order to bypass the additional PKIX verification.
Which combinations of DANE parameters you should and should not use.
The last two rows are applicable only to SMTP for Relay on port 25.
The ProviderDomain is the hostname of the server which actually provides the service.
It is determined by resolving CNAME records to the
canonical domain
and following service-specific records, such as MX and SRV records.
Requiring the TLSA record to be located at the
ProviderDomain has three advantages:
A disadvantage of this approach is that it’s up to your service provider to support DANE,
whereas you can deploy MTA-STS yourself
if your
service provider uses a certificate issued by a widely accepted certification authority.
Another consequence of domain-based transport
security is that DANE and MTA-STS
cannot be used with address literals, such as user@[192.0.2.1].
A server can have several TLSA records, and its certificate has to be authenticated by just one of them.
There are two situations in which you
want to use multiple TLSA records:
1. Key rotation:
Cryptographic keys have to be replaced from time to time.
Since it takes some time
for new records to propagate through
the Domain Name System,
the TLSA record for a new key has to be published at least one validity period before it can be used.
The TLSA
record for the old key/certificate can be removed only once the new key is being used.
2. Algorithm agility:
DANE clients don’t have to support all TLSA parameters and combinations thereof.
A service provider can publish
several TLSA records with different parameters for the same key/certificate,
which allows DANE clients to rely on the strongest TLSA
record which they can use.
For example, DANE clients should but don’t have to support SHA-512.
If SHA-256 is deprecated at some point
in the future,
service providers can publish the same key/certificate in TLSA records
with a matching type of 1 (SHA-256) and 2 (SHA-512).
Clients which support the latter will use the latter record;
all the others will continue to use the former one.
RFC 7671 requires that
each
published combination of TLSA parameters covers the certificate chain in use
so that DANE clients can abort the connection
if the server
cannot be authenticated with one of the usable TLSA records.
Name checks
Secure A/AAAA
– – Non-expanded @ domain
and TLSA records
Client behavior
DANE-authenticated
$ openssl x509 -in certificate.pem -noout -pubkey | openssl pkey -pubin -outform DER | openssl sha256
$ openssl s_client -starttls smtp -connect mail.protonmail.ch:25 -verify_return_error \
    -dane_tlsa_domain "mail.protonmail.ch" \
    -dane_tlsa_rrdata "3 1 1 6111a5698d23c89e09c36ff833c1487edc1b0c841f87c49dae8f7a09e11e979e" \
    -dane_tlsa_rrdata "3 1 1 76bb66711da416433ca890a5b2e5a0533c6006478f7d10a4469a947acc8399e1"
[…]
---
SSL handshake has read 5166 bytes and written 433 bytes
Verification: OK
Verified peername: *.protonmail.ch
DANE TLSA 3 1 1 ...8f7d10a4469a947acc8399e1 matched EE certificate at depth 0
---
[…]
How to compute the certificate association data directly from the server certificate.
2> /dev/null suppresses the error output.
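The comparison which a DANE client performs on the selected content can be sketched as follows in Python. The sketch assumes that the caller has already extracted the DER-encoded certificate or SubjectPublicKeyInfo from the handshake according to the selector, which is not shown here, and the function name is made up for this illustration.

```python
import hashlib
import hmac


def association_matches(matching_type: int, association_data: bytes,
                        selected_content: bytes) -> bool:
    """Check the selected content (DER bytes) against the data of a TLSA record."""
    if matching_type == 0:
        candidate = selected_content  # The content is included as is.
    elif matching_type == 1:
        candidate = hashlib.sha256(selected_content).digest()
    elif matching_type == 2:
        candidate = hashlib.sha512(selected_content).digest()
    else:
        raise ValueError("Unknown matching type.")
    return hmac.compare_digest(candidate, association_data)


if __name__ == "__main__":
    # Placeholder bytes instead of a real SubjectPublicKeyInfo.
    spki = b"placeholder for DER-encoded SubjectPublicKeyInfo"
    published = hashlib.sha256(spki).digest()
    print(association_matches(1, published, spki))
```

With a matching type of 1 and a selector of 1, the published value corresponds to the SHA-256 hash which the OpenSSL pipeline above computes.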
Configuring TLSA records for your incoming mail servers is only half the battle.
If you run incoming mail servers, you almost certainly also
operate an outgoing mail server.
Having TLSA records for your incoming mail servers allows others to deliver email securely to you,
but you
also want to make sure that your messages are delivered securely to others.
Even if you don’t configure TLSA records for your incoming mail
servers,
you should activate DANE on your outgoing mail server
so that it authenticates the incoming mail servers of your recipients before
delivering your messages.
Both Postfix and
Exim
support DANE.
This guide
shows you how to configure them.
The number of mail servers
with TLSA records is rising steadily.
Here are some statistics for the
.nl top-level domain.
pin-sha256="E9CZ9INDbd+2eRQozYqqbQ2yXLVKB9+xcprMF+44U1g=";
pin-sha256="LPJNul+wow4m6DsqxbninhsWHlwfp0JecwQzYpOLmCQ=";
report-uri="https://other.example.com/pkp-report"
DNS record:
A TXT record is used to inform the sender that the receiving domain has an MTA-STS policy
and whether the policy has been
changed since the last time the sender retrieved it.
Since small DNS records are retrieved with UDP,
this is much faster than retrieving the
policy file,
which requires a TCP
and a TLS handshake.
Policy file:
The sender fetches the MTA-STS policy with HTTPS from the receiving domain.
The MTA-STS policy indicates what the sender shall
do
if it cannot authenticate the incoming mail server of the recipient with the presented PKIX certificate.
Since MTA-STS doesn’t require that
DNS records are authenticated with DNSSEC,
the policy file is also used to authenticate the MX records of the receiving domain.
This allows
clients to match the presented certificate against the name of the mail server.
The tool below queries the MTA-STS record and the policy file of the given domain.
It uses Google’s DNS API for the DNS query
and the email
tracking server,
which I’ve deployed on Heroku,
as a proxy server.
This is necessary because the policy file is usually served without the header
field which is required
for cross-origin resource sharing (CORS).
As you can see in its source code,
my proxy server doesn’t store anything,
but
Heroku logs the last 1’500 requests,
which includes the queried domain and your IP address.
I don’t persist the log file,
but I might check it from
time to time for troubleshooting.
The tool checks the syntax of the DNS record and the policy file,
but it verifies neither the MX records nor
whether the mail server has a valid PKIX certificate.
Comparison to DANE
In my opinion, a successful DANE authentication should take precedence over a failed MTA-STS validation
as this would allow domain
owners to deploy transport security
without the support of their mailbox provider by deploying DNSSEC and MTA-STS.
If the mailbox
provider switches to DANE with self-signed certificates,
clients which support DANE and MTA-STS would continue to deliver emails to this
domain
even if MTA-STS validation now fails.
When delivering an email, an outgoing mail server which supports MTA-STS checks
whether it has a non-expired policy for the domain after
the @ symbol in the recipient’s email address.
If this is not the case, it queries for a TXT record at the _mta-sts subdomain of this domain.
An
MTA-STS record is a semicolon-separated list of key-value pairs encoded in ASCII,
where the keys and values are separated by an equals sign.
MTA-STS records have two required fields:
v=STSv1; id=20190429T010101;
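A sending mail server only has to compare the id of the record with the id of its cached policy. The following Python sketch parses such a record and makes this comparison; the function names are made up for this illustration, and caching and expiry are not shown.

```python
def parse_mta_sts_record(txt: str) -> dict:
    """Parse a record such as 'v=STSv1; id=20190429T010101;' into its fields."""
    fields = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part:
            continue
        key, separator, value = part.partition("=")
        if not separator:
            raise ValueError(f"Missing equals sign in '{part}'.")
        fields[key.strip()] = value.strip()
    if fields.get("v") != "STSv1" or "id" not in fields:
        raise ValueError("Not a valid MTA-STS record.")
    return fields


def policy_needs_refresh(txt: str, cached_id: str) -> bool:
    """Return True if the policy file has to be fetched again over HTTPS."""
    return parse_mta_sts_record(txt)["id"] != cached_id


if __name__ == "__main__":
    record = "v=STSv1; id=20190429T010101;"
    print(parse_mta_sts_record(record))
    print(policy_needs_refresh(record, "20190429T010101"))
```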
If the MTA-STS record contains an unknown id value for the given domain,
the outgoing mail server retrieves the new MTA-STS policy from
https://mta-sts.{Domain}/.well-known/mta-sts.txt,
where Domain is the domain after the @ symbol in the recipient’s email address.
The Domain is taken as is for both the DNS and the HTTPS lookup
without proceeding with the parent domain
if the queried resource is not
found.
As specified in RFC 8615,
IANA
maintains a registry
for the .well-known directory in order to prevent name collisions among
unrelated standards.
A subdomain is used to allow CNAME indirections.
Moreover, the required DNS record prevents sites which allow
untrusted users to claim subdomains from falling victim
to denial-of-service attacks with malicious policies.
Examples are https://mta-sts.github.io
and https://mta-sts.blogspot.com.
enforce: The client must abort the connection if it cannot negotiate TLS with a valid server certificate.
It continues with the next
server which matches one of the mx keys
and delays the delivery of the message when all of them fail validation.
Before failing
permanently, the client has to check via DNS and then HTTPS
for an updated policy.
testing: MTA-STS validation failures don’t affect the delivery of messages, but they should be reported
if both the sender and the
recipient support SMTP TLS Reporting (TLSRPT).
version: STSv1
mode: enforce
mx: gmail-smtp-in.l.google.com
mx: *.gmail-smtp-in.l.google.com
max_age: 86400
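The following Python sketch parses such a policy file and checks whether the hostname from an MX record is covered by one of the mx patterns. It assumes that a wildcard stands for exactly one label at the leftmost position, and the function names are made up for this illustration; fetching the file over HTTPS and validating the certificate are not shown.

```python
POLICY = """version: STSv1
mode: enforce
mx: gmail-smtp-in.l.google.com
mx: *.gmail-smtp-in.l.google.com
max_age: 86400
"""


def parse_policy(text: str) -> dict:
    """Parse the key-value lines of an MTA-STS policy file."""
    policy = {"mx": []}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if not separator:
            continue
        key, value = key.strip(), value.strip()
        if key == "mx":
            policy["mx"].append(value.lower())  # The mx key can appear several times.
        else:
            policy[key] = value
    return policy


def mx_allowed(hostname: str, patterns: list) -> bool:
    """Check whether the hostname of a mail server matches one of the mx patterns."""
    hostname = hostname.lower().rstrip(".")
    for pattern in patterns:
        if pattern.startswith("*."):
            # Assumption: the wildcard replaces exactly one leftmost label.
            _, _, parent = hostname.partition(".")
            if parent == pattern[2:]:
                return True
        elif hostname == pattern:
            return True
    return False


if __name__ == "__main__":
    policy = parse_policy(POLICY)
    print(policy["mode"], int(policy["max_age"]))
    print(mx_allowed("alt1.gmail-smtp-in.l.google.com", policy["mx"]))
    print(mx_allowed("mail.attacker.example", policy["mx"]))
```

Since the patterns come from a file which was retrieved over an authenticated connection, a spoofed MX record which points to a server outside these patterns is detected.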
GET / HTTP/1.0
Host: www.example.com
HTTP/1.0 200 OK
[Headers and body]
HTTP/1.0 200 OK
[Rewritten headers and body]
The problem with HSTS is that it doesn’t protect the first connection to the server:
An active attacker can prevent the browser from learning
the server’s HSTS policy.
For this reason, browsers are delivered with an HSTS preload list.
All domains on this list are accessed only via
HTTPS.
Chromium’s HSTS preload list is 15 MB in size.
All other browsers, for which we know how they source their HSTS preload list, rely
on Chromium’s HSTS preload list.
You can submit your domain for inclusion in this list
at hstspreload.org.
Since it takes months to get a
domain off this list again,
the willingness to be included in this list has to be confirmed with a preload directive
in the Strict-Transport-Security header field.
The following three security and privacy aspects of HSTS are worth mentioning:
Homograph attacks:
Just like all domain-based approaches, HSTS does not prevent homograph attacks.
If a user enters a wrong address
or visits a fraudulent link,
an attacker can serve malicious content via HTTPS with a valid certificate for the fake domain
or serve the
content via HTTP given that the fake domain never used HSTS.
Web tracking:
Whether a domain uses HSTS is a piece of information which the browser stores
and transmits to the server by accessing it
via HTTPS instead of HTTP.
By using many domains and enabling HSTS for a subset of them which is unique to each user,
users can be
tracked across websites even when
cookies are disabled and
private browsing is enabled.
The endpoints for the report are specified in a TXT record at _smtp._tls.{RecipientDomain}.
The record has two fields:
v: The version of the TLSRPT standard.
This field has to come first, and the only supported value is TLSRPTv1 at the moment.
The fields are separated by a semicolon, and the keys and values are separated by an equals sign.
Here is an example record:
v=TLSRPTv1;rua=mailto:[email protected]
"date-range": {
"start-datetime": "2021-04-09T00:00:00Z",
"end-datetime": "2021-04-09T23:59:59Z"
},
"contact-info": "[email protected]",
"report-id": "2021-04-09T00:00:00Z_ef1p.com",
"policies": [
"policy": {
"policy-type": "no-policy-found",
"policy-domain": "ef1p.com"
},
"summary": {
"total-successful-session-count": 1,
"total-failure-session-count": 0
End-to-end security
Instead of relying on mail servers to perform domain authentication
and enforce transport security,
senders and recipients can take matters into
their own hands and secure their communication themselves.
This idea is often referred to as end-to-end encryption (E2EE).
Since protecting the
authenticity of the content is usually just as important as protecting its confidentiality,
I prefer the term end-to-end security.
As we saw earlier,
arbitrary content can be sent via email.
As long as the sender and the recipients agree on which cryptographic algorithms and which encoding
they want to use,
they can use any technique they want, such as one-time pad encryption
combined with a message authentication code (MAC).
[Figure: the mail clients of the sender and the recipient with their incoming and outgoing mail servers, connected by SMTP for message relay and IMAP for message storage.]
If the mail client of the sender encrypts and authenticates the message for the mail client of the
recipient,
none of the mail servers have to be trusted (beyond delivering or storing the message).
Secure/Multipurpose Internet
Public-key distribution
Attached to the message
Public key server,
Public-key revocation
Certificate Revocation Lists (CRL) or
Key revocation signature
Costs for users You have to pay for the certificate, but
None
there are free offers for personal use
Modes of operation
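[Figure: the sender signs the message with the private key of the sender and encrypts it with the public key of the recipient. The signed and encrypted message passes the attacker. The recipient decrypts it with the private key of the recipient and verifies it with the public key of the sender, which results in the message and a status.]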
Deniable authentication
Digital signatures
entail non-repudiation.
As discussed earlier, this is not always desirable.
There are three different properties:
Message
authentication
Digital
Integrity
Authenticity
Non-repudiation
Both S/MIME
and PGP
support compression.
A corresponding content type to use compression alone
in a message exists only for S/MIME,
though.
GNU Privacy Guard (GnuPG or GPG),
which is the most popular open-source implementation of the OpenPGP standard,
compresses
by default before encryption
unless it detects that the data is already compressed.
If you use GPG to encrypt computer-generated text such
as reports or notifications,
you should disable compression
with -z 0
in order not to leak how similar the dynamic part of your message is to
the static part.
Encryption does not conceal the length of the input,
and compression produces a shorter output when more parts of the
message are the same.
If an attacker can influence the dynamic part of the message,
for example by triggering the generated notification,
your system is vulnerable to a so-called compression-oracle attack,
which is an adaptive chosen-plaintext attack.
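The leak is easy to observe with any compression library. The following Python sketch uses zlib rather than GPG and a made-up notification text, so it illustrates only the principle: the compressed output is shorter when the part which the attacker controls repeats the secret than when it contains an unrelated value of the same length. Since encryption preserves the length, the same difference shows up in the ciphertext.

```python
import zlib

# Hypothetical static part of a generated notification containing a secret.
STATIC_PART = "Your verification code is 8f3a9c2e71b04d56. Requested by: "


def compressed_length(dynamic_part: str) -> int:
    """Return the length in bytes of the compressed notification."""
    message = STATIC_PART + dynamic_part
    return len(zlib.compress(message.encode("utf-8")))


if __name__ == "__main__":
    correct_guess = compressed_length("8f3a9c2e71b04d56")
    wrong_guess = compressed_length("q7w1e5r9t3y0u2i4")
    # The correct guess is replaced by a short back-reference to the secret.
    print(correct_guess, wrong_guess, correct_guess < wrong_guess)
```

A real attack refines the guess character by character, which is why disabling compression for such messages is the safer choice.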
When used to sign and encrypt emails, S/MIME and PGP are applied to MIME body parts.
Any part in the tree of a multipart message can be
signed or encrypted,
but S/MIME and PGP are usually applied to the whole body of a message.
The output is itself a MIME part with a
content type of
application/pkcs7-mime,
multipart/signed, or
multipart/encrypted.
Given that the output is just another MIME part,
the operations can be applied
in any order.
Therefore, a message can be first encrypted and then signed,
but as we discussed, this is not
recommended.
The flexibility of multipart nesting leads to a lot of complexity.
Since complexity is detrimental to security,
this is not ideal for
end-to-end security.
Security researchers published several attacks on S/MIME and PGP in 2018.
Among other things, they showed that
several mail clients, including Apple Mail and Thunderbird,
allowed HTML content to span several MIME parts.
An attacker could simply
MIME-Version: 1.0
Content-Transfer-Encoding: base64
MIME-Version: 1.0
protocol="application/pkcs7-signature";
boundary="UniqueBoundary"
--UniqueBoundary
--UniqueBoundary
Content-Transfer-Encoding: base64
--UniqueBoundary--
S/MIME signature using the multipart/signed format.
micalg stands for message integrity check (MIC) algorithm.
The advantage of this format is that users can read the message even if their mail client doesn’t support S/MIME.
MIME-Version: 1.0
Content-Transfer-Encoding: base64
MIME-Version: 1.0
protocol="application/pgp-signature";
boundary="UniqueBoundary"
--UniqueBoundary
--UniqueBoundary
--UniqueBoundary--
MIME-Version: 1.0
Content-Type: multipart/encrypted;
protocol="application/pgp-encrypted";
boundary="UniqueBoundary"
--UniqueBoundary
Content-Type: application/pgp-encrypted
Version: 1
--UniqueBoundary
--UniqueBoundary--
PGP encryption with metadata in the first part.
If the plaintext has been signed,
you
get the format of the previous example without MIME-Version: 1.0 after decryption.
RFC 7508 specifies how a sending mail client can include header fields
in the signed attributes of its S/MIME signature.
A separate Internet draft specifies how
header fields can be repeated in the signed MIME part with a content-type parameter of
protected-headers="v1".
If the message is encrypted, the original subject should be replaced
with three periods.
Subject: ...
Message-ID: <[email protected]>
MIME-Version: 1.0
protocol="application/pgp-signature";
boundary="UniqueBoundary"
--UniqueBoundary
Message-ID: <[email protected]>
Content-Type: application/pgp-signature
--UniqueBoundary--
PGP signature with protected header fields.
The original subject is replaced with three dots only when the message is encrypted.
A challenge when using end-to-end security is to get the public key of the recipient
and to verify the public key of the sender.
RFC 8162
specifies an experimental standard
to anchor S/MIME certificates in the DNS just like DANE.
For this purpose, it defines the SMIMEA record
type,
which has the same format and semantics as the TLSA record type.
All four certificate usages can be used,
and unlike DANE, publishing
the full certificate in the SMIMEA record is not discouraged.
As a sender, you need the unhashed public key of the recipient in order to encrypt
the message for them.
As a recipient, it’s enough if you can verify the public key of the sender with its hash in the SMIMEA record.
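The difference between the two roles can be sketched as follows, assuming the TLSA field semantics (selector and matching type) carry over unchanged. The function names are my own, the certificate usage field is ignored for brevity, and the caller is assumed to have the DER encoding of both the certificate and its public key at hand.

```python
import hashlib


def select(selector: int, certificate_der: bytes, public_key_der: bytes) -> bytes:
    """Selector 0 covers the full certificate, selector 1 only its public key."""
    if selector == 0:
        return certificate_der
    if selector == 1:
        return public_key_der
    raise ValueError(f"unknown selector {selector}")


def matches(
    selector: int,
    matching_type: int,
    association_data: bytes,
    certificate_der: bytes,
    public_key_der: bytes,
) -> bool:
    """Check a received certificate against the data of an SMIMEA record."""
    selected = select(selector, certificate_der, public_key_der)
    if matching_type == 0:  # The record holds the unhashed data.
        return association_data == selected
    if matching_type == 1:  # The record holds a SHA-256 hash.
        return association_data == hashlib.sha256(selected).digest()
    if matching_type == 2:  # The record holds a SHA-512 hash.
        return association_data == hashlib.sha512(selected).digest()
    raise ValueError(f"unknown matching type {matching_type}")


def usable_for_encryption(matching_type: int) -> bool:
    """Only unhashed data lets a sender recover the recipient's public key."""
    return matching_type == 0
```

A recipient can call `matches` with any matching type, while a sender who wants to encrypt depends on a record for which `usable_for_encryption` returns true.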
When it comes to privacy, publishing information about individual users in the DNS is not ideal.
Unless a client uses DNS over TLS,
queries
for SMIMEA records can be monitored by anyone in the user’s network.
Additionally, SMIMEA records have to be authenticated
with DNSSEC.
If a domain uses NSEC instead of NSEC3 records for authenticated denial of existence,
anyone can walk the zone
to collect the hashes of all
local parts with an SMIMEA record.
Since the local part is hashed without the domain part of the email address,
short and common local parts
can be found
with a single rainbow table across domains.
The following tool queries the SMIMEA record of an email address with
Google’s DNS API.
Besides Unicode normalization, the local part is
hashed as is.
You can test the tool with the email address [email protected],
which I found here.
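For readers who want to reproduce the lookup themselves, the following sketch derives the domain name under which the SMIMEA record is published: the local part is normalized to NFC, encoded in UTF-8, hashed with SHA-256, and truncated to 28 bytes. The function names are my own, the domain part is assumed to be ASCII already, and the endpoint `https://dns.google/resolve` is Google's public JSON interface, which is an assumption about how such a query can be made rather than a description of the tool.

```python
import hashlib
import unicodedata
from urllib.parse import urlencode

SMIMEA_TYPE = 53  # Numeric value of the SMIMEA record type.


def record_name(address: str, label: str = "_smimecert") -> str:
    """Derive the domain name that holds the record for the given address."""
    local_part, _, domain = address.rpartition("@")
    # Besides Unicode normalization, the local part is hashed as is (no case folding).
    normalized = unicodedata.normalize("NFC", local_part)
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    # 28 bytes correspond to 56 hexadecimal digits.
    return f"{digest[:56]}.{label}.{domain}"


def query_url(address: str) -> str:
    """Build (but do not send) a query for Google's JSON DNS interface."""
    parameters = {"name": record_name(address), "type": SMIMEA_TYPE}
    return "https://dns.google/resolve?" + urlencode(parameters)


if __name__ == "__main__":
    print(query_url("alice@example.com"))
```

Since only the local part enters the hash, the same first label appears under every domain that has a user with this local part.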
RFC 7929 specifies how users can publish their OpenPGP keys in the DNS
with the new OPENPGPKEY record type.
OPENPGPKEY resource
records are transmitted in binary
but presented in Base64.
Even though OPENPGPKEY records must be authenticated
with DNSSEC,
retrieved
keys should still be verified by the user out-of-band.
The location of OPENPGPKEY records is determined
in the same way as it is done for
SMIMEA records
with the exception that the subdomain is _openpgpkey instead of _smimecert.
Indirections with CNAME records are allowed
as long as the original email address
is listed as one of the user identities in the found OpenPGP key.
A user identity can have the wildcard
character * as its local part
so that a single key can be used
for all addresses of the given domain.
Besides obtaining the key for an email
address, OPENPGPKEY
records allow others to detect when a key has been revoked.
The following tool queries the OPENPGPKEY record of an email address with
Google’s DNS API.
You can test the tool with the email address
[email protected].
The key’s fingerprint
is 7044 3D35 13F7 9AD9 2527 667F 6B14 3BF1 470C 9367.
You can use this key if you want to
report security-related issues to me.
The tool displays the key in ASCII armor.
The first two lines and the last two lines are not part of the
OPENPGPKEY record.
(The second last line is a 24-bit checksum.)
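The relationship between the record data and its armored presentation can be sketched as follows. This is a simplified illustration rather than a full implementation of ASCII armor: `armor` is a hypothetical helper, it emits no armor headers, and it wraps the Base64 data at 64 characters. The checksum is the CRC-24 used by OpenPGP.

```python
import base64


def crc24(data: bytes) -> int:
    """Compute the 24-bit checksum used by OpenPGP's ASCII armor."""
    crc = 0xB704CE  # Initial value.
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= 0x1864CFB  # Generator polynomial.
    return crc & 0xFFFFFF


def armor(key: bytes) -> str:
    """Wrap the binary key, as stored in the record, in ASCII armor."""
    encoded = base64.b64encode(key).decode("ascii")
    body = [encoded[i:i + 64] for i in range(0, len(encoded), 64)]
    checksum = "=" + base64.b64encode(crc24(key).to_bytes(3, "big")).decode("ascii")
    return "\n".join([
        "-----BEGIN PGP PUBLIC KEY BLOCK-----",  # Not part of the record.
        "",                                      # Not part of the record.
        *body,
        checksum,                                # Not part of the record.
        "-----END PGP PUBLIC KEY BLOCK-----",    # Not part of the record.
    ])
```

Everything between the empty line and the checksum line is the Base64 presentation of the record data.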
# Export the public key of the given user in the generic record syntax of RFC 3597:
$ gpg --export --export-options export-minimal,no-export-attributes,export-dane [email protected]
You can configure GnuPG to look for OPENPGPKEY records automatically with the
--auto-key-locate configuration option.
After having seen how public keys can be published in the DNS for TLS,
S/MIME, and OpenPGP,
I just want to mention that this is also
supported for the
Secure Shell Protocol (SSH),
which has nothing to do with email.
RFC 4255 specifies the
SSHFP record type,
which consists
of the following three fields:
Algorithm number:
The algorithm of the SSH key.
IANA maintains the registry of assigned values.
Fingerprint type:
The hash function used to determine the key’s fingerprint.
IANA maintains the registry of assigned values.
Fingerprint:
The fingerprint of the SSH key.
The fingerprint is transmitted in binary and presented with hexadecimal digits.
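The three fields can be derived from a line of a public key file as sketched below. The function names and the output layout are my own choices for this illustration, and the sketch always uses SHA-256, which is fingerprint type 2. The fingerprint is the hash of the Base64-decoded key blob.

```python
import base64
import hashlib

SHA256_FINGERPRINT_TYPE = 2


def algorithm_number(key_type: str) -> int:
    """Map the key type of a public key line to its SSHFP algorithm number."""
    if key_type == "ssh-rsa":
        return 1
    if key_type == "ssh-dss":
        return 2
    if key_type.startswith("ecdsa-sha2-"):
        return 3
    if key_type == "ssh-ed25519":
        return 4
    raise ValueError(f"unsupported key type {key_type}")


def sshfp_record(hostname: str, public_key_line: str) -> str:
    """Format an SSHFP record for a line such as 'ssh-ed25519 AAAA... comment'."""
    key_type, encoded_blob = public_key_line.split()[:2]
    blob = base64.b64decode(encoded_blob)
    fingerprint = hashlib.sha256(blob).hexdigest()
    number = algorithm_number(key_type)
    return f"{hostname}. IN SSHFP {number} {SHA256_FINGERPRINT_TYPE} {fingerprint}"
```

The result can be compared with what OpenSSH prints for `ssh-keygen -r` followed by the hostname.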
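To have OpenSSH look up SSHFP records and ask you for confirmation when you connect to a host, add the following lines to ~/.ssh/config:
Host *
    VerifyHostKeyDNS ask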
If no such folder or file exists in your user directory, you can create them with:
$ mkdir -p ~/.ssh
$ chmod 700 ~/.ssh
$ touch ~/.ssh/config
$ chmod 600 ~/.ssh/config
The copyright of this article and its graphics belong to Kaspar Etter.
You can share this article in any form as long as you give proper attribution.