
Multipurpose Internet Mail Extensions

MIME (Multipurpose Internet Mail Extensions) is an Internet standard that extends the format of email to support text in different character sets beyond ASCII, non-text attachments, message bodies with multiple parts, and header information in non-ASCII character sets. MIME allows encoding of non-text content and defines mechanisms for encoding binary data and non-English text for transmission via email.
Copyright
© Attribution Non-Commercial (BY-NC)

MIME

(Multipurpose Internet Mail Extensions)



Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of e-mail
to support:

* Text in character sets other than ASCII

* Non-text attachments

* Message bodies with multiple parts

* Header information in non-ASCII character sets

MIME's use, however, has grown beyond describing the content of e-mail to describing content type in
general, including for the web (see Internet media type).

Virtually all human-written Internet e-mail and a fairly large proportion of automated e-mail is
transmitted via SMTP in MIME format. Internet e-mail is so closely associated with the SMTP and MIME
standards that it is sometimes called SMTP/MIME e-mail.[1]

The content types defined by MIME standards are also of importance outside of e-mail, such as in
communication protocols like HTTP for the World Wide Web. HTTP requires that data be transmitted in
the context of e-mail-like messages, although the data most often is not actually e-mail.

MIME is specified in six linked RFC memoranda: RFC 2045, RFC 2046, RFC 2047, RFC 4288, RFC 4289 and
RFC 2049. RFC 4288 has since been obsoleted by RFC 6838.

Introduction

The basic Internet e-mail transmission protocol, SMTP, supports only 7-bit ASCII characters (see also
8BITMIME). This effectively limits Internet e-mail to messages which, when transmitted, include only the
characters sufficient for writing a small number of languages, primarily English. Other languages based
on the Latin alphabet typically include diacritics not supported in 7-bit ASCII, meaning text in these
languages cannot be correctly represented in basic e-mail.

MIME defines mechanisms for sending other kinds of information in e-mail. These include text in
languages other than English using character encodings other than ASCII, and 8-bit binary content such
as files containing images, sounds, movies, and computer programs. MIME is also a fundamental
component of communication protocols such as HTTP, which requires that data be transmitted in the
context of e-mail-like messages even though the data might not (and usually doesn't) actually have
anything to do with e-mail. Mapping messages into and out of MIME format is typically done
automatically by an e-mail client or by mail servers when sending or receiving Internet (SMTP/MIME) e-mail.

The basic format of Internet e-mail is defined in RFC 5322, which is an updated version of RFC 2822 and
RFC 822. These standards specify the familiar formats for text e-mail headers and body and rules
pertaining to commonly used header fields such as "To:", "Subject:", "From:", and "Date:". MIME defines
a collection of e-mail headers for specifying additional attributes of a message including content type,
and defines a set of transfer encodings which can be used to represent 8-bit binary data using characters
from the 7-bit ASCII character set. MIME also specifies rules for encoding non-ASCII characters in e-mail
message headers, such as "Subject:", allowing these header fields to contain non-English characters.

MIME is extensible. Its definition includes a method to register new content types and other MIME
attribute values.

The goals of the MIME definition included requiring no changes to existing e-mail servers and allowing
plain text e-mail to function in both directions with existing clients. These goals were achieved by using
additional RFC 822-style headers for all MIME message attributes and by making the MIME headers
optional, with default values ensuring that a non-MIME message is interpreted correctly by a MIME-capable
client. A simple MIME text message is therefore likely to be interpreted correctly by a non-MIME client,
although that client will not know how to interpret the MIME headers themselves. Similarly, if the
quoted-printable transfer encoding (see below) is used, the ASCII part of the message will be intelligible
to users with non-MIME clients.

MIME headers
MIME-Version

The presence of this header indicates that the message is MIME-formatted. The value is typically "1.0", so
this header appears as:

MIME-Version: 1.0

Implementers have attempted to change the version number in the past, and the change had unforeseen
results.[citation needed] It was decided at an IETF meeting[2] to leave the version number as is, even
though there have been many updates and versions of MIME.

Content-ID[3]

The Content-ID header is primarily of use in multi-part messages (as discussed below); a Content-ID is a
unique identifier for a message part, allowing it to be referred to (e.g., in IMG tags of an HTML message
allowing the inline display of attached images). The content ID is contained within angle brackets in the
Content-ID header. Here is an example:

Content-ID: <5.31.32252.1057009685@server01.example.net>

The standards say little about exactly what a Content-ID should contain; it is only required to be
globally and permanently unique (meaning that no two are the same, even when generated by different
people at different times and places). To achieve this, some conventions have been adopted. One of them
is to include an at sign (@), with the hostname of the computer that created the content ID to the right
of it. This keeps the content ID distinct from any created by other computers, at least when the
originating computer has a unique Internet hostname; if, as sometimes happens, an anonymous machine
inserts something generic like localhost, uniqueness is no longer guaranteed. The part to the left of the
at sign is then designed to be unique within that machine; a good way to do this is to append several
constantly changing values that programs have access to. In this case, four different numbers were
inserted, with dots between them: the rightmost one is a timestamp of the number of seconds since
January 1, 1970, known as the Unix epoch; to the left of it is the process ID of the program that
generated the message (on servers running Unix or Linux, each process has a number which is unique
among the processes running at any moment, though the numbers do repeat over time); to the left of that
is a count of the number of messages generated so far by the current process; and the leftmost number is
the number of parts in the current message that have been generated so far. Put together, these keep
the content ID from repeating: even if multiple messages are generated within the same second, they
either have different process IDs or a different count of messages generated by the same process.

That is just one example of how a unique content ID can be generated; different programs do it
differently. What matters is that the IDs remain unique, so that even if many different messages are
joined together as parts of a bigger multipart message (as happens when a message is forwarded as an
attachment, or assembled into a MIME-format digest), no two parts will have the same content ID, which
would be likely to confuse mail programs.
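The four-number convention described above can be sketched in Python. This is an illustrative sketch only: the function name `make_content_id` and its two counter parameters are hypothetical (the calling program is assumed to maintain the counters itself), the default host name comes from `socket.getfqdn()`, and real mail programs use their own schemes.

```python
import os
import socket
import time
from typing import Optional


def make_content_id(part_count: int, message_count: int, host: Optional[str] = None) -> str:
    """Build a Content-ID following the four-number convention described above.

    Hypothetical helper: part_count and message_count are counters that the
    calling program is assumed to keep for itself.
    """
    if host is None:
        # This machine's hostname; uniqueness suffers if it is something generic like "localhost".
        host = socket.getfqdn()
    # Left to right: parts generated so far, messages generated so far,
    # process ID, and seconds since the Unix epoch.
    local_part = f"{part_count}.{message_count}.{os.getpid()}.{int(time.time())}"
    return f"<{local_part}@{host}>"


print(make_content_id(5, 31, host="server01.example.net"))
```

The process ID and timestamp differ from run to run, so the printed value varies. Python's standard library also provides `email.utils.make_msgid()`, which generates identifiers of the same general shape by its own scheme.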

There is a similar header called Message-ID, which assigns a unique identifier to the message as a whole;
it is not actually part of the MIME standards, since it can be used on non-MIME as well as MIME messages.
If the originating mail program doesn't add a message ID, a server handling the message later on probably
will, since many programs (both clients and servers) want every message to have one in order to keep
track of it. Other headers, such as In-Reply-To and References, make use of message IDs.

When referenced in the form of a Web URI, content IDs and message IDs are placed within the URI
schemes cid and mid respectively, without the angle brackets:

cid:5.31.32252.1057009685@server01.example.net
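A minimal sketch of the conversion, assuming the identifier is taken straight from a Content-ID or Message-ID header value: strip the angle brackets and prefix the scheme. RFC 2392, which defines the cid and mid schemes, also calls for URL-encoding of characters not allowed in URLs; `urllib.parse.quote` handles that here. The helper name `to_uri` is hypothetical.

```python
from urllib.parse import quote


def to_uri(header_value: str, scheme: str = "cid") -> str:
    """Turn a Content-ID (or a Message-ID, with scheme="mid") header value into a URI."""
    ident = header_value.strip()
    if ident.startswith("<") and ident.endswith(">"):
        ident = ident[1:-1]  # the URI form omits the angle brackets
    # Percent-encode characters outside the URL-safe set, leaving "@" as-is.
    return f"{scheme}:{quote(ident, safe='@')}"


print(to_uri("<5.31.32252.1057009685@server01.example.net>"))
# cid:5.31.32252.1057009685@server01.example.net
```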

Content-Type

This header indicates the Internet media type of the message content, consisting of a type and subtype,
for example

Content-Type: text/plain

Through the use of the multipart type, MIME allows messages to have parts arranged in a tree structure
where the leaf nodes are any non-multipart content type and the non-leaf nodes are any of a variety of
multipart types. This mechanism supports:

* simple text messages using text/plain (the default value for "Content-Type: ")

* text plus attachments (multipart/mixed with a text/plain part and other non-text parts). A MIME
message including an attached file generally indicates the file's original name with the
"Content-Disposition:" header, so the type of file is indicated both by the MIME content type and the
(usually OS-specific) filename extension

* reply with original attached (multipart/mixed with a text/plain part and the original message as a
message/rfc822 part)

* alternative content, such as a message sent in both plain text and another format such as HTML
(multipart/alternative with the same content in text/plain and text/html forms)

* image, audio, video and application (for example, image/jpeg, audio/mpeg, video/mp4, and
application/msword and so on)

* many other message constructs
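The tree structure can be seen with Python's standard `email` package. This sketch (the file name and placeholder bytes are made up) starts with a plain text message; `add_attachment()` converts it to multipart/mixed, and `walk()` then visits the root and each leaf in turn.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See the attached chart.")  # a single text/plain body

# Placeholder bytes standing in for a real PNG file.
msg.add_attachment(b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png",
                   filename="chart.png")

# walk() yields the root first, then each part, depth first.
for part in msg.walk():
    kind = "multipart" if part.is_multipart() else "leaf"
    print(part.get_content_type(), kind, part.get_filename())
# multipart/mixed multipart None
# text/plain leaf None
# image/png leaf chart.png
```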

Content-Disposition

The original MIME specifications only described the structure of mail messages. They did not address
the issue of presentation styles. The Content-Disposition header field was added in RFC 1806, later
updated by RFC 2183, to specify the presentation style. A MIME part can have:

* an inline content-disposition, which means that it should be automatically displayed when the
message is displayed, or

* an attachment content-disposition, in which case it is not displayed automatically and requires some
form of action from the user to open it.

In addition to the presentation style, the Content-Disposition header also provides parameters for
specifying the name of the file, the creation date and modification date, which can be used by the
reader's mail user agent to store the attachment.

The following example is taken from RFC 2183, where the header is defined:

Content-Disposition: attachment; filename=genome.jpeg;
  modification-date="Wed, 12 Feb 1997 16:29:51 -0500";

The filename may be encoded as defined by RFC 2231.
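Python's standard `email` package can read these parameters. The sketch below parses the RFC 2183 example header (folded onto a continuation line, as it would be in a real message) and, as a second, made-up case, a filename encoded per RFC 2231, which `get_filename()` decodes.

```python
from email import message_from_string

# The RFC 2183 example; the second line is a folded continuation of the header.
rfc2183_part = message_from_string(
    "Content-Type: image/jpeg\n"
    "Content-Disposition: attachment; filename=genome.jpeg;\n"
    ' modification-date="Wed, 12 Feb 1997 16:29:51 -0500";\n'
    "\n"
)
print(rfc2183_part.get_content_disposition())  # attachment
print(rfc2183_part.get_filename())             # genome.jpeg
print(rfc2183_part.get_param("modification-date", header="content-disposition"))
# Wed, 12 Feb 1997 16:29:51 -0500

# RFC 2231 form: charset, an empty language tag, then percent-encoded UTF-8 bytes.
rfc2231_part = message_from_string(
    "Content-Type: application/pdf\n"
    "Content-Disposition: attachment; filename*=utf-8''r%C3%A9sum%C3%A9.pdf\n"
    "\n"
)
print(rfc2231_part.get_filename())  # résumé.pdf
```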

As of 2010, most mail user agents do not follow this prescription fully. The widely used
Mozilla Thunderbird mail client makes its own decisions about which MIME parts should be
automatically displayed, ignoring the content-disposition headers in the messages. It also sends out
newly composed messages with inline content-disposition for all MIME parts. Most users are unaware
of how to set the content-disposition to attachment.[4] Many mail user agents also send messages
where they put the file name in the name parameter of the content-type header instead of the filename
parameter of the content-disposition header. This practice is discouraged.[5]

Content-Transfer-Encoding

In June 1992, MIME (RFC 1341, since made obsolete by RFC 2045) defined a set of methods for
representing binary data in ASCII text format. The Content-Transfer-Encoding MIME header serves two
purposes:

1. It indicates whether or not a binary-to-text encoding scheme has been used on top of the original
encoding as specified within the Content-Type header, and

2. If such a binary-to-text encoding method has been used it states which one.

The RFC and the IANA's list of transfer encodings define the values shown below, which are not case
sensitive. Note that '7bit', '8bit', and 'binary' mean that no binary-to-text encoding on top of the original
encoding was used. In these cases, the header is actually redundant for the e-mail client to decode the
message body, but it may still be useful as an indicator of what type of object is being sent. Values
'quoted-printable' and 'base64' tell the e-mail client that a binary-to-text encoding scheme was used
and that appropriate initial decoding is necessary before the message can be read with its original
encoding (e.g. UTF-8).

* Suitable for use with normal SMTP:

o 7bit – up to 998 octets per line of the code range 1..127 with CR and LF (codes 13 and 10
respectively) only allowed to appear as part of a CRLF line ending. This is the default value.

o quoted-printable – used to encode arbitrary octet sequences into a form that satisfies the rules
of 7bit. Designed to be efficient and mostly human readable when used for text data consisting primarily
of US-ASCII characters but also containing a small proportion of bytes with values outside that range.

o base64 – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit.
Designed to be efficient for non-text 8-bit data. Sometimes used for text data that frequently uses
non-US-ASCII characters.

* Suitable for use with SMTP servers that support the 8BITMIME SMTP extension:

o 8bit – up to 998 octets per line with CR and LF (codes 13 and 10 respectively) only allowed to
appear as part of a CRLF line ending.

* Suitable only for use with SMTP servers that support the BINARYMIME SMTP extension (RFC 3030):

o binary – any sequence of octets.
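The rules listed above can be turned into a small decision procedure. This is a sketch under stated assumptions: the helper names are hypothetical, the body is assumed to be already in canonical CRLF form, and the one-in-six threshold for preferring quoted-printable over base64 is an arbitrary heuristic, not something the RFCs prescribe. The encoders themselves are Python's standard `quopri` and `base64` modules.

```python
import base64
import quopri


def fits_line_rules(data: bytes, allow_high_bit: bool) -> bool:
    """Check the 7bit (allow_high_bit=False) or 8bit (True) rules.

    Lines are at most 998 octets and CR/LF appear only as CRLF pairs;
    RFC 2045 also excludes NUL from both.
    """
    for line in data.split(b"\r\n"):
        if len(line) > 998:
            return False
        for octet in line:
            if octet in (0, 10, 13):  # NUL, or a bare LF/CR outside a CRLF pair
                return False
            if octet > 127 and not allow_high_bit:
                return False
    return True


def choose_encoding(data: bytes) -> str:
    """Pick a Content-Transfer-Encoding for an SMTP transport without 8BITMIME."""
    if fits_line_rules(data, allow_high_bit=False):
        return "7bit"
    non_ascii = sum(1 for octet in data if octet > 127)
    # Assumed heuristic: quoted-printable while at most one octet in six is non-ASCII.
    return "quoted-printable" if non_ascii * 6 <= len(data) else "base64"


def encode_body(data: bytes) -> tuple[str, bytes]:
    cte = choose_encoding(data)
    if cte == "quoted-printable":
        return cte, quopri.encodestring(data)
    if cte == "base64":
        return cte, base64.encodebytes(data)  # base64 wrapped into 76-character lines
    return cte, data


print(choose_encoding(b"Hello, world!\r\n"))                # 7bit
print(choose_encoding("The café is open.".encode("utf-8")))  # quoted-printable
```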

There is no encoding defined which is explicitly designed for sending arbitrary binary data through SMTP
transports with the 8BITMIME extension. Thus base64 or quoted-printable (with their associated
inefficiency) must sometimes still be used. This restriction does not apply to other uses of MIME such as
Web Services with MIME attachments or MTOM.

Encoded-Word

Since RFC 2822, message header names and values are always ASCII characters; values that contain non-
ASCII data must use the MIME encoded-word syntax (RFC 2047) instead of a literal string. This syntax
uses a string of ASCII characters indicating both the original character encoding (the "charset") and the
content-transfer-encoding used to map the bytes of the charset into ASCII characters.

The form is: "=?charset?encoding?encoded text?=".


* charset may be any character set registered with IANA. Typically it would be the same charset as the
message body.

* encoding can be either "Q" denoting Q-encoding that is similar to the quoted-printable encoding, or
"B" denoting base64 encoding.

* encoded text is the Q-encoded or base64-encoded text.

Difference between Q-encoding and quoted-printable

The ASCII codes for the question mark (?) and equals sign (=) may not be represented directly, as they
are used to delimit the encoded-word. The ASCII code for space may not be represented directly because
it could cause older parsers to split up the encoded word undesirably. To make the encoding smaller and
easier to read, the underscore is used to represent the ASCII code for space, with the side effect that
the underscore itself cannot be represented directly. Use of encoded words in certain parts of headers
imposes further restrictions on which characters may be represented directly.

For example,

Subject: =?iso-8859-1?Q?=A1Hola,_se=F1or!?=

is interpreted as "Subject: ¡Hola, señor!".
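A minimal decoder for a single encoded-word follows, mainly to show the Q-encoding rules above (underscore for space, =XX for any other octet). It is an illustrative sketch: it does not handle RFC 2231 language tags in the charset field or the whitespace rules between adjacent encoded-words, and real code would normally use Python's `email.header.decode_header` instead. The function names are hypothetical.

```python
import base64
import re

# =?charset?encoding?encoded text?=, with Q or B (either case) as the encoding.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_q(text: str) -> bytes:
    """Undo Q-encoding: "_" stands for a space and =XX for the octet 0xXX."""
    # Underscores are replaced first, so an encoded underscore (=5F) survives.
    raw = text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), raw)


def decode_encoded_word(word: str) -> str:
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, encoding, text = match.groups()
    data = decode_q(text) if encoding in "Qq" else base64.b64decode(text)
    return data.decode(charset)  # the bytes are in the declared charset


print(decode_encoded_word("=?iso-8859-1?Q?=A1Hola,_se=F1or!?="))  # ¡Hola, señor!
```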

The encoded-word format is not used for the names of the headers (for example Subject). These header
names are always in English in the raw message. When viewing a message with a non-English e-mail
client, the header names are usually translated by the client.

Multipart messages

A MIME multipart message contains a boundary in the "Content-Type: " header; this boundary, which
must not occur in any of the parts, is placed between the parts, and at the beginning and end of the
body of the message, as follows:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="frontier"

This is a message with multiple parts in MIME format.
--frontier
Content-Type: text/plain

This is the body of the message.
--frontier
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg
Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==
--frontier--

Each part consists of its own content header (zero or more Content- header fields) and a body. Multipart
content can be nested. The content-transfer-encoding of a multipart type must always be "7bit", "8bit"
or "binary" to avoid the complications that would be posed by multiple levels of decoding. The multipart
block as a whole does not have a charset; non-ASCII characters in the part headers are handled by the
Encoded-Word system, and the part bodies can have charsets specified if appropriate for their content-
type.

Notes:

* Before the first boundary is an area that is ignored by MIME-compliant clients. This area is generally
used to put a message to users of old non-MIME clients.

* It is up to the sending mail client to choose a boundary string that doesn't clash with the body text.
Typically this is done by inserting a long random string.

* The last boundary must have two hyphens at the end.
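The boundary rules above can be sketched as follows. The helper `build_multipart` is hypothetical: it picks a random boundary, re-rolls in the unlikely case that it appears in a part, and writes the preamble, the delimiters and the closing delimiter with CRLF line endings. Python's standard parser then reads the result back.

```python
import secrets
from email import message_from_string


def build_multipart(parts: list[tuple[str, str]], preamble: str) -> str:
    """Assemble a multipart/mixed message from (content_type, body) pairs."""
    while True:
        # "=_" cannot occur in base64 or quoted-printable output, and the long
        # random suffix makes a clash with ordinary text very unlikely.
        boundary = "=_" + secrets.token_hex(16)
        if all(boundary not in body for _, body in parts):
            break
    lines = [
        "MIME-Version: 1.0",
        f'Content-Type: multipart/mixed; boundary="{boundary}"',
        "",
        preamble,  # ignored by MIME-compliant clients
    ]
    for content_type, body in parts:
        lines += [f"--{boundary}", f"Content-Type: {content_type}", "", body]
    lines.append(f"--{boundary}--")  # the closing delimiter ends with two hyphens
    return "\r\n".join(lines) + "\r\n"


raw = build_multipart(
    [("text/plain", "This is the body of the message."),
     ("text/html", "<p>This is the body of the message.</p>")],
    preamble="This is a message with multiple parts in MIME format.",
)
msg = message_from_string(raw)
for part in msg.get_payload():
    print(part.get_content_type(), part.get_payload())
```

The CRLF just before each delimiter belongs to the delimiter, which is why the parser returns each body without a trailing line break.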

Multipart subtypes

The MIME standard defines various multipart-message subtypes, which specify the nature of the
message parts and their relationship to one another. The subtype is specified in the "Content-Type"
header of the overall message. For example, a multipart MIME message using the digest subtype would
have its Content-Type set as "multipart/digest".

The RFC initially defined 4 subtypes: mixed, digest, alternative and parallel. A minimally compliant
application must support mixed and digest; other subtypes are optional. Applications must treat
unrecognised subtypes as "multipart/mixed". Additional subtypes, such as signed and form-data, have
since been separately defined in other RFCs.

The following is a list of the most commonly used subtypes; it is not intended to be a comprehensive list.

Mixed

Multipart/mixed is used for sending files with different "Content-Type" headers inline (or as
attachments). If sending pictures or other easily readable files, most mail clients will display them inline
(unless otherwise specified with the "Content-Disposition" header); other files are offered as
attachments. The default content type for each part is "text/plain".

Defined in RFC 2046, Section 5.1.3

Message

A message/rfc822 part contains an e-mail message, including any headers. The name rfc822 is a misnomer,
since the message may be a full MIME message. This is used for digests as well as for e-mail forwarding.

Defined in RFC 2046.

Digest

Multipart/digest is a simple way to send multiple text messages. The default content-type for each part
is "message/rfc822".

Defined in RFC 2046, Section 5.1.5
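The default part types for multipart/mixed and multipart/digest can be checked with Python's `email` parser, which applies them when a part omits its Content-Type header. The message below is a made-up example: an outer multipart/mixed with one header-less part, followed by a multipart/digest whose single header-less entry is itself a short e-mail message.

```python
from email import message_from_string

raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    "\n"
    "A part with no headers, so it defaults to text/plain.\n"
    "--outer\n"
    'Content-Type: multipart/digest; boundary="inner"\n'
    "\n"
    "--inner\n"
    "\n"
    "From: alice@example.com\n"
    "Subject: First digest entry\n"
    "\n"
    "Hello.\n"
    "--inner--\n"
    "--outer--\n"
)
msg = message_from_string(raw)
for part in msg.walk():
    print(part.get_content_type())
# multipart/mixed
# text/plain
# multipart/digest
# message/rfc822
# text/plain
```

The last line is the digest entry's own body: the embedded message has no Content-Type either, so it falls back to text/plain.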

Alternative

The multipart/alternative subtype indicates that each part is an "alternative" version of the same (or
similar) content, each in a different format denoted by its "Content-Type" header. The formats are
ordered by how faithful they are to the original, with the least faithful first and the most faithful last.
Systems can then choose the "best" representation they are capable of processing; in general, this will
be the last part that the system can understand, although other factors may affect this.

Since a client is unlikely to want to send a version that is less faithful than the plain text version, this
structure places the plain text version (if present) first. This makes life easier for users of clients that do
not understand multipart messages.

Most commonly, multipart/alternative is used for e-mail with two parts, one plain text (text/plain) and
one HTML (text/html). The plain text part provides backwards compatibility while the HTML part allows
use of formatting and hyperlinks. Most e-mail clients offer a user option to prefer plain text over HTML;
this is an example of how local factors may affect how an application chooses which "best" part of the
message to display.
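The ordering rule and the local-preference override can be sketched in Python. The message is built with the standard `email` package, whose `add_alternative()` keeps the existing plain text part first. `choose_alternative()` is a hypothetical helper, and letting a user preference take priority is an assumption about how a client might behave.

```python
from email.message import EmailMessage
from typing import Optional

msg = EmailMessage()
msg.set_content("Hello, *world*!")                                  # text/plain, least faithful
msg.add_alternative("<p>Hello, <b>world</b>!</p>", subtype="html")  # text/html, most faithful
order = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type(), order)
# multipart/alternative ['text/plain', 'text/html']


def choose_alternative(order: list[str], supported: set[str],
                       prefer: Optional[str] = None) -> Optional[str]:
    """Pick the part to show: a usable user preference, else the last supported part."""
    if prefer is not None and prefer in order and prefer in supported:
        return prefer  # a local factor, e.g. a "prefer plain text" setting
    for content_type in reversed(order):  # the most faithful format comes last
        if content_type in supported:
            return content_type
    return None


print(choose_alternative(order, {"text/plain", "text/html"}))                       # text/html
print(choose_alternative(order, {"text/plain", "text/html"}, prefer="text/plain"))  # text/plain
```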

While it is intended that each part of the message represent the same content, the standard does not
require this to be enforced in any way. At one time, anti-spam filters would only examine the text/plain
part of a message,[citation needed] because it is easier to parse than the text/html part. But spammers
eventually took advantage of this, creating messages with an innocuous-looking text/plain part and
advertising in the text/html part. Anti-spam software eventually caught up on this trick, penalizing
messages with very different text in a multipart/alternative message.[citation needed]

Defined in RFC 2046, Section 5.1.4

Related

Multipart/related is used to indicate that message parts should not be considered individually but
rather as parts of an aggregate whole. The message consists of a root part (by default, the first) which
references other parts inline, which may in turn reference other parts. Message parts are commonly
referenced by the "Content-ID" part header. The syntax of a reference is unspecified and is instead
dictated by the encoding or protocol used in the part.

One common usage of this subtype is to send a web page complete with images in a single message. The
root part would contain the HTML document, and use image tags to reference images stored in the
latter parts.
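Python's `email` package can assemble such a message. In this sketch the image bytes and domain are placeholders: the HTML root part refers to the image through a cid: URI built from the image's Content-ID, and `add_related()` converts the message to multipart/related, keeping the HTML as the first (root) part.

```python
from email.message import EmailMessage
from email.utils import make_msgid

image_cid = make_msgid(domain="example.net")  # an "<...@example.net>" identifier

msg = EmailMessage()
msg["Subject"] = "Web page with an inline image"
# Root part: the cid: URI drops the angle brackets of the Content-ID.
msg.set_content(f'<p>Logo:</p><img src="cid:{image_cid[1:-1]}">', subtype="html")
# Placeholder bytes standing in for a real PNG; add_related() makes the message multipart/related.
msg.add_related(b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png", cid=image_cid)

for part in msg.walk():
    print(part.get_content_type(), part.get("Content-ID"))
```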

Defined in RFC 2387

Report

Multipart/report is a message type that contains data formatted for a mail server to read. It is split
between a text/plain part (or some other easily readable content type) and a message/delivery-status
part, which contains the data formatted for the mail server to read.

Defined in RFC 3462

Signed

A multipart/signed message is used to attach a digital signature to a message. It has two parts, a body
part and a signature part. The whole of the body part, including MIME headers, is used to create the
signature part. Many signature types are possible, like application/pgp-signature (RFC 3156) and
application/x-pkcs7-signature (S/MIME).

Defined in RFC 1847, Section 2.1

Encrypted

A multipart/encrypted message has two parts. The first part has control information that is needed to
decrypt the application/octet-stream second part. Similar to signed messages, there are different
implementations which are identified by their separate content types for the control part. The most
common types are "application/pgp-encrypted" (RFC 3156) and "application/pkcs7-mime" (S/MIME).

Defined in RFC 1847, Section 2.2

Form Data

As its name implies, multipart/form-data is used to express values submitted through a form. Originally
defined as part of HTML 4.0, it is most commonly used for submitting files via HTTP.

Defined in RFC 2388
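A request body of this type can be built by hand, as in the sketch below. The function name and its argument shapes are hypothetical; field names and filenames are assumed to be plain ASCII without quotes (real encoders must escape them). Each part carries a Content-Disposition: form-data header naming the form field, plus a filename and Content-Type for uploaded files.

```python
import secrets


def encode_form_data(fields: dict[str, str],
                     files: dict[str, tuple[str, str, bytes]]) -> tuple[str, bytes]:
    """Encode form fields and files as a multipart/form-data body.

    fields maps a field name to a text value; files maps a field name to
    (filename, content_type, data). Returns (Content-Type value, body bytes).
    """
    payloads = [v.encode("utf-8") for v in fields.values()] + [d for _, _, d in files.values()]
    while True:
        boundary = secrets.token_hex(16)
        if not any(boundary.encode("ascii") in p for p in payloads):
            break  # the boundary must not occur in any part
    out = bytearray()
    for name, value in fields.items():
        out += f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode("ascii")
        out += value.encode("utf-8") + b"\r\n"
    for name, (filename, content_type, data) in files.items():
        out += (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                f"Content-Type: {content_type}\r\n\r\n").encode("ascii")
        out += data + b"\r\n"
    out += f"--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", bytes(out)


content_type, body = encode_form_data(
    {"comment": "Looks good"},
    {"upload": ("notes.txt", "text/plain", b"line one")},
)
```

The returned string goes in the HTTP request's Content-Type header and the bytes are sent as the request body.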

Mixed-Replace (Experimental)

The content type multipart/x-mixed-replace was developed as part of a technology to emulate server
push and streaming over HTTP.

All parts of a mixed-replace message have the same semantic meaning. However, each part invalidates
("replaces") the previous parts as soon as it is received completely. Clients should process the individual
parts as soon as they arrive and should not wait for the whole message to finish.

Originally developed by Netscape,[6] it is still supported by Mozilla, Firefox, Chrome,[7] Safari (but not in
Safari on the iPhone[citation needed]) and Opera, but traditionally ignored by Microsoft. It is commonly
used in IP cameras as the MIME type for MJPEG streams.[8]
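The client-side behaviour can be sketched as a generator that yields each part as soon as the next delimiter shows it is complete. This is a simplification under stated assumptions: the caller extracts the boundary from the Content-Type header, parts are located by searching for the delimiter rather than by any Content-Length header a server might also send, and the function name and the placeholder frame bytes are hypothetical.

```python
from typing import Iterable, Iterator


def iter_replace_parts(chunks: Iterable[bytes], boundary: bytes) -> Iterator[bytes]:
    """Yield each part's body as soon as it is complete.

    A part counts as complete once the next delimiter arrives; the caller is
    expected to display it in place of whatever was shown before.
    """
    delimiter = b"--" + boundary
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while True:
            start = buffer.find(delimiter)
            if start < 0:
                break
            end = buffer.find(delimiter, start + len(delimiter))
            if end < 0:
                break  # the current part is still arriving
            part = buffer[start + len(delimiter):end]
            buffer = buffer[end:]  # keep the next delimiter for the following round
            _headers, _, body = part.partition(b"\r\n\r\n")
            if body.endswith(b"\r\n"):
                body = body[:-2]  # this CRLF belongs to the following delimiter
            yield body


stream = [
    b"--frame\r\nContent-Type: image/jpeg\r\n\r\nAAA",
    b"\r\n--frame\r\nContent-Type: image/jpeg\r\n\r\nBB",
    b"B\r\n--frame--\r\n",
]
for frame in iter_replace_parts(stream, b"frame"):
    print(frame)
# b'AAA'
# b'BBB'
```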

Byteranges

The multipart/byteranges subtype is used to represent noncontiguous byte ranges of a single message. It
is used by HTTP when a server returns multiple byte ranges and is defined in RFC 2616 (since superseded
by RFC 7233 and later RFC 9110).
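A server-side sketch of such a response body follows. The helper and the boundary value are hypothetical; each part carries its own Content-Type and a Content-Range header giving the inclusive first and last byte positions and the total length of the resource.

```python
def build_byteranges(data: bytes, ranges: list[tuple[int, int]],
                     content_type: str, boundary: str) -> tuple[str, bytes]:
    """Build a multipart/byteranges body for inclusive (first, last) byte ranges."""
    total = len(data)
    out = bytearray()
    for first, last in ranges:
        if not 0 <= first <= last < total:
            raise ValueError(f"unsatisfiable range {first}-{last} for {total} bytes")
        out += f"--{boundary}\r\n".encode("ascii")
        out += f"Content-Type: {content_type}\r\n".encode("ascii")
        out += f"Content-Range: bytes {first}-{last}/{total}\r\n\r\n".encode("ascii")
        out += data[first:last + 1] + b"\r\n"  # last is inclusive
    out += f"--{boundary}--\r\n".encode("ascii")
    return f"multipart/byteranges; boundary={boundary}", bytes(out)


content_type, body = build_byteranges(b"0123456789abcdefghij", [(0, 4), (10, 14)],
                                      "text/plain", "sep")
```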
