UTF

UTF may refer to:

  • Several Unicode Transformation Formats
  • UTF-1
  • UTF-7
  • UTF-8
  • UTF-16
  • UTF-32
  • U.T.F. (Undead Task Force), an American comic book title
  • Underground Test Facility, used for testing and developing enhanced oil recovery technology in northern Canada
  • UTF-9 and UTF-18

    UTF-9 and UTF-18 (9- and 18-bit Unicode Transformation Format, respectively) were two April Fools' Day RFC joke specifications for encoding Unicode on systems where the nonet (nine bit group) is a better fit for the native word size than the octet, such as the 36-bit PDP-10 and the UNIVAC 1100/2200 series. Both encodings were specified in RFC 4042, written by Mark Crispin (inventor of IMAP) and released on April 1, 2005. The encodings suffer from a number of flaws and it is confirmed by their author that they were intended as a joke.

    However, unlike some of the "specifications" given in other April 1 RFCs, they are actually technically possible to implement, and have in fact been implemented in PDP-10 assembly language. They are however not endorsed by the Unicode Consortium.

    Technical details

    Like the 8-bit code commonly called variable-length quantity, UTF-9 uses a system of putting an octet in the low 8 bits of each nonet and using the high bit to indicate continuation. This means that ASCII and Latin 1 characters take one nonet each, the rest of the BMP characters take two nonets each and non-BMP code points take three. Code points that require multiple nonets are stored starting with the most significant non-zero nonet.

    UTF-7

    UTF-7 (7-bit Unicode Transformation Format) is a variable-length character encoding that was proposed for representing Unicode text using a stream of ASCII characters. It was originally intended to provide a means of encoding Unicode text for use in Internet E-mail messages that was more efficient than the combination of UTF-8 with quoted-printable.

    Motivation

    MIME, the modern standard of E-mail format, forbids encoding of headers using byte values above the ASCII range. Although MIME allows encoding the message body in various character sets (broader than ASCII), the underlying transmission infrastructure (SMTP, the main E-mail transfer standard) is still not guaranteed to be 8-bit clean. Therefore, a non-trivial content transfer encoding has to be applied in case of doubt. Unfortunately base64 has a disadvantage of making even US-ASCII characters unreadable in non-MIME clients. On the other hand, UTF-8 combined with quoted-printable produces a very size-inefficient format requiring 69 bytes for non-ASCII characters from the BMP and 12 bytes for characters outside the BMP.

    Podcasts:

    PLAYLIST TIME:
    ×