ASCII Docx1
Pronounced ask-ee, ASCII is the acronym for the American Standard Code for Information Interchange. It is a code for representing
128 English characters as numbers, with each character assigned a number from 0 to 127. For example, the ASCII code for uppercase M is
77. Most computers use ASCII codes to represent text, which makes it possible to transfer data from one computer to another.
Text files stored in ASCII format are sometimes called ASCII files. Text editors and word processors are usually capable of storing
data in ASCII format, although ASCII format is not always the default storage format. Most data files, particularly if they contain
numeric data, are not stored in ASCII format. Executable programs are never stored in ASCII format.
The standard ASCII Character Set
The standard ASCII character set uses just 7 bits for each character. There are several larger character sets that use 8 bits, which gives
them 128 additional characters. The extra characters are used to represent non-English characters, graphics symbols, and mathematical
symbols.
Several companies and organizations have proposed extensions for these 128 characters. The DOS operating system uses a superset of
ASCII called extended ASCII or high ASCII. A more universal standard is the ISO Latin 1 set of characters, which is used by many
operating systems, as well as Web browsers.
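As a small illustration of why the 8-bit extensions are not interchangeable, the Python sketch below decodes the same byte under two of the sets mentioned above. The codec names cp437 (the original DOS code page) and latin-1 are Python's own names for these sets, and the sample byte 0xE9 is an arbitrary choice.

```python
# One byte above 127, interpreted under two different 8-bit character sets.
raw = bytes([0xE9])

dos_char = raw.decode("cp437")        # DOS "extended ASCII" (code page 437)
latin1_char = raw.decode("latin-1")   # ISO Latin 1 (ISO 8859-1)
print(repr(dos_char), repr(latin1_char))

# Bytes 0-127 decode identically, because both sets are supersets of ASCII.
ascii_bytes = bytes(range(128))
same_low_half = ascii_bytes.decode("cp437") == ascii_bytes.decode("latin-1")
print(same_low_half)
```

Only the upper 128 positions differ between the two sets, which is why plain ASCII text survives a move between systems while accented or graphic characters may not.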
Another set of codes that is used on large IBM computers is EBCDIC.
The ASCII code predates the Internet and has been around since the days of teletypes and mechanical printers. ASCII decimal
numbers from 0 to 31 represent control codes that are rarely used these days. However, if you work with
communications protocols, you will still see these control codes in use. The ASCII Control Codes table explains what each of these control
codes does.
When is ASCII code used
When a computer sends data, the keys you press or the text you send and receive are sent as a sequence of numbers. These numbers
represent the characters you typed or generated. Because the range of standard ASCII is 0 to 127, each character needs only 7 bits, which fits in 1 byte of
data. For example, the string cactus.io sent as ASCII translates to 99 97 99 116 117 115 46 105 111. Microprocessors only
understand bits and bytes. To them, everything is a sequence of bits.
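The translation above can be reproduced with a few lines of Python. This is only a sketch using the built-in ascii codec; the variable names are illustrative.

```python
text = "cactus.io"

# Encode the string as ASCII and list the numeric code of each character.
codes = list(text.encode("ascii"))
print(codes)  # [99, 97, 99, 116, 117, 115, 46, 105, 111]

# Every code is below 128, so each one fits in 7 bits.
bits = [format(code, "07b") for code in codes]
print(bits[0])  # 1100011

# Decoding the same numbers gives back the original text.
decoded = bytes(codes).decode("ascii")
```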
HTML character codes are based on character sets, which range from single-byte sets such as Latin-1 (ISO-8859-1)
to UTF-8, which uses one or more bytes to represent a character. Using a character set such as UTF-8 gives us a much larger range of
characters.
When using a web browser, the web site we are visiting would normally specify the character set it is using. For example, in an HTML5
web page you might see the string <meta charset="utf-8"> in the page source. This tells the browser that the data being sent utilises
the UTF-8 character table.
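To make the difference between a single-byte set and UTF-8 concrete, the following Python sketch counts the bytes UTF-8 needs for three sample characters. The characters are chosen purely for illustration.

```python
# Number of bytes UTF-8 uses for an ASCII letter, a Latin-1 symbol
# and a symbol outside Latin-1.
samples = ["A", "\u00a9", "\u20ac"]   # A, copyright sign, euro sign
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(lengths)

# In Latin-1 the copyright sign is a single byte ...
copyright_latin1 = "\u00a9".encode("latin-1")

# ... while the euro sign cannot be represented in Latin-1 at all.
try:
    "\u20ac".encode("latin-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
```

ASCII characters keep their one-byte form in UTF-8, so a pure ASCII file is also a valid UTF-8 file.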
The HTML code is usually in the format of &#169;. The & tells the browser that it is an HTML code and not part of a string. The #
after the & tells the browser that what follows is the numerical value of a symbol. The ; tells the browser that this is the end of the
code. In the case of &#169;, this is the HTML code that represents the copyright symbol ©.
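A numeric HTML code such as &#169; can be decoded and built with Python's standard html module. The sketch below is only an illustration of the format; browsers do this work internally.

```python
import html

# Decode a numeric character reference: "&#" + decimal code + ";".
symbol = html.unescape("&#169;")
print(symbol)

# Build the reference for a symbol from its code number.
reference = "&#{};".format(ord("\u00a9"))
print(reference)  # &#169;

# The named form of the same symbol decodes to the same character.
named = html.unescape("&copy;")
```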
Ascii Hex Symbol
16 10 DLE
17 11 DC1
18 12 DC2
19 13 DC3
20 14 DC4
21 15 NAK
22 16 SYN
23 17 ETB
24 18 CAN
25 19 EM
26 1A SUB
27 1B ESC
28 1C FS
29 1D GS
30 1E RS
31 1F US
Ascii Hex Symbol
32 20 (Space)
33 21 !
34 22 "
35 23 #
36 24 $
37 25 %
38 26 &
39 27 '
40 28 (
41 29 )
42 2A *
43 2B +
44 2C ,
45 2D -
46 2E .
47 2F /
ASCII stands for American Standard Code for Information Interchange (pronounced 'as-key'). This is a standard set of characters
understood by all computers, consisting mostly of letters and numbers plus a few basic symbols such as $ and %. It employs the
128 possible 7-bit integers to encode the 52 uppercase and lowercase letters of the Roman alphabet and the 10 numeric digits, plus
punctuation characters and some other symbols. The fact that almost everyone agrees on ASCII makes it relatively easy to
exchange information between different programs, different operating systems, and even different computers.
It also means you can easily print basic text and numbers on any printer, with the notable exception of PostScript printers. If you are
working in the MacWrite word processing application on the Mac and you need to send your file to someone who uses WordStar on
the PC, you can save the document as an ASCII file (which is the same as text-only). After you transfer the file to the PC (on a disk or
via a cable or modem), the other person will be able to open the file in WordStar.
In ASCII, each character has a number which the computer or printer uses to represent that character. For instance, a capital A is
number 65 in the code. Although an 8-bit byte allows 256 possible characters, ASCII standardizes only 128 of them, and the first
32 of these are "control characters," which are supposed to be used to control the computer and don't appear on the screen. That leaves
only enough code numbers for all the capital and lowercase letters, the digits, and the most common punctuation marks.
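The numbers in this paragraph are easy to check in Python. The sketch below counts the control positions and the visible characters; treating position 127 (DEL) separately is an assumption based on common ASCII tables rather than something stated above.

```python
# The code number of a capital A.
capital_a = ord("A")
print(capital_a)  # 65

# The first 32 positions (0-31) are control characters.
control_codes = [n for n in range(128) if n < 32]

# Positions 32-126 hold the space, letters, digits and punctuation.
# Position 127 (DEL) is a control character too and is left out here.
visible = [chr(n) for n in range(32, 127)]
print(len(control_codes), len(visible))
```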
Another ASCII limitation is that the code doesn't include any information about the way the text should look (its format). ASCII only
tells you which characters the text contains. If you save a formatted document as ASCII, you will lose all the font formatting, such as
the typeface changes, the italics, the bolds, and even the special characters like ©, TM, or ®. Usually carriage returns and tabs are
saved.
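The loss of special characters can be demonstrated with Python's ascii codec. In this sketch the sample string (including the name Acme) is made up, and replacing unsupported characters with a question mark is just one possible policy; real word processors may substitute or drop them differently.

```python
# Sample text with two characters that have no ASCII code.
formatted = "Copyright \u00a9 Acme\u2122"

# Strict encoding refuses the text outright.
try:
    formatted.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A lossy save replaces every unsupported character with "?".
ascii_bytes = formatted.encode("ascii", errors="replace")
print(ascii_bytes)

# Tabs and carriage returns are ordinary ASCII codes and survive.
layout = "col1\tcol2\r\n".encode("ascii")
```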
Unlike some earlier character encodings that used fewer than 7 bits, ASCII does have room for both the uppercase and lowercase
letters and all normal punctuation characters, but as it was designed to encode American English, it does not include the accented
characters and ligatures required by many European languages (nor the UK pound sign £). These characters are provided in some 8-bit
EXTENDED ASCII character sets, including ISO LATIN 1 or ANSI 1, but not all software can display 8-bit characters, and some
serial communications channels still remove the eighth bit from each character. Despite its shortcomings, ASCII is still important as
the 'lowest common denominator' for representing textual data, which almost any computer in the world can display.
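What happens when a channel removes the eighth bit can be sketched in Python as follows. The helper name strip_eighth_bit is hypothetical, and the sample word is encoded in Latin-1 purely for illustration.

```python
def strip_eighth_bit(data):
    """Clear the top bit of every byte, as a 7-bit channel would."""
    return bytes(b & 0x7F for b in data)

# The accented letter is stored as byte 0xE9 in Latin-1.
original = "caf\u00e9".encode("latin-1")
stripped = strip_eighth_bit(original)
print(stripped)

# Pure ASCII data passes through unchanged.
unchanged = strip_eighth_bit(b"plain text")
```

The accented letter silently turns into a different ASCII character, while 7-bit text is unaffected, which is why ASCII remains the safe common denominator.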
The ASCII standard was certified by ANSI in 1977, and the ISO adopted an almost identical code as ISO 646.
ASCII is important because it is our link between our computer screen and our computer hard drive, and that link is now the same
between all computers.
All computers speak in binary, a series of 0s and 1s. However, just as English and Spanish can use the same alphabet but have
completely different words for similar objects, computers also had their own version of languages. ASCII is used as a method to give
all computers the same language, allowing them to share documents and files.
ASCII is an acronym that stands for American Standard Code for Information Interchange.
What are ASCII tables used for?
ASCII tables are well known in computer circles because they are the Babel fish that works between computer hard drives and
humans.
The Babel fish, if you don’t know, is a fish from The Hitchhiker’s Guide to the Galaxy that can be put in your ear to translate alien languages.
Having the common tables in ASCII was important for computers to be able to talk to each other.
Hard drives store information on magnets (or transistors) that have only two states, on and off. ASCII tables are how we go from a set
of eight 0s and 1s (or a byte of data) to the letter “a” or “A”, or the number “4”. The tables are commonly used across all computer
systems, which allows my computer to read Word documents written on your computer, even if I use a PC and you use a Mac (and
no, it was not always like that!). The tables include the ASCII alphabet, ASCII binary, ASCII symbols and more!
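Going from a set of eight 0s and 1s to a character, and back, takes one line each in Python. The function names below are illustrative only.

```python
def byte_to_char(bits):
    """Turn a string of eight 0s and 1s into its ASCII character."""
    return chr(int(bits, 2))

def char_to_byte(ch):
    """Turn a character into its eight-digit binary form."""
    return format(ord(ch), "08b")

print(byte_to_char("01100001"))  # a
print(byte_to_char("01000001"))  # A
print(char_to_byte("4"))         # 00110100
```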
Ascii control codes (control characters, C0 controls)
The following document lists the control codes (control characters) in Ascii and in newer character code
standards like Unicode, which generally try to be compatible with Ascii in the Ascii code range (positions 0
through 127).
code pos. Unicode Description in C0 of ISO 646
ctl-C 3 3 ETX END OF TEXT A transmission control character which terminates a text.
ctl-N 14 E SO SHIFT OUT A control character which is used in conjunction with SHIFT IN and ESCAPE to extend the graphic character set of the code. It
ctl-Q 17 11 DC1 DEVICE CONTROL ONE A device control character which is primarily intended for turning on or starting an ancillary device. If it is not required
ctl-T 20 14 DC4 DEVICE CONTROL FOUR A device control character which is primarily intended for turning off, stopping or interrupting an ancillary device. If it is not required for this purpose, it may be used for any other device control function not provided by other DCs.
ctl-] 29 1D GS GROUP A control character used to separate and qualify data logically;
Notes:
The first column shows the widely used "control-something" name used for control codes. It relates to
the fact that on a keyboard, it is often possible to generate a control code using the control (Ctrl, Ctl)
key and a normal key.
The column C0 of ISO 646 quotes the definition in that document, with typos fixed, and with references
to characters and code positions changed to use Unicode names and modern terms.
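The relation between the "control-something" name and the code position can be sketched in Python. The sketch assumes the common terminal convention that the control key clears the upper bits of the key's code, leaving the low five bits; the function names are hypothetical.

```python
def control_code(key):
    """Code position produced by Ctrl + key (keeps the low five bits)."""
    return ord(key.upper()) & 0x1F

def control_name(code):
    """The 'ctl-X' style name for a control code position 0-31."""
    return "ctl-" + chr(code + 64)

for key in ["C", "N", "Q", "T", "]"]:
    print(key, control_code(key))
```

For the rows listed above this gives 3, 14, 17, 20 and 29, matching the code positions in the table.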
Historical table
The following table lists the original names of Ascii control codes as defined in 1963.
code pos. Ascii 1963
0 0 NULL Null/Idle
6 6 RU "Are you...?"
10 A LF Line feed
12 C FF Form feed
13 D CR Carriage return
14 E SO Shift out
15 F SI Shift in
18 12 DC2
19 13 DC3
21 15 ERR Error
24 18 S0 Separator (information)
25 19 S1
26 1A S2
27 1B S3
28 1C S4
29 1D S5
30 1E S6
31 1F S7
Ascii 1963 assigned code position 126 to the ESC code. Later ESC was moved to position 27, and position
126 was assigned to tilde (~). Similarly ACK was moved from 124 to 6, making room for vertical
line (vertical bar, |).
1. ASCII control characters. The ASCII control character area covers code positions 0–31 (hex
00–1F). This area is also called the C0 set. Two additional controls appear at 32 and 127 (hex 20
and 7F). The ASCII control characters cover a wide range of uses, such as text layout, transmission
and device control, and more.
2. C1 control characters. C1 covers positions 128–159 (hex 80–9F). C1 is primarily for displays
and printers. This set is related to ANSI escape sequences and VT100.
3. ISO 8859 special characters. Two special characters, NBSP and SHY, are from ISO 8859.
They are also used in Windows and Unicode. They appear at 160 and 173 (hex A0 and AD).
Note: These control character sets are not the only control characters ever used. Other
C0 and C1 sets do exist. Alternative sets were defined for special uses. In them, a part
of the standard C0/C1 controls have been deleted or replaced by new controls. Even
totally different alternative sets exist. Alternative control characters are not discussed
in this article. One can find them in the International Register of Coded Character Sets.
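The three groups listed above can be summarized as a small lookup by code position. The Python function below is a sketch; its name and the label strings are made up for illustration.

```python
def classify(pos):
    """Name the group a code position belongs to, per the list above."""
    if 0 <= pos <= 31:
        return "C0 control"
    if pos == 32:
        return "SP"
    if pos == 127:
        return "DEL"
    if 128 <= pos <= 159:
        return "C1 control"
    if pos == 160:
        return "NBSP"
    if pos == 173:
        return "SHY"
    return "other"

for pos in (0x1B, 0x20, 0x7F, 0x9B, 0xA0, 0xAD, 0x41):
    print(hex(pos), classify(pos))
```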
C0 = positions 0–31. Origin with ASCII and ISO 646 character sets. Characters SP and DEL appear
together with C0.
The first group of control characters originates from ASCII. These characters consist of a set called
C0 and two additional characters. The C0 set is in locations 0 to 31. Two additional ASCII
characters, SP and DEL, fall outside the C0 area, but they are closely related to the C0 set. All of
these characters are defined by the same standards.
This set of control characters covers many uses. There are "Format Effectors" that control the
appearance of plain text. There are "Transmission Controls" for use with transmission protocols
and "Device Controls" to start, operate and stop auxiliary devices. There are "Information
Separators" that delimit various pieces of data. Other controls exist for producing alerts, filling a
media, indicating end of media, and for dealing with errors. There are even controls to create new
characters and controls. The C0 set was defined with perforated tape, punched cards and
typewriter-like devices in mind. Devices have changed since then, but the C0 controls have
survived.
The first version of ASCII was released in 1963. Like the ASCII of today, the 1963 version covered
some letters and symbols, as well as control characters. While many of those 35 control characters
were similar to those of modern ASCII, some were different. ASCII-1963 had some serious
shortcomings, such as no support for lower case letters. It quickly turned out that the standard
must be revised. Today, ASCII-1963 is practically forgotten. Since ASCII-1963 deviates a lot from
later ASCII versions in the control character area too, we will not go any deeper into it.
The next revision was ASCII-1965. This version, although formally accepted, was not published.
Another revision was going to take place. ASCII as we know it is based on the ASCII-1967 standard
(USAS X3.4-1967). This version was an important milestone. It was already very close to the
version that then became widely used.
In 1968 ASCII was slightly updated and released as USAS X3.4-1968 (later retroactively renamed ANSI
X3.4-1968). The actual updates were very small, only adding an option to use the character LF as
a "newline", and designating ASCII and USASCII as the names of the standard. (Later on, the name
USASCII was dropped, leaving ASCII as the official name.)
ASCII-1968 became immensely popular. Almost all of today's computer systems use ASCII or one
of its descendants. (A notable exception is EBCDIC used on IBM mainframes, very different from
ASCII.) The Internet is based on ASCII-1968 as well.
ASCII-1968 defined the 34 control characters that remained: the C0 set, SP and DEL. Included was
a short description of the intended functionality of each control character. These definitions also
made their way into RFC 20 word for word. Most of these definitions have remained materially
unchanged for decades. Later standards have updated the text, but the basic functionality is still
the same. That, at least, is what the standards say. Non-standard use is common and often contrary to the
standards.
When ASCII emerged, computing equipment was quite different from the equipment that ASCII
was going to be popularized with. Computers were regularly operated through punched cards,
perforated tape and teletypewriters (TTYs). TTYs were typewriter-like devices, which were used as
interactive computer terminals. Instead of a monitor they produced output on paper. The ASCII
control characters were naturally designed considering the devices of those days. Since then, new
devices such as monitors have emerged. It hasn't always been that simple to accommodate the
control characters to the newer devices. Despite the challenges, the control characters of the
1960s are still with us.
ISO 646. ASCII evolved to an ISO standard, which is known as ISO 646. The first version came out
in 1967. ISO 646 is the "international edition" of ASCII, with a few differences. Despite the
differences, these standards were closely related. ISO 646 allowed national variants to support the
national characters required for each country. The US national variant was ASCII. Several other
national variants were released to support accented letters (à, ü and the like) and other symbols.
The ISO variants including ASCII were a common way to express text in the 1970s and 1980s.
As to the control characters, the ASCII control characters set also appeared in ISO 646. The
functionality of the control characters remained quite intact, even though the definitions were
updated.
More standards. ISO 646 was also released as ECMA-6. The control characters appear in ECMA-6
very similar to those of ISO 646.
Some of the C0 codes were further refined in other standards. SI, SO and ESC appeared as
character set extension controls in ANSI X3.41, ISO 2022 and ECMA-35. These characters became
widely used to invoke additional character sets. The Transmission Control characters (TC1 to TC10)
appeared in ISO 1745 in 1975, which gave a detailed description of where and how they should be
used. How widely ISO 1745 was actually used in transmission is another question.
ASCII was later updated in 1977 and again in 1986 to be in conformance with ISO 646. The control
characters in ASCII-1986 and ISO 646/ECMA-6 are very similar, even though minor differences do
exist.
The current ISO and ECMA versions, namely ISO 646:1991 and ECMA-6:1991, no longer define the
C0 control characters. The control characters didn't go away, however. They now appear in
ISO/IEC 6429:1992 and ECMA-48:1991, respectively. Simply put, the C0 set was lumped together
with other control characters, the C1 set, which follows below.
As to some specific control characters, the current detailed definitions of SI, SO and ESC can be
found in ANSI X3.41, ISO 2022 and ECMA-35. The current details for the Transmission Control
characters (TC1 to TC10) appear in the old ISO 1745 from 1975.
Even though the history of the various standards related to the ASCII control codes may sound
unnecessarily complicated, the standard functionality of the characters has not changed
dramatically. It's still mostly the same as back in 1967. So much for the standards; practice
is quite different. Some control characters are indeed commonly used the standard way.
On the other hand, many are used contrary to the standards, or simply ignored. It's not uncommon
to find control characters forbidden in data. Control characters can have unwanted or unknown
side-effects. The easiest way for programmers to deal with them is to shut their eyes or deny such
characters altogether.
C1 control characters
The C1 set appeared in the late 1970s. It is primarily designed for controlling display and printer
devices, even though some of the controls warrant other uses as well. The C1 set is intended for
use with the C0 set.
The C1 set includes "Format effectors" that control horizontal and vertical movement when
displaying or printing. There are "Presentation controls" for defining line-break behavior. There are
"Area definition" controls for form filling. There are "Introducers" and "Shift Functions" to support
extra controls and characters. Additional controls exist for sending command strings and setting
an indicator. Some of the controls were intended to cover for shortcomings in the C0 set. Some
controls were reserved: 2 controls are for private use, while 4 controls were (and still are) reserved
for future standardization.
The C1 set occupies positions 128–159 in 8-bit environments. There are also escape codes to use
the C1 set on 7-bit systems. The respective escape codes (ESC char) are given in the C1
list further below.
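As a sketch of how the 7-bit escape codes relate to the 8-bit positions, the Python function below assumes the usual ECMA-48 convention: a C1 control at position n is written as ESC followed by the character at position n minus hex 40. The function name is hypothetical.

```python
ESC = 0x1B

def c1_to_7bit(code):
    """Two-byte 7-bit form (ESC + char) of an 8-bit C1 control code."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError("not a C1 position: %#x" % code)
    return bytes([ESC, code - 0x40])

print(c1_to_7bit(0x9B))  # CSI becomes ESC [
print(c1_to_7bit(0x85))  # NEL becomes ESC E
```

The CSI case is the familiar start of an ANSI escape sequence as seen on 7-bit terminals.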
History of C1
In 1979 ANSI released additional controls for use with ASCII (ANSI X3.64). This came to be known
as the C1 set. A similar set was also released as ECMA-48. According to ANSI, the C1 controls were
intended for input/output control of two-dimensional character-imaging devices, including
interactive terminals of both the cathode ray tube and printer types, as well as output to microfilm
printers.
A bit later, in 1983, the C1 set was standardized as ISO 6429. Standard-wise, the C1 set has been
volatile. Both ISO 6429 and ECMA-48 were updated several times. New control characters were
added and definitions updated. One of the C1 characters (IND) was eventually deprecated and
removed.
The standards actually cover more control codes than those that fit in the C1 area. These
additional controls are used via control sequences (escape sequences). The sequences are beyond
the subject of this article. Let it suffice that the sequences are an important part of the standards
that should be used together with the C1 controls. The sequences, together with C1, are also
known as VT100 and ANSI escape sequences.
Current status of C1
The current standards for C1 are ISO/IEC 6429:1992 and ECMA-48:1991. These standards now
define both the C0 and C1 control characters.
Unicode allows the use of C1 (and C0 too). In fact, the C1 area has been entirely reserved for
control codes in Unicode. On the contrary, the (somewhat outdated) DOS and Windows
codepages, i.e. character sets, have not reserved space for C1. Instead, they have included
additional graphic characters in the C1 area. This doesn't prevent the use of C1 controls on DOS
and Windows, though.
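The different treatment of the C1 area can be seen by decoding one byte from that range in Python. The sketch uses Python's latin-1 and cp1252 codecs as stand-ins for ISO 8859-1 and a Windows codepage; the byte 0x80 is an arbitrary example.

```python
import unicodedata

raw = bytes([0x80])

# ISO 8859-1 and Unicode keep 0x80 as a (C1) control code.
as_latin1 = raw.decode("latin-1")
latin1_category = unicodedata.category(as_latin1)   # "Cc" means control

# Windows codepage 1252 puts a graphic character in the same position.
as_cp1252 = raw.decode("cp1252")
cp1252_category = unicodedata.category(as_cp1252)   # "Sc" means currency symbol

print(latin1_category, repr(as_cp1252), cp1252_category)
```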
In practice, the C1 control characters are not very common. They are specialized codes for special
applications.
Two characters in ISO 8859 are of interest to us: Non-Breaking Space (NBSP) and Soft Hyphen
(SHY). They both have control-character-like properties, even though they are not actually called
control characters in ISO 8859.
NBSP appears in position 160 (hex A0) and SHY is 173 (hex AD). The same positions, and roughly
the same meanings too, have been adopted in many of the Windows codepages and in Unicode.
Note: ISO 8859-8 Latin/Hebrew defines two additional special characters, namely LRM
(left-to-right mark) and RLM (right-to-left mark). These characters are not universal in
ISO 8859, but specific to Hebrew. Since LRM and RLM were not used in any other ISO
8859 character set, and since they do not appear in Unicode at the same positions,
they are not further presented in this article.
Several current standards include NBSP and SHY. They appear at the same positions in all of the
following:
The Unicode Standard provides for the intact interchange of these code points, neither adding to
nor subtracting from their semantics. The semantics of the control codes are generally determined
by the application with which they are used. However, in the absence of specific application uses,
they may be interpreted according to the control function semantics specified in ISO/IEC
6429:1992. (Unicode 9.0 p. 822)
Unicode specifies semantics for the following control characters. The semantics appear to be in
line with their original semantics, even though some differences may exist.
The following diagram summarizes the development of character standards. You can see how the
control characters were propagated from ASCII (X3.4) and other standards to Unicode.
Control characters in modern applications
With so many control characters coming from the 1960s and 1970s, are they still useful for
application programmers?
It depends on the application. Generally speaking, one needs control characters to work with old
interfaces or devices. New protocols and file formats tend to use some other mechanism than
control characters. Current formats typically use textual markup such as XML, which has little use
for control characters beyond whitespace. On the device control side, unless you are writing
device drivers, you control devices through operating system calls or library routines rather than
sending them control strings to do tricks.
The following is a subjective list of which characters are still in common use and which ones are
used less. The list is based on experience writing application software for Windows and DOS.
Some frequently used characters, especially in a special field, may not have been mentioned. If
you know frequent current uses for any of the characters, let us know.
Many of the control characters only appear rarely. How did this affect the space efficiency of 7-bit
and 8-bit character sets? Instead of reserving space for control characters, it was possible to reuse
these areas for additional graphics. This was actually done by DOS, Windows and Mac, all of which
assigned graphic characters to the control character areas. Unicode chose to be different in this
respect. Since its code space is much larger than 128 or 256, it was possible to reserve the C0/C1
areas entirely for control characters. This has helped the control characters to survive, if not in
practical use, then at least in various code charts and lists.
In this article the focus is on the programmatic features of control characters. Less focus is put on
the use of keyboard shortcuts.
The list shows key presses that (often) produce the control character on the keyboard. In addition,
C-style escape sequences (\c) are provided where available, as are special constants supported by
Visual Basic: classic version and Visual Basic .NET.
The last column lists mnemonics and graphic symbols. The symbols (in black) have been
standardized, but they have fallen into disuse. The 2-letter mnemonics are standardized for the
ASCII section. Additional 2-letter mnemonics for the C1 and ISO 8859 sections are taken from RFC
1345, which is not a standard, but is frequently referred to in this context.
Code chart excerpt (mnemonic above, hex position below)
ASCII
DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS
10 11 12 13 14 15 16 17 18 19 1A 1B 1C
SP
20
....
C1
PAD HOP BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU
80 81 82 83 84 85 86 87 88 89 8A 8B 8C
DCS PU1 PU2 STS CCH MW SPA EPA SOS SGCI SCI CSI ST
90 91 92 93 94 95 96 97 98 99 9A 9B 9C
8859
NBSP
A0
Character list
*) The 2-character mnemonics for the ASCII set are from ANSI X3.32, ISO 2047 and ECMA-17, as
are the graphic symbols. The symbols are outdated and rarely used. A couple of the symbols
also have alternative forms.
1 $01 SOH Start of Heading — TC1 Transmission control character 1 001 0/1 SH
Note: SOH, along with STX and ETX, was intended for data transmission. It is
not intended for marking a heading in a document.
2 $02 STX Start of Text — TC2 Transmission control character 2 002 0/2 SX
3 $03 ETX End of Text — TC3 Transmission control character 3 003 0/3 EX
Note: ETX may be used to call for reply from a slave station after a message
has been sent. ETX is also commonly used to terminate an interactive process
(keyboard: Ctrl + C ).
Ctrl + Break on PC keyboard produces this character code.
4 $04 EOT End of Transmission — TC4 Transmission control character 4 004 0/4 ET
Note: EOT can be used to end or abort a transmission. It can also be a reply to
indicate inability to receive further messages. EOT (keyboard: Ctrl + D ) is even
used as an End-Of-File control in a Unix shell session.
^E Requests a response from a remote station. The response may include station
identification or status. ENQ can be used as a "Who Are You" (WRU) to identify
a remote station, especially after a new connection has been established.
Note: ACK can indicate that a slave station has received a message correctly
and is ready to receive more.
Note: BEL is the only control character with an audible effect. It has been used
to ring a bell (indeed) or produce a beep sound. A visual alarm is also possible.
In Unicode, this control character is abbreviated BEL but named ALERT, while
the name BELL is confusingly used for a graphic character (🔔).
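The naming situation can be checked with Python's unicodedata module. This is a small sketch; the fallback string passed to unicodedata.name is arbitrary.

```python
import unicodedata

bel = "\a"   # C-style escape for the BEL control character
print(ord(bel))  # 7

# Control characters carry no formal Unicode character name,
# so the lookup falls back to the default given here.
control_name = unicodedata.name(bel, "<no name>")
control_category = unicodedata.category(bel)   # "Cc" means control

# The name BELL belongs to the graphic character U+1F514.
graphic_name = unicodedata.name("\U0001F514")
print(control_name, control_category, graphic_name)
```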
Note: Contrary to the standards, BS has been used as a combined "move back
and delete" operation to remove the previous character. This is not the
standard meaning of BS, however. BS is defined as a non-destructive "move
back" or "move left" operation, similar to a backspace in mechanical
typewriters. To delete the previous character, BS should be followed by DEL. On
paper tape the result would be the previous character being completely
punched out (erased). BS followed by another character would strike two
characters in the same position. Overstriking was a way to produce combined
characters. This option was intended to internationalize ASCII. A letter followed
by BS followed by a diacritic symbol would produce an accented letter. As an
example, u BS ^ would produce û. Several ASCII characters (" ' ` ^ ~ ,) were
indeed defined to be used as diacritic symbols. Overstriking could also be
suitable with other characters, such as for underlining with the "_" character or
printing a slash "/" over "=" to produce "not equal". It could even be used to
achieve a strike-through effect (perhaps with -, / or X) to indicate removed text.
A boldface effect could be achieved by striking the same character several
times at the same position.
Overstriking was a useful option with printing devices, but displays hardly
support it. With the advent of more capable character sets and formatting
techniques overstriking can be considered outdated. ASCII-1986 does not
require overstriking capabilities and suggests that overstriking may be
proscribed in the future. ISO 8859 explicitly forbids overstriking.
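The standard, non-destructive meaning of BS can be modelled with a toy printer in Python. The sketch below is an assumption-laden simulation, not how any real device works: it records which characters end up struck at each print column.

```python
def overstrike_columns(text):
    """Return, for each print column, the characters struck there."""
    columns = []
    pos = 0
    for ch in text:
        if ch == "\b":
            pos = max(pos - 1, 0)   # BS only moves back; nothing is erased
            continue
        if pos == len(columns):
            columns.append(ch)      # first strike at a new column
        else:
            columns[pos] += ch      # overstrike an existing column
        pos += 1
    return columns

print(overstrike_columns("u\b^"))     # letter plus diacritic in one column
print(overstrike_columns("=\b/ x"))   # "not equal", then a space and x
```

A display that treats BS as "move back and delete" would instead show only the last character written to each column.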
Even though the standards don't set a universal tab width, a typical fixed tab
width is 8 columns. Other tab widths, as well as custom tab positions, are used
as well. HT is a simple method of data compression: a single character can
represent several spaces in formatted text.
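The effect of a fixed tab width of 8 columns can be shown with Python's str.expandtabs. The sample string is arbitrary.

```python
compact = "a\tbc\td"

# Each HT advances to the next multiple of 8 columns.
expanded = compact.expandtabs(8)
print(repr(expanded))

# The tabbed form is much shorter than the space-filled form.
print(len(compact), len(expanded))
```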
The ↹ Tab key on the keyboard is consistent with HT in that it usually produces
the code HT. How the HT is treated in each application is another story. In
windowing environments, there are three common alternative uses. Pressing ↹
Tab can either add an HT character into text, indent text (possibly by adding an
appropriate number of spaces or shifting the margin), or something
completely different: jump to the next field or control in a graphical user
interface. This way the key has been extended to cover more uses than what
HT was originally intended for.
Note: LF, having two alternative functions, has been a major source of
confusion. While LF was initially defined as a "move down" operator, standards
began to allow LF as a newline too. As a result, operating systems differ in their
definition of a newline. A newline is LF on Unix. Operating systems using CR LF
include CP/M, DOS, OS/2 and Windows. Naturally, this caused an
incompatibility. To solve the problem, control characters IND and NEL were
added to the C1 area. This did not solve the issue, resulting in IND being
removed later. ECMA-6:1985 and ASCII-1986 attempted to clarify the situation
by declaring LF deprecated for a newline and recommending CR LF instead.
ECMA-48:1991 no longer allows LF to function as a newline.
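Because of these differences, software usually has to accept more than one newline convention. The Python sketch below shows one tolerant approach using str.splitlines, which recognizes LF, CR LF and NEL among other line boundaries; the sample strings are made up.

```python
unix_text = "one\ntwo\n"        # LF as newline (Unix)
dos_text = "one\r\ntwo\r\n"     # CR LF as newline (CP/M, DOS, OS/2, Windows)
nel_text = "one\x85two\x85"     # NEL from the C1 area

lines = [t.splitlines() for t in (unix_text, dos_text, nel_text)]
print(lines)

# Converting CR LF to LF is a plain text replacement.
converted = dos_text.replace("\r\n", "\n")
```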
In modern use VT must be quite a rare character. As Bob Bemer, one of the
original designers of ASCII, put it: "This is a very dangerous character to use. It
cannot be used directly on any terminal that I know of. Even if it could, the
implementation rules are not supplied unambiguously in the ASCII standard."
\f ^L Advances to the next form or page. Standards differ in what column the
subsequent character position will be in. Originally, ASCII-1968 did not define
the column at all. ISO and ECMA standards declare that FF does not change the
column. ASCII-1977 and ASCII-1986 optionally allow, by agreement, moving to
the first column, as if FF was actually CR FF.
Note: FF has been used as "page break" in text files, "new page" on printers
and "clear the screen" on displays. The situation was originally unclear whether
FF was just a "new page" operator or "new page, move to column 1". ASCII-
1977 and ECMA-6:1985 attempted to clarify the situation by recommending the
use of CR FF. ASCII-1986 even implied that the "new page, move to column 1"
option might be deleted in a future edition of ASCII.
Constant in Visual Basic and VB.NET: vbFormFeed, FormFeed
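Example: when FF is used as a page break in a text file, the pages can be recovered by splitting on it. This is a minimal Python sketch under the assumption that a page break is written either as FF alone or as CR FF; the function name is hypothetical.

```python
import re


def split_pages(text: str) -> list[str]:
    """Split text into pages at each FF, also consuming a CR directly before it."""
    return re.split(r"\r?\f", text)


if __name__ == "__main__":
    for number, page in enumerate(split_pages("page one\r\fpage two\fpage three"), 1):
        print(number, repr(page))
```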
\r ^M Traditional definition: Moves to the first position on the same line (ASCII, ISO
646, ECMA-6). Newer definition: Moves to the line home position or line limit
position of the same line (ISO 6429, ECMA-48).
See also: LF
Note: SO (Shift Out) is the normal name of this control. LS1 (Locking-Shift One) is
used by ECMA-35 and ECMA-48. In those standards, SO is used in 7-bit
environments and LS1 in 8-bit environments. The mechanism to select the
alternative character set(s) was defined in ANSI X3.41, ISO 2022 and ECMA-35.
It includes the use of escape sequences starting with ESC. SO has also been
used on printers to select enlarged characters or another color.
^O Used in conjunction with SO. It may reinstate the standard meanings of the
characters following it.
Note: SI (Shift In) is the normal name of this control. LS0 (Locking-Shift Zero) is
used by ECMA-35 and ECMA-48. In those standards, SI is used in 7-bit
environments and LS0 in 8-bit environments. SI has also been used on printers
to select condensed characters or to reset color.
16 $10 DLE Data Link Escape — TC7 Transmission control character 7 020 1/0 DL
Note: DLE is the "escape" character for transmission control. DLE can
potentially be placed in front of a transmission control character (TC1-TC10)
to pass it through "as is" instead of controlling the current transmission.
This is not always the case, though. It is possible to create new transmission
control sequences with DLE, much as ESC is used to create escape sequences for
other purposes. Contrary to the standards, ^P has been used as a keyboard
shortcut to echo console activity at the printer.
^T Intended to turn off, stop or interrupt an ancillary device, or for any other
device control function.
22 $16 SYN Synchronous Idle — TC9 Transmission control character 9 026 1/6 SY
23 $17 ETB End of Transmission Block — TC10 Transmission control 027 1/7 EB
character 10
^W Indicates the end of a block of data. Used when data is divided into blocks for
transmission.
Note: ETB, when used to end a block, may call for a reply from a slave station.
^X Indicates that data is in error or should be disregarded. Affects "the data with
which it is sent" (ASCII-1968, ASCII-1977) or "the data preceding it" (ASCII-
1986, ISO 646, ECMA-6, ECMA-48).
Note: There are 2 alternative definitions for the data to be disregarded. The
actual scope of cancellation is undefined by the standards and should be
defined case by case. ^X has been used as a keyboard shortcut to cancel
(delete) the characters on the current line, a use that conforms to the
standards.
Note: EM may have been suitable for paper tape or magnetic tape to say "no
more data". Disk file systems use more sophisticated ways to keep track of the
used and unused areas of the medium.
This character is commonly abbreviated EM, except for Unicode, which provides
it as an alias with abbreviation EOM.
Note: When SUB is used as a substitution character, the reverse question mark
symbol seems quite good as its visual representation. Compare SUB to Unicode
U+FFFD REPLACEMENT CHARACTER.
SUB has often been used contrary to the standards. On CP/M and MS-DOS, it
appears as an End-Of-File marker for text files (^Z). On Unix, ^Z is a keyboard
shortcut that suspends a foreground process.
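Example: a program that reads old CP/M or MS-DOS text files may want to ignore everything from the first SUB (^Z) onwards. This is a minimal Python sketch of that idea; the function name is hypothetical, and it assumes the file content is already available as bytes.

```python
SUB = b"\x1a"  # ^Z, decimal 26


def strip_eof_marker(data: bytes) -> bytes:
    """Return the bytes before the first SUB, or all of data if there is none."""
    end = data.find(SUB)
    return data if end == -1 else data[:end]


if __name__ == "__main__":
    print(strip_eof_marker(b"HELLO\r\n\x1a\x1a\x1a"))
```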
On the keyboard, the Esc key sometimes does produce the ESC control
character. In windowing environments, the key typically cancels a dialog or an
operation, rather than producing a control character or starting an escape
sequence. This kind of "escape" is not based on the character standards,
however. The closest ASCII equivalent for canceling a dialog would be CAN, but
since there is no Can key on common keyboards, it can't be used.
^\ The four information separators (FS, GS, RS and US) are used to separate and
qualify data. Each separator has two alternative names: Information Separator
Four equals File Separator, Information Separator Three equals Group
Separator, Information Separator Two equals Record Separator and Information
Separator One equals Unit Separator. The separators can be used either
hierarchically or in a non-hierarchical manner. When used hierarchically, the
order is US (least inclusive), RS, GS and FS (most inclusive). The content and
length of a file, group, record or unit are not specified by the standards.
FS, when used in a hierarchical order, delimits a data item called a file. It can
also delimit anything else.
29 $1D GS Group Separator — IS3 Information separator 3 035 1/13 GS
^] GS, when used in a hierarchical order, delimits a data item called a group. It
can also delimit anything else.
^^ RS, when used in a hierarchical order, delimits a data item called a record. It
can also delimit anything else.
^_ US, when used in a hierarchical order, delimits a data item called a unit. It can
also delimit anything else.
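Example: used hierarchically, the four separators nest as file, group, record and unit. This is a minimal Python sketch of such a parser. The standards do not prescribe any particular data layout, so the nesting into lists and the function name are assumptions made for illustration.

```python
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"  # codes 28, 29, 30, 31


def parse_separated(text: str) -> list:
    """Split text into files > groups > records > units (most to least inclusive)."""
    return [
        [
            [record.split(US) for record in group.split(RS)]
            for group in file_part.split(GS)
        ]
        for file_part in text.split(FS)
    ]


if __name__ == "__main__":
    data = "a" + US + "b" + RS + "c" + US + "d" + GS + "e"
    print(parse_separated(data))
```

Since the separators are control characters, the units themselves can contain commas, tabs or spaces without any quoting.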
Moves one character position forwards. Space may also have a function
equivalent to that of an information separator.
Note: Space has a dual nature. It can be classified as both a control character
and a (non-printing) graphic character. SP is similar to a Format Effector. It can
also be used as a fifth Information Separator. Space is sometimes represented
by the symbol ƀ or ␢ (b with a stroke) or ␣ (open box). SP does not belong to
the C0 set.
The spacebar on a PC keyboard produces this character code.
Note: DEL is now outdated. It was removed from the latest standards (ECMA-48
in 1991 and ISO 6429 in 1992). DEL originated with perforated paper, where it
was equal to "all holes punched", which is a way to invalidate an erroneous
character (rubout). In a sense, DEL is similar to NUL, since both characters
mean "nothing". ASCII-1977 suggests the use of DEL as a "time waster" to
accommodate mechanical devices where a carriage return takes time to execute.
ASCII-1986 recommends NUL as a time waster instead of DEL. DEL does not belong
to the C0 set, but is an individual control code.
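Example: "all holes punched" corresponds to all seven bits set, which is why any character can be rubbed out by punching over it. This is a minimal Python sketch that models punching extra holes as a bitwise OR; the function name is hypothetical.

```python
DEL = 0x7F  # decimal 127, binary 1111111: every one of the seven holes punched


def rub_out(code: int) -> int:
    """Punch all remaining holes over a 7-bit character code."""
    # Holes can only be added, never removed, so this is modeled as OR.
    return code | DEL


if __name__ == "__main__":
    print(format(DEL, "07b"))
    print(all(rub_out(code) == DEL for code in range(128)))
```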
\x is what you write in a C program to produce the given control character. ^X means you
press Ctrl + X to produce the given control character.
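Example: in the ^X notation, the control code is conventionally the code of the character X minus 64 (hex 40), so ^L is 12 and ^M is 13. This is a minimal Python sketch of that arithmetic; the function name is hypothetical. Python accepts the same \f and \r escapes as C, which makes the comparison easy to check.

```python
def ctrl(key: str) -> int:
    """Return the control code for Ctrl + key, for keys @, A-Z, [, \\, ], ^ and _."""
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no C0 control code for Ctrl + {key!r}")
    return code - 0x40


if __name__ == "__main__":
    print(ctrl("L"), ord("\f"))  # form feed, ^L
    print(ctrl("M"), ord("\r"))  # carriage return, ^M
    print(ctrl("Z"))             # SUB, ^Z
```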
C1 control characters
The C1 control characters work in 8-bit environments. These controls come from 3 related
standards: ANSI X3.64, ISO 6429 and ECMA-48. All of these characters are also available in
Unicode. There are three unassigned control characters: PAD, HOP and SGCI. Use was planned
for them in a failed draft DIS 10646, but they were not actually standardized or put to use. Despite
this, one can find these control characters in various C1 lists online, and also as aliases in later
Unicode standards.
†) The 2-character mnemonics for C1 are from RFC 1345. They are not standardized.
ESC @ A reserved control code. Intended for use as PAD Padding Character in draft
DIS 10646, rejected, never standardized (not accepted to ISO 10646).
ESC A A reserved control code. Intended for use as HOP High Octet Preset in draft DIS
10646, rejected, never standardized (not accepted to ISO 10646).
Note: Not part of ISO/IEC 6429 or ECMA-48. Listed as XXX in Unicode.
Note: Roughly equivalent to a soft hyphen except that the means for indicating
a line break is not necessarily a hyphen. Compare to Unicode U+200B ZERO
WIDTH SPACE.
ESC D Moves to the next line keeping the current horizontal position.
Note: According to ECMA-48:1986, IND was provided for use in those cases
where LF was implemented as New Line. IND was deprecated in 1988 and
withdrawn in 1992 from ISO/IEC 6429 (1986 and 1991 respectively for ECMA-
48).
See also: LF RI
133 $85 NEL Next Line 205 8/5 NL
ESC E Moves to the first position of the next line. Alternatively, to line home or line
limit position.
Note: NEL maps to the control character NL (New Line) in the EBCDIC
character set used on IBM mainframes.
See also: LF
ESC F Starts a string of character positions whose contents can be transmitted. The
string ends at EPA (or end of display).
ESC G Ends a string of character positions (started by SPA) whose contents can be
transmitted.
136 $88 HTS Horizontal Tabulation Set, Character Tabulation Set 210 8/8 HS
137 $89 HTJ Horizontal Tabulation with Justification, Character 211 8/9 HJ
Tabulation with Justification
ESC I Moves text to the following tab stop. The text is what comes after the previous
tab stop up to the active position.
Note: This character has several names. ANSI X3.64 originally called it
Horizontal Tabulation with Justify. ISO 6429:1992, ECMA-48:1986 and ECMA-
48:1991 have renamed HTJ as Character Tabulation with Justification.
138 $8A VTS Vertical Tabulation Set, Line Tabulation Set 212 8/10 VS
139 $8B PLD Partial Line Down, Partial Line Forward 213 8/11 PD
ESC K Moves down so that following characters will appear as subscripts. Subscripts
end at the next PLU.
Note: ISO 6429:1992 and ECMA-48:1991 have renamed PLD as Partial Line
Forward. Sample: text PLD subscript PLU text.
140 $8C PLU Partial Line Up, Partial Line Backward 214 8/12 PU
ESC M Moves to the previous line keeping the current horizontal position.
ESC N Used to extend the character set. The next character will be from the currently
chosen G2 set.
Note: For more information see ISO 2022 or ECMA-35. The next character
should be in the decimal range 33-126 or 32-127.
ESC O Used to extend the character set. The next character will be from the currently
chosen G3 set.
Note: For more information see ISO 2022 or ECMA-35. The next character
should be in the decimal range 33-126 or 32-127.
ESC S Notifies that data is ready for transfer from a device (ANSI X3.64), or
establishes the transmit state in the receiving device (ISO 6429, ECMA-48).
Doesn't initiate the actual transmission.
ESC T Ignore the preceding graphic character (and CCH itself too). If the previous
character is a control character or sequence, ANSI X3.64 says it should be
ignored, while ISO 6429 and ECMA-48 leave the action undefined.
150 $96 SPA Start of Guarded Protected Area, Start of Protected Area, 226 9/6 SG
Start of Guarded Area
151 $97 EPA End of Guarded Protected Area, End of Protected Area, End 227 9/7 EG
of Guarded Area
Note: EPA is known as End of Protected Area (ANSI X3.64, ECMA-48:1979), End
of Guarded Protected Area (ISO 6429:1983, ECMA-48:1984) and End of
Guarded Area (ISO 6429:1992, ECMA-48:1986 and ECMA-48:1991).
153 $99 SGCI unassigned, "Single Graphic Character Introducer" 231 9/9 GC
ESC Y A reserved control code. Intended for use as SGCI Single Graphic Character
Introducer in draft DIS 10646, rejected, never standardized (not accepted to
ISO 10646).
ESC Z A reserved control code. The name was standardized as SCI Single Character
Introducer, but the actual functionality was not implemented in the standards.
Note: SCI was to be followed by a single byte, which would represent a control
function or a graphic character. The functions or characters were not defined
in the standards.
ESC ] Starts an operating system control string. The string ends at ST and is
interpreted subject to the operating system.
ESC _ Starts an application program command string. ST will end the command. The
interpretation of the command is subject to the program in question.
ESC X means you press Esc followed by X to produce this control character.
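Example: the ESC X forms listed above follow a fixed pattern. The character after ESC is the C1 code minus 64 (hex 40), so NEL (hex 85) is ESC E and PAD (hex 80) is ESC @. This is a minimal Python sketch of the conversion in both directions; the function names are hypothetical.

```python
ESC = 0x1B


def c1_to_escape(code: int) -> bytes:
    """Return the two-byte ESC sequence equivalent to an 8-bit C1 code."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError("not a C1 control code")
    return bytes([ESC, code - 0x40])


def escape_to_c1(sequence: bytes) -> int:
    """Return the 8-bit C1 code equivalent to a two-byte ESC sequence."""
    if len(sequence) != 2 or sequence[0] != ESC or not 0x40 <= sequence[1] <= 0x5F:
        raise ValueError("not a two-byte C1 escape sequence")
    return sequence[1] + 0x40


if __name__ == "__main__":
    print(c1_to_escape(0x85))            # NEL
    print(hex(escape_to_c1(b"\x1bD")))   # IND
```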
The two special characters, NBSP and SHY, are not really control characters. They are graphic
characters with a special feature. The characters also appear in Unicode. They are included here
for the sake of completeness.
‡) The 2-character mnemonics for NBSP and SHY are from RFC 1345. They are not standardized.
Dec Hex Char Description Octal Pos ‡)
In HTML you can write &nbsp; or &#160; to add a no-break space to a web page.
See also: SP
Indicates an intraword break point for use when a word must be broken across
lines. The visual rendering either is a hyphen (ISO 8859) or varies (Unicode).
In HTML you can write &shy; or &#173; to add a soft hyphen to a web page.
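Example: the HTML character references for the two special characters decode to code 160 (hex A0) for NBSP and code 173 (hex AD) for SHY. This is a minimal Python sketch that checks this with the standard library's html.unescape; the constant names are chosen here for illustration.

```python
import html

NBSP = "\u00a0"  # no-break space, decimal 160
SHY = "\u00ad"   # soft hyphen, decimal 173

if __name__ == "__main__":
    # Named and numeric references decode to the same character.
    print(html.unescape("&nbsp;") == NBSP, html.unescape("&#160;") == NBSP)
    print(html.unescape("&shy;") == SHY, html.unescape("&#173;") == SHY)
```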
SP 20
....
C1 80-8C: PAD 80, HOP 81, BPH 82, NBH 83, IND 84, NEL 85, SSA 86, ESA 87, HTS 88, HTJ 89, VTS 8A, PLD 8B, PLU 8C
C1 90-9C: DCS 90, PU1 91, PU2 92, STS 93, CCH 94, MW 95, SPA 96, EPA 97, SOS 98, SGCI 99, SCI 9A, CSI 9B, ST 9C
NBSP (ISO 8859) A0
Categories
A summary of character categories. Mostly based on ANSI X3.64, ISO 6429, ECMA-6 and ECMA-48.
Translations
The translated terms are taken from the given standards. Several alternative translations may
exist.
Abbr | English | Abbr (French) | French | Abbr (Russian) | Russian | Spanish | German | Finnish
SOH | Start of Heading | DET | début d'en-tête | НЗ | начало заголовка | comienzo de encabezamiento | Anfang des Kopfes | otsikon alku
STX | Start of Text | DTX | début de texte | НТ | начало текста | comienzo de texto | Anfang des Textes | tekstin alku
ETX | End of Text | FTX | fin de texte | КТ | конец текста | fin de texto | Ende des Textes | tekstin loppu
EOT | End of Transmission | FTR | fin de transmission | КП | конец передачи | fin de transmisión | Ende der Übertragung | tekstin loppu
ENQ | Enquiry | DEM | demande | КТМ | кто там? | pregunta | Stationsaufforderung | kysely
ACK | Acknowledge | ACC | accusé de réception [positif] | ДА | подтверждение | acuse de recibo | Positive Rückmeldung | kuittaus
LF | Line Feed | PAL | changement de ligne | ПС | перевод строки | cambio de renglón | Zeilenvorschub | riviaskel
FF | Form Feed | SDP | saut de page, page suivante | ПФ | перевод формата | página siguiente | Formularvorschub | sivun vaihto
CR | Carriage Return | RC | retour de chariot | ВК | возврат каретки | retorno del carro | Wagenrücklauf | vaunun palautus
DLE | Data Link Escape | ÉCT | échappement transmission | AP1 | авторегистр один | escape de enlace de datos | Datenübertragungsumschaltung | ohjauskoodin poikkeus
DC1 | Device Control 1 | CD1 | commande d'appareil un | СУ1 | символ устройства один | control de dispositivo uno | Gerätesteuerung 1 | laitteen ohjaus 1
DC2 | Device Control 2 | CD2 | commande d'appareil deux | СУ2 | символ устройства два | control de dispositivo dos | Gerätesteuerung 2 | laitteen ohjaus 2
DC3 | Device Control 3 | CD3 | commande d'appareil trois | СУ3 | символ устройства три | control de dispositivo tres | Gerätesteuerung 3 | laitteen ohjaus 3
DC4 | Device Control 4 (Stop) | CD4 | commande d'appareil quatre | СУ4 | символ устройства четыре | control de dispositivo cuatro | Gerätesteuerung 4 | laitteen ohjaus 4
NAK | Negative Acknowledge | ACN | accusé de réception négatif | НЕТ | отрицание | acuse de recibo negativo | Negative Rückmeldung | kielteinen kuittaus
SYN | Synchronous Idle | SYN | synchronisation | СИН | синхронизация | reposo síncrono | Synchronisierung | tahditus
ETB | End of Transmission Block | FBT | fin de bloc de transmission | КБ | конец блока | fin de bloque de transmisión | Ende des Übertragungsblocks | jaksonsiirron loppu
EM | End of Medium | FS | fin de support | КН | конец носителя | fin del medio físico | Ende der Aufzeichnung | tietovälineen loppu
ESC | Escape | ÉCH | échappement | АР2 | авторегистр два | escape | Umschaltung | koodin poikkeus
BPH | Break Permitted Here | API | arrêt permis ici | РПС | разрешение переноса строки | corte permitido aquí | |
NBH | No Break Here | PAI | aucun arrêt ici | ЗПС | запрет переноса строки | corte no permitido aquí | |
PLD | Partial Line Down | IPav | interligne partiel avant | CCB | смещение строки вперед | avance de línea parcial | |
PLU | Partial Line Up | IPar | interligne partiel arrière | CCH | смещение строки назад | retroceso de línea parcial | |
SS2 | Single Shift Two | RU2 | remplacement unique deux | ПЕ2 | переключатель единичный два | cambio individual dos | |
SS3 | Single Shift Three | RU3 | remplacement unique trois | ПЕ3 | переключатель единичный три | cambio individual tres | |
PU1 | Private Use One | UP1 | usage privé un | ЧИ1 | частное использование один | | |
PU2 | Private Use Two | UP2 | usage privé deux | ЧИ2 | частное использование два | | |
STS | Set Transmit State | MMT | mise en mode transmission | УСП | установка состояния передачи | | |
EPA | End of Guarded Protected Area | FZP | fin de zone protégée | КСО | конец сохраняемой области | | |
NBSP | No-Break Space | ESP INS | espace insécable | | непрерывающий пробел | espacio anticorte | | yhdistävä välilyönti *
SHY | Soft Hyphen | CDN | trait d'union conditionnel | | гибкий дефис | guión de corte programable | | pehmeä tavuviiva *
* Finnish terms marked with an asterisk are not from any standard, but from
recommendation Eurooppalaisen merkistön merkkien suomenkieliset nimet.
Character index
ACK Acknowledge
DLE Data Link Escape
ENQ Enquiry
EOT End of Transmission
EPA End of Guarded Protected Area
ESA End of Selected Area
ESC Escape
HOP "High Octet Preset" (unassigned)
HTJ Horizontal Tabulation with Justification
IS1 Information separator 1 (Unit Separator)
IS2 Information separator 2 (Record Separator)
IS3 Information separator 3 (Group Separator)
IS4 Information separator 4 (File Separator)
LF Line Feed
LS0 Locking-Shift Zero (Shift In)
LS1 Locking-Shift One (Shift Out)
MW Message Waiting
NAK Negative Acknowledge
SGCI "Single Graphic Character Introducer" (unassigned)
SI Shift In
SO Shift Out
SOH Start of Heading
SOS Start of String
SP Space
TC3 Transmission control character 3 (End of Text)
TC5 Transmission control character 5 (Enquiry)
TC8 Transmission control character 8 (Negative Acknowledge)
TC9 Transmission control character 9 (Synchronous Idle)
TC10 Transmission control character 10 (End of Transmission Block)
US Unit Separator
VT Vertical Tabulation
VTS Vertical Tabulation Set
XOFF Device Control 3
XON Device Control 1
Sources
ASA standard X3.4-1963: American Standard Code for Information Interchange. Note: ASCII-
1963.
USAS X3.4-1967: USA Standard Code for Information Interchange. United States of America
Standards Institute, New York, USA, 1967. Note: ASCII-1967.
USAS X3.4-1968: USA Standard Code for Information Interchange. Reprinted as NIC 11246 in
Feinler & Postel (ed.): Arpanet Protocol Handbook. NIC 7104 Rev. Jan 1978. ADA-052 594.
Network Information Center, Menlo Park, California, USA. Note: ASCII-1968.
ANSI X3.4-1977: American National Standard Code for Information Interchange. American
National Standards Institute, Inc, New York, USA, 1977. Also reprinted in McGraw Hill's
Compilation of Data Communication Standards, edition II, McGraw-Hill, 1982. Note: ASCII-
1977.
ANSI X3.4-1986: Coded Character Sets – 7-bit American National Standard Code for
Information Interchange. American National Standards Institute, Inc, New York, USA,
1986. Note: ASCII-1986.
ANSI X3.32-1973: Graphic Representation of the Control Characters of American National
Standard Code for Information Interchange. Reprinted in McGraw Hill's Compilation of Data
Communication Standards, edition II, McGraw-Hill, 1982.
ANSI X3.64-1979: Additional Controls for Use with American National Standard Code for
Information Interchange. American National Standards Institute, Inc, New York, USA, 1979.
Bemer, R.W.: Inside ASCII. Best of Interface Age, Volume 2: General Purpose Software.
Oregon, USA (1980).
Bies, Lammert: ASCII character map.
Digital Research: An Introduction to CP/M Features and Facilities, version 1.3, 1976.
ECMA-6: 7-bit Coded Character Set, 4th edition 1973, 5th edition 1985.
ECMA-17: Graphic Representation of the Control Characters of the ECMA 7-Bit Coded
Character Set for Information Interchange, 1st edition (withdrawn).
ECMA-35: Character Code Structure and Extension Techniques, 6th edition.
ECMA-48: Control Functions for Coded Character Sets, 2nd, 3rd, 4th and 5th edition.
Gerstung, Olaf: Tabellen — Verschiedenes. Bedeutung der Steuerzeichen im ASCII und nach
DIN 66003.
GOST 34.301-91: Information technology. 7-bit and 8-bit coded character sets. Control
functions – ГОСТ 34.301-91 (ИСО 6429-88) Информационная технология. 7-битные и 8-
битные кодированные наборы символов. Управляющие функции.
GOST 34.302.2-91: Information technology. 8-bit single-byte coded graphic character sets.
Latin alphabet No. 2 – ГОСТ 34.302-91 (ИСО 8859/2-87) Информационная технология.
Наборы 8-битных однобайтовых кодированных графических символов. Латинский
алфавит № 2.
Helsingin yliopiston yleisen kielitieteen laitos: Eurooppalaisen merkistön merkkien
suomenkieliset nimet, 2. laitos, toukokuu 2004.
ISO / R 646-1967 (E): 6 and 7-bit coded character sets for information processing
interchange, 1st edition December 1967. International Organization for Standardization,
Switzerland.
ISO 646-1973 (E): 7-bit coded character set for information processing interchange. ISO
Standards Handbook 1: Information transfer, 1st edition, 1977. Also reprinted in McGraw
Hill's Compilation of Data Communication Standards, edition II, McGraw-Hill, 1982.
ISO 646:1991: Information technology – 7-bit coded character set for information processing
interchange.
ISO 2022-1973 (E): Code extension techniques for use with the ISO 7-bit coded character set.
ISO Standards Handbook 1: Information transfer, 1st edition, 1977.
ISO 2047-1975 (E): Information processing – Graphical representations for the control
characters of the 7-bit coded character set. ISO Standards Handbook 1: Information transfer,
1st edition, 1977.
ISO/IEC 6429:1992 (E): Information technology – Control functions for coded character sets.
ISO 1745-1975 (E): Information processing – Basic mode control procedures for data
communication systems. Reprinted in McGraw Hill's Compilation of Data Communication
Standards, edition II, McGraw-Hill, 1982.
ISO/IEC 8859: Information technology – 8-bit single-byte coded graphic character sets. Note:
Mostly ISO/IEC 8859-1:1998: 8-bit single-byte coded graphic character sets -- Part 1: Latin
alphabet No. 1.
ISO-IR 001: The set of control characters of the ISO 646. Note: ISO-IR 001 deviates slightly
from ISO 646-1973 in wording. DEL missing.
ISO-IR 077: C1 Control Character Set of ISO 6429-1983.
Jennings, Tom: An annotated history of some character codes, revised 29 October, 2004.
RFC 20: ASCII format for Network Interchange. Note: Identical to USAS X3.4-1968 (ASCII-
1968). Missing Appendix A–D.
RFC 1345: Character Mnemonics & Character Sets.
SFS 4017: Tietojen vaihdossa käytettävä 7-bittinen koodi – 7-bit coded character set for
information processing interchange. Suomen standardisoimisliitto, Helsinki, Finland, 1977.
UIT-T T.50 (04/92): Alfabeto internacional de referencia, (anteriormente alfabeto
internacional N.° 5 o IA5) – Tecnología de la información - Juego de caracteres codificado de
siete bits para intercambio de información.
UIT-T T.51 (09/92): Juegos de caracteres codificados basados en el alfabeto latino para los
servicios de telemática.
UIT-T T.53 (04/94): Funciones de control codificadas mediante caracteres para los servicios
telemáticos.
Unicode, Inc.: Unicode 5.0, section française.
Unicode, Inc.: The Unicode Standard, version 9.0.0, 2016.
Unicode, Inc.: Unicode Character Database, NameAliases-9.0.0.txt.
Whistler, Ken: Why Nothing Ever Goes Away (was: Re: Acquiring DIS 10646). Unicode Mail
List, 5 Oct 2015.
Wikipedia: ASCII.
Wikipedia: C0 and C1 control codes.
Wikipedia: Control character.
Wikipedia: Newline.
Wikipedia: Software flow control.
Special thanks for help to Douglas A. Kerr, the principal author and editor of the published
standards document of the first complete version of ASCII.
Last updated in August 2016: Unicode 9.0, CP/M, additional details on PAD, HOP and SGCI.