ASCII stands for American Standard Code for Information Interchange. It is a character encoding standard that has been a foundational element in computing for decades.
- Uses 7 bits to encode 128 characters (0–127); modern usage often stores them in 8‑bit bytes with the high bit set to 0.
- Defines 95 printable characters (codes 32–126): space, digits, uppercase and lowercase English letters, punctuation, and symbols.
- Defines 33 non-printing control characters (codes 0–31 and 127) for formatting and device control (e.g., NUL, LF, CR).
- Serves as the basis for Unicode's first 128 code points; still fundamental in programming, data exchange, and legacy systems

Historical Background
ASCII was developed in the early 1960s and first published as a standard in 1963. It grew out of earlier telegraph and teleprinter codes and gave computers a standardized way to represent characters, which made data interchange between different machines practical.
ASCII Encoding Standards
ASCII Character Set
The ASCII character set includes standard characters such as letters, numbers, punctuation, and control characters. Each character is assigned a unique seven-bit binary code.
| Decimal | Character | Description |
|---|---|---|
| 0 | NUL | Null |
| 1 | SOH | Start of Heading |
| 2 | STX | Start of Text |
| 3 | ETX | End of Text |
| 4 | EOT | End of Transmission |
| 5 | ENQ | Enquiry |
| 6 | ACK | Acknowledge |
| 7 | BEL | Bell |
| 8 | BS | Backspace |
| 9 | HT | Horizontal Tab |
| 10 | LF | Line Feed |
| 11 | VT | Vertical Tab |
| 12 | FF | Form Feed |
| 13 | CR | Carriage Return |
| 14 | SO | Shift Out |
| 15 | SI | Shift In |
| ... | ... | ... |
| 32 | (space) | Space |
| 33 | ! | Exclamation Mark |
| 34 | " | Quotation Mark |
| ... | ... | ... |
| 65 | A | Uppercase A |
| 66 | B | Uppercase B |
| ... | ... | ... |
| 97 | a | Lowercase a |
| 98 | b | Lowercase b |
| ... | ... | ... |
| 127 | DEL | Delete |
ASCII Control Characters
In addition to printable characters, ASCII includes control characters for formatting and controlling devices. These include characters like carriage return and line feed.
| Decimal | Character | Description |
|---|---|---|
| 0 | NUL | Null |
| 1 | SOH | Start of Heading |
| 2 | STX | Start of Text |
| 3 | ETX | End of Text |
| 4 | EOT | End of Transmission |
| 5 | ENQ | Enquiry |
| 6 | ACK | Acknowledge |
| 7 | BEL | Bell |
| 8 | BS | Backspace |
| 9 | HT | Horizontal Tab |
| 10 | LF | Line Feed |
| 11 | VT | Vertical Tab |
| 12 | FF | Form Feed |
| 13 | CR | Carriage Return |
| 14 | SO | Shift Out |
| 15 | SI | Shift In |
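The split between control codes and printable codes can be checked with a few lines of Python. This is a minimal sketch; the helper names `is_control` and `is_printable` are illustrative and not part of any standard library.

```python
def is_control(code: int) -> bool:
    """True for the non-printing ASCII codes: 0-31 and 127 (DEL)."""
    return 0 <= code <= 31 or code == 127


def is_printable(code: int) -> bool:
    """True for the printable ASCII codes: 32 (space) through 126 (~)."""
    return 32 <= code <= 126


control_count = sum(is_control(c) for c in range(128))
printable_count = sum(is_printable(c) for c in range(128))
print(control_count, printable_count)  # 33 95

# Common escape sequences map directly onto control codes.
escapes = {"\\a": "\a", "\\b": "\b", "\\t": "\t", "\\n": "\n", "\\r": "\r"}
for name, ch in escapes.items():
    print(name, ord(ch))
```

The loop prints 7, 8, 9, 10, and 13, matching BEL, BS, HT, LF, and CR in the table.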
ASCII Extended Characters
While the original ASCII set comprises 128 characters, "extended ASCII" refers to any of several 8-bit encodings that add 128 more codes (128–255) for accented letters, symbols, and other characters. The assignments differ between encodings. The values 128–134 below follow IBM code page 437; the value shown for 255 follows ISO 8859-1 (Latin-1), since code page 437 assigns 255 to a non-breaking space.
| Decimal | Character | Description |
|---|---|---|
| 128 | Ç | Latin Capital Letter C with Cedilla |
| 129 | ü | Latin Small Letter U with Diaeresis |
| 130 | é | Latin Small Letter E with Acute |
| 131 | â | Latin Small Letter A with Circumflex |
| 132 | ä | Latin Small Letter A with Diaeresis |
| 133 | à | Latin Small Letter A with Grave |
| 134 | å | Latin Small Letter A with Ring Above |
| ... | ... | ... |
| 255 | ÿ | Latin Small Letter Y with Diaeresis (ISO 8859-1) |
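Because several incompatible 8-bit extensions exist, the character that a byte above 127 maps to depends on the code page in use. The sketch below decodes the same two byte values with three codecs that ship with Python (`cp437`, `latin-1`, `cp1252`); the choice of these three is only for illustration.

```python
# The same byte values decoded under three 8-bit encodings that extend ASCII.
raw = bytes([130, 255])

decoded = {}
for codec in ("cp437", "latin-1", "cp1252"):
    text = raw.decode(codec)
    decoded[codec] = text
    # Show the Unicode code point each byte was mapped to.
    print(codec, [f"U+{ord(ch):04X}" for ch in text])
```

Byte 130 becomes é under code page 437, a control code under Latin-1, and a low quotation mark under Windows-1252, which is why the encoding must always travel with the data.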
ASCII Table
A comprehensive ASCII table organizes characters and their corresponding binary, decimal, and hexadecimal representations.
| Decimal | Hex | Binary | Character | Description |
|---|---|---|---|---|
| 0 | 00 | 00000000 | NUL | Null |
| 1 | 01 | 00000001 | SOH | Start of Heading |
| 2 | 02 | 00000010 | STX | Start of Text |
| 3 | 03 | 00000011 | ETX | End of Text |
| 4 | 04 | 00000100 | EOT | End of Transmission |
| 5 | 05 | 00000101 | ENQ | Enquiry |
| 6 | 06 | 00000110 | ACK | Acknowledge |
| 7 | 07 | 00000111 | BEL | Bell |
| 8 | 08 | 00001000 | BS | Backspace |
| 9 | 09 | 00001001 | HT | Horizontal Tab |
| 10 | 0A | 00001010 | LF | Line Feed |
| 11 | 0B | 00001011 | VT | Vertical Tab |
| 12 | 0C | 00001100 | FF | Form Feed |
| 13 | 0D | 00001101 | CR | Carriage Return |
| 14 | 0E | 00001110 | SO | Shift Out |
| 15 | 0F | 00001111 | SI | Shift In |
| 16 | 10 | 00010000 | DLE | Data Link Escape |
| 17 | 11 | 00010001 | DC1 | Device Control 1 (often XON) |
| 18 | 12 | 00010010 | DC2 | Device Control 2 |
| 19 | 13 | 00010011 | DC3 | Device Control 3 (often XOFF) |
| 20 | 14 | 00010100 | DC4 | Device Control 4 |
| 21 | 15 | 00010101 | NAK | Negative Acknowledge |
| 22 | 16 | 00010110 | SYN | Synchronous Idle |
| 23 | 17 | 00010111 | ETB | End of Transmission Block |
| 24 | 18 | 00011000 | CAN | Cancel |
| 25 | 19 | 00011001 | EM | End of Medium |
| 26 | 1A | 00011010 | SUB | Substitute |
| 27 | 1B | 00011011 | ESC | Escape |
| 28 | 1C | 00011100 | FS | File Separator |
| 29 | 1D | 00011101 | GS | Group Separator |
| 30 | 1E | 00011110 | RS | Record Separator |
| 31 | 1F | 00011111 | US | Unit Separator |
| 32 | 20 | 00100000 | (space) | Space |
| 33 | 21 | 00100001 | ! | Exclamation Mark |
| 34 | 22 | 00100010 | " | Quotation Mark |
| 35 | 23 | 00100011 | # | Number Sign |
| 36 | 24 | 00100100 | $ | Dollar Sign |
| 37 | 25 | 00100101 | % | Percent Sign |
| 38 | 26 | 00100110 | & | Ampersand |
| 39 | 27 | 00100111 | ' | Apostrophe (Single Quote) |
| 40 | 28 | 00101000 | ( | Left Parenthesis |
| 41 | 29 | 00101001 | ) | Right Parenthesis |
| 42 | 2A | 00101010 | * | Asterisk |
| 43 | 2B | 00101011 | + | Plus Sign |
| 44 | 2C | 00101100 | , | Comma |
| 45 | 2D | 00101101 | - | Hyphen (Minus Sign) |
| 46 | 2E | 00101110 | . | Period (Full Stop) |
| 47 | 2F | 00101111 | / | Solidus (Slash) |
| 48 | 30 | 00110000 | 0 | Digit Zero |
| 49 | 31 | 00110001 | 1 | Digit One |
| 50 | 32 | 00110010 | 2 | Digit Two |
| 51 | 33 | 00110011 | 3 | Digit Three |
| 52 | 34 | 00110100 | 4 | Digit Four |
| 53 | 35 | 00110101 | 5 | Digit Five |
| 54 | 36 | 00110110 | 6 | Digit Six |
| 55 | 37 | 00110111 | 7 | Digit Seven |
| 56 | 38 | 00111000 | 8 | Digit Eight |
| 57 | 39 | 00111001 | 9 | Digit Nine |
| 58 | 3A | 00111010 | : | Colon |
| 59 | 3B | 00111011 | ; | Semicolon |
| 60 | 3C | 00111100 | < | Less Than (Angle Bracket, Left Pointing) |
| 61 | 3D | 00111101 | = | Equals Sign |
| 62 | 3E | 00111110 | > | Greater Than (Angle Bracket, Right Pointing) |
| 63 | 3F | 00111111 | ? | Question Mark |
| 64 | 40 | 01000000 | @ | At Sign |
| 65 | 41 | 01000001 | A | Uppercase A |
| 66 | 42 | 01000010 | B | Uppercase B |
| 67 | 43 | 01000011 | C | Uppercase C |
| 68 | 44 | 01000100 | D | Uppercase D |
| 69 | 45 | 01000101 | E | Uppercase E |
| 70 | 46 | 01000110 | F | Uppercase F |
| 71 | 47 | 01000111 | G | Uppercase G |
| 72 | 48 | 01001000 | H | Uppercase H |
| 73 | 49 | 01001001 | I | Uppercase I |
| 74 | 4A | 01001010 | J | Uppercase J |
| 75 | 4B | 01001011 | K | Uppercase K |
| 76 | 4C | 01001100 | L | Uppercase L |
| 77 | 4D | 01001101 | M | Uppercase M |
| 78 | 4E | 01001110 | N | Uppercase N |
| 79 | 4F | 01001111 | O | Uppercase O |
| 80 | 50 | 01010000 | P | Uppercase P |
| 81 | 51 | 01010001 | Q | Uppercase Q |
| 82 | 52 | 01010010 | R | Uppercase R |
| 83 | 53 | 01010011 | S | Uppercase S |
| 84 | 54 | 01010100 | T | Uppercase T |
| 85 | 55 | 01010101 | U | Uppercase U |
| 86 | 56 | 01010110 | V | Uppercase V |
| 87 | 57 | 01010111 | W | Uppercase W |
| 88 | 58 | 01011000 | X | Uppercase X |
| 89 | 59 | 01011001 | Y | Uppercase Y |
| 90 | 5A | 01011010 | Z | Uppercase Z |
| 91 | 5B | 01011011 | [ | Left Square Bracket |
| 92 | 5C | 01011100 | \ | Backslash |
| 93 | 5D | 01011101 | ] | Right Square Bracket |
| 94 | 5E | 01011110 | ^ | Caret (Circumflex Accent) |
| 95 | 5F | 01011111 | _ | Underscore |
| 96 | 60 | 01100000 | ` | Grave Accent |
| 97 | 61 | 01100001 | a | Lowercase a |
| 98 | 62 | 01100010 | b | Lowercase b |
| 99 | 63 | 01100011 | c | Lowercase c |
| 100 | 64 | 01100100 | d | Lowercase d |
| 101 | 65 | 01100101 | e | Lowercase e |
| 102 | 66 | 01100110 | f | Lowercase f |
| 103 | 67 | 01100111 | g | Lowercase g |
| 104 | 68 | 01101000 | h | Lowercase h |
| 105 | 69 | 01101001 | i | Lowercase i |
| 106 | 6A | 01101010 | j | Lowercase j |
| 107 | 6B | 01101011 | k | Lowercase k |
| 108 | 6C | 01101100 | l | Lowercase l |
| 109 | 6D | 01101101 | m | Lowercase m |
| 110 | 6E | 01101110 | n | Lowercase n |
| 111 | 6F | 01101111 | o | Lowercase o |
| 112 | 70 | 01110000 | p | Lowercase p |
| 113 | 71 | 01110001 | q | Lowercase q |
| 114 | 72 | 01110010 | r | Lowercase r |
| 115 | 73 | 01110011 | s | Lowercase s |
| 116 | 74 | 01110100 | t | Lowercase t |
| 117 | 75 | 01110101 | u | Lowercase u |
| 118 | 76 | 01110110 | v | Lowercase v |
| 119 | 77 | 01110111 | w | Lowercase w |
| 120 | 78 | 01111000 | x | Lowercase x |
| 121 | 79 | 01111001 | y | Lowercase y |
| 122 | 7A | 01111010 | z | Lowercase z |
| 123 | 7B | 01111011 | { | Left Curly Brace |
| 124 | 7C | 01111100 | \| | Vertical Bar |
| 125 | 7D | 01111101 | } | Right Curly Brace |
| 126 | 7E | 01111110 | ~ | Tilde |
| 127 | 7F | 01111111 | DEL | Delete |
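The numeric columns of the table above can be derived from the decimal code alone. The following is a minimal sketch, assuming an illustrative helper `ascii_row` and a hand-written list of control-code abbreviations; neither comes from a library.

```python
# Abbreviations for codes 0-31, in order. DEL (127) is handled separately.
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()


def ascii_row(code: int) -> str:
    """Format one table row: decimal, two-digit hex, 8-bit binary, character."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes run from 0 to 127")
    if code < 32:
        label = CONTROL_NAMES[code]
    elif code == 32:
        label = "(space)"
    elif code == 127:
        label = "DEL"
    else:
        label = chr(code)
    return f"{code} | {code:02X} | {code:08b} | {label}"


for code in (0, 10, 65, 97, 127):
    print(ascii_row(code))
```

Looping over `range(128)` instead of the five sample codes reproduces the full table, minus the description column.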
ASCII Representation
Binary Representation
ASCII characters are represented in binary, providing a machine-readable format that computers use for internal processing.
| Binary | Character | Description |
|---|---|---|
| 00000000 | NUL | Null |
| 00000001 | SOH | Start of Heading |
| 00000010 | STX | Start of Text |
| 00000011 | ETX | End of Text |
| 00000100 | EOT | End of Transmission |
| 00000101 | ENQ | Enquiry |
| 00000110 | ACK | Acknowledge |
| 00000111 | BEL | Bell |
| 00001000 | BS | Backspace |
| 00001001 | HT | Horizontal Tab |
| 00001010 | LF | Line Feed |
| 00001011 | VT | Vertical Tab |
| 00001100 | FF | Form Feed |
| 00001101 | CR | Carriage Return |
| 00001110 | SO | Shift Out |
| 00001111 | SI | Shift In |
| ... | ... | ... |
| 00100000 | (space) | Space |
| 00100001 | ! | Exclamation Mark |
| 00100010 | " | Quotation Mark |
| ... | ... | ... |
| 01000001 | A | Uppercase A |
| 01000010 | B | Uppercase B |
| ... | ... | ... |
| 01100001 | a | Lowercase a |
| 01100010 | b | Lowercase b |
| ... | ... | ... |
| 01111111 | DEL | Delete |
Decimal Representation
In decimal form, ASCII codes offer a human-readable representation, simplifying discussions and documentation.
| Decimal | Character | Description |
|---|---|---|
| 0 | NUL | Null |
| 1 | SOH | Start of Heading |
| 2 | STX | Start of Text |
| 3 | ETX | End of Text |
| 4 | EOT | End of Transmission |
| 5 | ENQ | Enquiry |
| 6 | ACK | Acknowledge |
| 7 | BEL | Bell |
| 8 | BS | Backspace |
| 9 | HT | Horizontal Tab |
| 10 | LF | Line Feed |
| 11 | VT | Vertical Tab |
| 12 | FF | Form Feed |
| 13 | CR | Carriage Return |
| 14 | SO | Shift Out |
| 15 | SI | Shift In |
| ... | ... | ... |
| 32 | (space) | Space |
| 33 | ! | Exclamation Mark |
| 34 | " | Quotation Mark |
| ... | ... | ... |
| 65 | A | Uppercase A |
| 66 | B | Uppercase B |
| ... | ... | ... |
| 97 | a | Lowercase a |
| 98 | b | Lowercase b |
| ... | ... | ... |
| 127 | DEL | Delete |
Hexadecimal Representation
The hexadecimal representation of ASCII codes is commonly used in programming and digital design.
| Hexadecimal | Character | Description |
|---|---|---|
| 00 | NUL | Null |
| 01 | SOH | Start of Heading |
| 02 | STX | Start of Text |
| 03 | ETX | End of Text |
| 04 | EOT | End of Transmission |
| 05 | ENQ | Enquiry |
| 06 | ACK | Acknowledge |
| 07 | BEL | Bell |
| 08 | BS | Backspace |
| 09 | HT | Horizontal Tab |
| 0A | LF | Line Feed |
| 0B | VT | Vertical Tab |
| 0C | FF | Form Feed |
| 0D | CR | Carriage Return |
| 0E | SO | Shift Out |
| 0F | SI | Shift In |
| ... | ... | ... |
| 20 | (space) | Space |
| 21 | ! | Exclamation Mark |
| 22 | " | Quotation Mark |
| ... | ... | ... |
| 41 | A | Uppercase A |
| 42 | B | Uppercase B |
| ... | ... | ... |
| 61 | a | Lowercase a |
| 62 | b | Lowercase b |
| ... | ... | ... |
| 7F | DEL | Delete |
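The binary, decimal, and hexadecimal forms are interchangeable views of the same code. As a rough sketch, the hypothetical helpers below (`to_binary`, `from_binary`, `from_hex`) convert short ASCII strings between these notations using only built-in functions.

```python
def to_binary(text: str) -> str:
    """Render ASCII text as space-separated 8-bit binary groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def from_binary(bits: str) -> str:
    """Parse space-separated binary groups back into text."""
    return "".join(chr(int(group, 2)) for group in bits.split())


def from_hex(hex_digits: str) -> str:
    """Parse hexadecimal byte values (spaces allowed) back into ASCII text."""
    return bytes.fromhex(hex_digits).decode("ascii")


print(to_binary("Hi"))                   # 01001000 01101001
print(from_binary("01001000 01101001"))  # Hi
print(from_hex("48 69"))                 # Hi
```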
ASCII in Computing
ASCII in Programming Languages
Programming languages extensively use ASCII for representing characters and symbols in source code.
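Because ASCII places digits and letters in contiguous runs, programs often do arithmetic directly on character codes. The sketch below shows two such idioms; the function names `digit_value` and `toggle_case` are illustrative only.

```python
def digit_value(ch: str) -> int:
    """Convert an ASCII digit character to its numeric value via code arithmetic."""
    if len(ch) != 1 or not "0" <= ch <= "9":
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return ord(ch) - ord("0")  # '0' is 48, so '7' (55) maps to 7


def toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter; other characters pass through unchanged."""
    # Uppercase and lowercase ASCII letters differ only in bit 5 (0x20):
    # 'A' is 0x41 and 'a' is 0x61.
    if len(ch) == 1 and ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ 0x20)
    return ch


print(digit_value("7"))                             # 7
print("".join(toggle_case(c) for c in "Hello, World!"))  # hELLO, wORLD!
```

These tricks rely on the ASCII layout and apply only to ASCII letters and digits; for general text, the language's own case-conversion functions are the safer choice.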
ASCII in Data Transmission
ASCII is fundamental in data transmission protocols, ensuring compatibility and readability when exchanging information between systems.
ASCII Art and Design
Artistic expressions, known as ASCII art, leverage ASCII characters to create visual designs and graphics.
ASCII Extended Sets
"Extended ASCII" is not a single standard, and the labels ASCII-8, ASCII-16, ASCII-32, ASCII-64, and ASCII-128 do not name published encodings. In practice, the extra 128 codes (128–255) are defined by separate 8-bit character sets that keep ASCII as their first half, for example:
- Code page 437: The original IBM PC character set, with accented letters, box-drawing characters, and mathematical symbols.
- ISO 8859-1 (Latin-1): Covers most Western European languages.
- Windows-1252: Microsoft's variant of ISO 8859-1, which places printable characters such as curly quotes and the euro sign in the range 128–159.
ASCII vs. Unicode
Key Differences
ASCII and Unicode are both character encoding standards, but they have key differences in terms of scope and functionality. Let's compare ASCII and Unicode in a tabular format:
| Feature | ASCII | Unicode |
|---|---|---|
| Definition | ASCII (American Standard Code for Information Interchange) is a character encoding standard that uses 7 bits per character, covering the English alphabet, digits, punctuation, and control codes. | Unicode is a character standard that assigns a unique code point to every character, regardless of platform, program, or language. Code points are stored using an encoding form such as UTF-8, UTF-16, or UTF-32. |
| Scope | Designed primarily for English text. | Designed to be a universal standard that supports a vast range of languages, symbols, and characters from various writing systems. |
| Bit Usage | 7 bits per character, usually stored in an 8-bit byte (extended ASCII sets use all 8 bits). | Depends on the encoding: UTF-8 uses 1 to 4 bytes per character, UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4 bytes. |
| Number of Characters | 128 (256 in 8-bit extended sets). | Over a million code points are available (U+0000 to U+10FFFF). |
| Multilingual Support | English only; extended sets add a limited number of characters for some other languages. | Comprehensive support for almost all languages, including scripts like Cyrillic, Arabic, Chinese, Japanese, and many others. |
| Backward Compatibility | The baseline that later encodings preserve; 8-bit extended sets keep ASCII as their first 128 codes. | Maintains backward compatibility with ASCII. The first 128 Unicode code points correspond to ASCII, and ASCII text is already valid UTF-8. |
| Representation | Fixed width: one byte per character in practice. | Variable width in UTF-8 and UTF-16; fixed width in UTF-32. |
| Standard Organization | Developed by the American Standards Association (ASA), now ANSI (American National Standards Institute). | Developed by the Unicode Consortium, a non-profit organization that maintains and develops the Unicode standard. |
ASCII and Unicode differ in scope, with ASCII representing 128 characters and Unicode accommodating a vast array of characters from various scripts.
When to Use ASCII vs. Unicode
While ASCII is suitable for English and basic character encoding, Unicode is preferred for multilingual and diverse character requirements.
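The practical difference shows up as soon as text leaves the ASCII range. The sketch below, with an illustrative helper `encoded_sizes`, compares how many bytes a character occupies in UTF-8, UTF-16, and UTF-32, and shows that the ASCII codec rejects anything above code 127.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return the byte length of one character in UTF-8, UTF-16, and UTF-32."""
    # The "-le" variants are used so no byte-order mark is added to the count.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


for ch in ("A", "é", "€", "😀"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# ASCII text produces identical bytes under the ASCII and UTF-8 codecs.
print("A".encode("ascii") == "A".encode("utf-8"))  # True

# Characters outside 0-127 cannot be encoded as ASCII at all.
try:
    "é".encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode as ASCII:", exc.reason)
```

A reasonable default, under these assumptions, is to store text as UTF-8: pure ASCII data is unchanged, and anything else still round-trips.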
Practical Examples of ASCII
The following sections show how ASCII codes appear in practical applications such as file handling and URL encoding.
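As a starting point, converting characters to their ASCII codes and back takes only Python built-ins (`ord`, `chr`, `str.encode`). The sample string is arbitrary.

```python
text = "Hi!"

# Character -> decimal code, one at a time.
codes = [ord(ch) for ch in text]
print(codes)                 # [72, 105, 33]

# Whole string -> bytes; each byte is the ASCII code of one character.
raw = text.encode("ascii")
print(list(raw))             # [72, 105, 33]

# Decimal codes -> characters.
restored = "".join(chr(code) for code in codes)
print(restored)              # Hi!

# Quick check that a string contains only ASCII characters (Python 3.7+).
print(text.isascii())        # True
```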
ASCII in File Handling
ASCII, as a character encoding standard, plays a significant role in file handling. When working with text files, understanding how ASCII characters are encoded and decoded is essential. Here's how ASCII is involved in file handling:
- Character Representation:
- ASCII represents characters using numeric codes. Each character is assigned a decimal value between 0 and 127, and this value is used to represent the character in binary form.
- Text File Encoding:
- Text files are often encoded using ASCII or its extended forms. The encoding determines how characters are represented in the file. ASCII encoding is a common choice for plain text files, especially when dealing with English text.
- Binary Files:
- While ASCII is commonly associated with text files, binary files can also use ASCII characters for metadata or textual information within the file. For example, file headers or configuration data may be encoded using ASCII.
- File Reading and Writing:
- When reading from or writing to text files using programming languages, developers need to specify the character encoding. ASCII encoding (or its extensions like UTF-8) is chosen based on the nature of the data being handled.
# Example in Python using UTF-8 encoding
with open('example.txt', 'r', encoding='utf-8') as file:
    content = file.read()  # the read must be indented inside the with block
- Line Endings:
- ASCII includes control characters for line feed (LF or \n) and carriage return (CR or \r). The choice of line endings (LF on Unix/Linux, CRLF on Windows) affects how text files are handled on different operating systems.
- File Transfer Protocols:
- ASCII characters are often used in file transfer protocols, especially in FTP (File Transfer Protocol). When transferring text files, the client and server may negotiate to use ASCII mode to ensure correct line ending conversions.
- Programming Language Support:
- Many programming languages provide built-in functions for reading and writing files. These functions often allow developers to specify the character encoding, and ASCII encoding can be chosen when dealing with simple text files.
- Code Files:
- Source code files for programming languages are often encoded using ASCII or UTF-8, which is backward-compatible with ASCII. This ensures that the code can be read and interpreted correctly by various compilers and interpreters.
- Metadata and Headers:
- ASCII characters are commonly used in file metadata, headers, or configuration files where human-readable text is needed. For example, XML or JSON files may use ASCII for the textual representation of data.
- Error Handling:
- When handling files, it's essential to consider error handling for cases where the file contains unexpected characters or encoding issues. Proper error handling can prevent data corruption and ensure the robustness of the application.
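The encoding, line-ending, and error-handling points above can be combined in one small routine. This is a sketch under stated assumptions: `load_ascii` is a hypothetical helper, it works on bytes already read from a file, and replacing undecodable bytes is only one possible policy.

```python
def load_ascii(data: bytes) -> tuple[str, bool]:
    """Decode bytes as ASCII and normalize CRLF line endings to LF.

    Returns (text, clean). clean is False when some byte was outside 0-127
    and had to be replaced with U+FFFD, the Unicode replacement character.
    """
    try:
        text, clean = data.decode("ascii"), True
    except UnicodeDecodeError:
        text, clean = data.decode("ascii", errors="replace"), False
    return text.replace("\r\n", "\n"), clean


print(load_ascii(b"line one\r\nline two\r\n"))
print(load_ascii("café".encode("utf-8")))
```

The second call reports `clean` as False because the two UTF-8 bytes for é are not valid ASCII. Depending on the application, raising the error or retrying with UTF-8 may be preferable to silent replacement.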
ASCII in URL Encoding
URL encoding, also known as percent-encoding, is a method used to represent certain characters in a URL by replacing them with a percent sign (%) followed by two hexadecimal digits. While URL encoding can encompass a broader range of characters, ASCII characters play a significant role in this process. Here's how ASCII is involved in URL encoding:
- Character Representation:
- Only a subset of ASCII characters can be used directly in a URL without encoding. These include alphanumeric characters (A-Z, a-z, 0-9) and four special characters (hyphen, underscore, period, and tilde).
- Reserved Characters:
- Certain ASCII characters have special meanings in a URL and are reserved for specific purposes. For example:
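- Reserved Characters: ! * ' ( ) ; : @ & = + $ , / ? # [ ] (the percent sign is not in this set, but because it introduces an escape sequence, a literal % is written as %25)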
- Unreserved Characters: Alphanumeric characters (A-Z, a-z, 0-9), hyphen, underscore, period, and tilde.
- Encoding Reserved Characters:
- When a reserved character appears in a URL as data rather than as a delimiter, it must be percent-encoded. For instance, the exclamation mark (!) is represented as %21. Characters that are not allowed in a URL at all, such as the space, are encoded the same way (a space becomes %20). This prevents misinterpretation of these characters by the URL parser.
Original: Hello World!
URL Encoded: Hello%20World%21
- Percent Encoding:
- Percent encoding represents each byte of a character that cannot appear literally as a percent sign (%) followed by two hexadecimal digits giving the byte's value. This ensures that these characters are correctly interpreted in a URL.
Original: /path/to/file with spaces.txt
URL Encoded: /path/to/file%20with%20spaces.txt
- ASCII Control Characters:
- ASCII control characters and non-printable characters, which are not allowed in URLs, are often excluded. However, if they need to be included, they are represented using percent encoding.
Original: Line1\nLine2
URL Encoded: Line1%0ALine2
- Programming Language Support:
- When working with URLs in programming, libraries and functions for URL encoding are often provided. These functions take care of encoding reserved characters and ensuring that the resulting URL is valid.
# Example in Python
import urllib.parse
url = "https://example.com/path with spaces"
# Keep ':' and '/' literal so the scheme and path separators are not escaped
encoded_url = urllib.parse.quote(url, safe=":/")
print(encoded_url)
- Query Parameters:
- In URLs, query parameters are separated by the ampersand (&) symbol. When the parameter values contain reserved or non-alphanumeric characters, these characters are URL-encoded.
Original: ?name=John Doe&age=30
URL Encoded: ?name=John%20Doe&age=30
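The rules above can be written out by hand to show what library functions do internally. The sketch below defines an illustrative `percent_encode` helper that escapes every byte outside the unreserved set, and compares it with `urllib.parse.quote` and `urllib.parse.urlencode` from the Python standard library. The sample names and values are taken from the examples above.

```python
import string
from urllib.parse import quote, urlencode

# Characters that may appear in a URL component without escaping.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def percent_encode(text: str) -> str:
    """Escape every UTF-8 byte of text that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)


print(percent_encode("Hello World!"))   # Hello%20World%21
print(percent_encode("Line1\nLine2"))   # Line1%0ALine2

# quote() with an empty safe set applies the same rule.
print(quote("Hello World!", safe="") == percent_encode("Hello World!"))

# urlencode() builds a query string; quote_via=quote yields %20 instead of '+'.
print(urlencode({"name": "John Doe", "age": 30}, quote_via=quote))
```

By default `urlencode` encodes spaces as `+`, which is the form-encoding convention; passing `quote_via=quote` matches the `%20` style used in the examples above.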
ASCII in Networking
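- ASCII in Protocols (HTTP, FTP, etc.): Text-based protocols such as HTTP/1.1, FTP, and SMTP exchange commands, status lines, and header fields as ASCII text, so every implementation parses them the same way.
- ASCII in Email Communication: SMTP was originally specified for 7-bit ASCII. MIME encodings such as Base64 and quoted-printable let messages carry non-ASCII text and binary attachments over that channel.
ASCII in Security
- ASCII in Passwords: Many systems draw passwords from the 95 printable ASCII characters, so a password of length n has at most 95^n possible values. Systems that accept non-ASCII passwords must encode them consistently before hashing.
- ASCII in Encryption: Ciphers operate on bytes rather than characters, so text is encoded (as ASCII or UTF-8) before encryption, and the binary ciphertext is often converted to ASCII text with Base64 or hexadecimal so it can travel through text-based channels.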
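To make the networking and security points concrete, the sketch below builds an HTTP/1.1 request as ASCII bytes and wraps arbitrary binary data in Base64 so that it becomes ASCII text. The host, path, and sample bytes are made up for illustration, and Base64 is brought in here as one common technique rather than something this article prescribes.

```python
import base64

# A text-based protocol message: every character is ASCII, lines end in CRLF.
request = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
wire = request.encode("ascii")
print(list(wire[:3]))               # [71, 69, 84] -> 'G', 'E', 'T'
print(wire.endswith(b"\r\n\r\n"))   # True: a blank line ends the headers

# Binary data (for example ciphertext or an attachment) is not valid ASCII,
# so it is commonly converted to an ASCII-only form before transmission.
binary = bytes([0, 255, 128])
armored = base64.b64encode(binary)
print(armored)                      # b'AP+A'
print(base64.b64decode(armored) == binary)  # True
```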
Limitations of ASCII
ASCII, while widely used and simple, has some limitations, especially in the context of modern computing needs. Here are some of the key limitations of ASCII:
- Limited Character Set: ASCII defines only 128 characters (7-bit encoding); 8-bit extensions raise the total to 256. Either limit is restrictive when dealing with languages and writing systems beyond the basic Latin alphabet.
- No Support for Non-Latin Characters: ASCII does not provide support for characters outside the English alphabet, such as accented characters in European languages, characters from Asian languages, or special symbols used in various writing systems.
- Lack of Standardization for Extended ASCII: While ASCII itself only uses 7 bits, the extended ASCII set (8-bit encoding) is not standardized across different systems. Different extended ASCII encodings have been developed, leading to compatibility issues.
- No Definition for Codes Above 127: ASCII assigns no meaning to the values 128–255. Extended sets use them for accented letters, symbols, or additional control codes, but these assignments are not part of ASCII, and their interpretation varies among systems.
- Not Well-Suited for Multilingual Text: As a character encoding standard, ASCII is not designed to handle the diverse needs of multilingual text representation. Modern applications often require support for a wide range of languages, which ASCII cannot accommodate adequately.
- Limited Symbolic Representation: ASCII lacks representation for certain symbols and mathematical characters commonly used in scientific and technical contexts. This limitation hinders its suitability for applications requiring these symbols.
- Fixed-Length Encoding: ASCII uses a fixed 7 bits per character, normally stored as one byte. This simplicity was an advantage in early computing, but a fixed 7-bit code cannot grow beyond 128 characters. Variable-length encodings such as UTF-8 keep one byte for each ASCII character and use additional bytes for everything else.
- No Provision for Metadata or Formatting: ASCII is primarily focused on character representation and lacks provisions for metadata, formatting information, or characters with specialized functions in modern text processing.
- Globalization Challenges: As a result of its limitations, ASCII poses challenges when developing applications for a global audience with diverse linguistic and cultural requirements.
Handling Non-ASCII Characters
Handling non-ASCII characters is crucial when dealing with text data that goes beyond the basic Latin alphabet covered by ASCII. Here are some common approaches and considerations for handling non-ASCII characters:
- Unicode Encoding:
- UTF-8, UTF-16, UTF-32: Unicode is a character standard that supports a vast range of characters from different languages and writing systems. UTF-8, UTF-16, and UTF-32 are encoding schemes under the Unicode standard that are built from 8-, 16-, and 32-bit code units, respectively. A character takes 1 to 4 bytes in UTF-8, 2 or 4 bytes in UTF-16, and always 4 bytes in UTF-32.
- Use Unicode-Compatible Data Types:
- When working with programming languages or databases, ensure that you use data types that support Unicode characters. For example, in many programming languages, using string or char data types that support Unicode is essential.
- Normalization:
- Unicode Normalization is the process of transforming text into a standardized form, ensuring that equivalent sequences of characters are represented in a consistent way. This is important when dealing with characters that can be represented in multiple ways, such as accented characters.
- Libraries and Frameworks:
- Many programming languages provide libraries and frameworks that handle Unicode and non-ASCII characters seamlessly. Utilize these libraries to ensure correct processing of text data.
- File Encodings:
- When working with text files, be aware of the encoding used. UTF-8 is a common and widely supported encoding for handling Unicode characters. Make sure that the applications reading and writing files support the chosen encoding.
- Database Collation:
- Database collation settings determine how string comparison operations are performed. Choose a collation that supports the language and characters you are working with. Unicode collations are designed to handle a wide range of characters.
- Web Page Character Encoding:
- Specify the character encoding in a <meta> tag of HTML documents (for example, <meta charset="utf-8">) to ensure that web browsers interpret and display non-ASCII characters correctly.
- Regular Expressions:
- When using regular expressions, ensure that the patterns are Unicode-aware. Many programming languages provide Unicode-aware regular expression functions.
- Input and Output Handling:
- When dealing with user input or displaying information to users, ensure that input forms, databases, and web pages are configured to handle non-ASCII characters. Validate and sanitize user input to prevent issues.
- Testing and Internationalization:
- Conduct thorough testing, especially if your application is intended for a global audience. Consider internationalization (i18n) best practices to make your software adaptable to various languages and regions.
By embracing Unicode and adopting best practices for handling non-ASCII characters, you can ensure that your applications are capable of supporting a wide range of languages and writing systems. This is particularly important in today's globalized and interconnected world.
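The normalization point above is easy to overlook, so a short demonstration may help. The sketch uses Python's standard `unicodedata` module; `ascii_fold` is a hypothetical helper showing one lossy fallback for systems that accept only ASCII, not a recommended general practice.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent

# The two strings render identically but compare as different.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True


def ascii_fold(text: str) -> str:
    """Lossy fallback: decompose, then drop everything outside ASCII."""
    decomposed_text = unicodedata.normalize("NFKD", text)
    return decomposed_text.encode("ascii", "ignore").decode("ascii")


print(ascii_fold("café"))  # cafe
```

Normalizing to one form (commonly NFC) before comparing or storing strings avoids mismatches between visually identical text. Folding to ASCII discards information and drops characters that have no ASCII base letter, so it suits search keys or legacy interfaces rather than stored data.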
Similar Reads
Computer Organization and Architecture Tutorial Computer architecture defines how a computerâs components communicate through electronic signals to perform input, processing, and output operations.It covers the design and organization of the CPU, memory, storage, and input/output devices.Describes how these components interact through buses, cont
4 min read
Basic Computer Instructions
What is a Computer?A computer is an electronic device that processes data according to instructions provided by software programs. It takes input (data), processes it using a central processing unit (CPU), stores information, and produces output (results) to perform various tasks.Types of ComputersThere are various ty
8 min read
Issues in Computer DesignComputer Design is the structure in which components relate to each other. The designer deals with a particular level of system at a time and there are different types of issues at different levels. At each level, the designer is concerned with the structure and function. The structure is the skelet
3 min read
Difference between assembly language and high level languageProgramming Language is categorized into assembly language and high-level language. Assembly-level language is a low-level language that is understandable by machines whereas High-level language is human-understandable language. What is Assembly Language?It is a low-level language that allows users
2 min read
Addressing ModesAddressing modes are the techniques used by the CPU to identify where the data needed for an operation is stored. They provide rules for interpreting or modifying the address field in an instruction before accessing the operand.Addressing modes for 8086 instructions are divided into two categories:
7 min read
Difference between Memory based and Register based Addressing ModesPrerequisite - Addressing Modes Addressing modes are the operations field specifies the operations which need to be performed. The operation must be executed on some data which is already stored in computer registers or in the memory. The way of choosing operands during program execution is dependen
4 min read
Computer Organization - Von Neumann ArchitectureComputer Organization is like understanding the "blueprint" of how a computer works internally. One of the most important models in this field is the Von Neumann architecture, which is the foundation of most modern computers. Named after John von Neumann, this architecture introduced the concept of
6 min read
Harvard ArchitectureIn a normal computer that follows von Neumann architecture, instructions, and data both are stored in the same memory. So same buses are used to fetch instructions and data. This means the CPU cannot do both things together (read the instruction and read/write data). So, to overcome this problem, Ha
5 min read