
Language Learning & Technology May 2002, Volume 6, Number 2

http://llt.msu.edu/vol6num2/emerging/ pp. 6-11

EMERGING TECHNOLOGIES
Multilingual Computing
Robert Godwin-Jones
Virginia Commonwealth University

Language teachers, unless they teach ESL, often bemoan the use of English as the lingua franca of the
Internet. The Web can be an invaluable source of authentic language use, but many Web sites worldwide
bypass native tongues in favor of what has become the universal Internet language. The issue is not just
one of audience but also of the capability of computers to display and input text in a variety of languages.
It is a software and hardware problem, but also one of worldwide standards. In this column we will look
at current developments in multilingual computing -- in particular the rise of Unicode and the arrival of
alternative computing devices for world languages such as India’s "simputer."
Character Sets
A computer registers and records characters as numbers in binary form. Historically, character data has been stored in 8-bit chunks (a bit is either a 1 or a 0) known as bytes. Personal computers, as they evolved in the United States for English-language speakers, used a 7-bit character code known as ASCII (American Standard Code for Information Interchange), with one bit reserved for error checking. The 7-bit ASCII encoding encompasses 128 characters: the Latin alphabet (lower and upper case), numbers, punctuation, and some symbols. This was used as the basis for larger 8-bit character sets with 256 characters
(sometimes referred to as "extended ASCII") that include accented characters for West European
languages. ASCII has been around since 1963 and was extended by the ISO (International Organization
for Standardization) in 1967 to allow for use of character codes for non-Latin alphabet languages such as
Arabic and Greek. Later, to satisfy the need for languages such as Russian and Hebrew, the standard called ISO 2022 was established and was later expanded into ISO 8859-1 (often called "Latin-1"), which is widely used today for the interchange of information across the Web in Western languages. Actually, Latin-1 is one of 10 character sets, all 8-bit, defined by ISO 8859; others target Eastern European languages, Turkish, Hebrew, Greek, Icelandic, and Celtic. The variety of ISO 8859 encodings is evident
in the multiple character encodings which can be set in contemporary Web browsers.
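To make the numeric mapping concrete, here is a minimal Python sketch (added here for illustration, not part of the original column) showing a 7-bit ASCII value and an 8-bit "extended ASCII" (Latin-1) value:
print(ord("A"))                        # 65 -- inside the 7-bit ASCII range (0-127)
print(format(ord("A"), "08b"))         # 01000001 -- fits in one byte with the high bit unused
print(bytes([228]).decode("latin-1"))  # ä -- 228 lies in the 8-bit extension (128-255)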
ASCII does a fine job for working in English, since that was what it was designed to do. Likewise, 8-bit
ISO 8859 character sets are adequate for displaying most of the world’s writing systems. But they are not
capable of dealing with languages with many more characters such as Japanese or Chinese. What is
needed for such languages is at minimum a 16-bit or two-byte system which can handle thousands of characters. Sixteen-bit encoding was not used initially on personal computers, not just because of
monolingual shortsightedness but also for technical reasons -- early computers had very little memory and
storage capacity. With the current capacity of personal computers, one might ask why not simply adopt a 3-byte or even a 4-byte character set system, which would supply a virtually limitless number of characters to be displayed, thus guaranteeing the encoding of any of the world's languages. The problem is that such encoding systems would tend to use many more resources than necessary for the display of
most linguistic data, thereby slowing down network transmission and making display problematic on
smaller devices with less processing power and memory. Also, computer operating and networking
systems were designed to handle 8-bit data chunks; keeping 8-bit systems helps transactions progress
smoothly and avoids the necessity for universal system upgrades.
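The resource argument can be made concrete with a short Python comparison (an added sketch, using UTF-32 merely as a stand-in for a hypothetical fixed 4-byte scheme): encoding a plain English sentence at four bytes per character roughly quadruples its size.
text = "The quick brown fox"
print(len(text.encode("ascii")))    # 19 bytes -- one byte per character
print(len(text.encode("utf-32")))   # 80 bytes -- four bytes per character plus a 4-byte byte-order mark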

Unicode
The problem remains of how to encode all the ideographs for Asian languages as well as the alphabets for
other writing systems. The solution which recently emerged is known as Unicode, which, for all practical
purposes, is identical to the official ISO 10646 standard. Unicode is not the ideal solution some advocates
initially envisioned, a limitless (32-bit) system in which the emphasis is on inclusiveness rather than
efficiency. But, like other systems that have been modified to accomplish goals not originally intended
(such as the World Wide Web), Unicode is being cleverly adapted to work within today’s computing and
networking environment and to provide a way to encode all human languages, past and present. It is a
compromise, one which takes a roundabout way to accomplish its goal, but in the long run the wide
adoption of Unicode promises considerable benefit to the language community, in particular to those
interested in preserving and promoting indigenous languages.
Unicode, like so many other innovations in modern computing, grew in part out of research done at Xerox Corporation. A multilingual word processor called ViewPoint was developed for use on Xerox’s Star Workstation in the 1980s. It could process text not only in multiple Latin-based languages but in a variety
of Asian languages as well. While the system was too expensive to be commercially successful, the Xerox
multilingual character code generated considerable interest among computer specialists and linguists.
Eventually an industry project was launched by a number of U.S. firms and called Unification Code or
Unicode, the goal of which was to unify all the world’s alphabets into a single, very large, character set. A
Unicode Consortium was founded and a prototype developed in 1991, with continual development since
then. Parallel efforts at ISO have thankfully been merged with the Unicode initiative.
The most recent version of Unicode is 3.2. The original goal was to use a single 16-bit encoding. However, this allows for just 65,536 distinct characters, clearly not sufficient to represent all current and past human languages. To expand the capacity, 16 additional sets of 65,536 characters (called "supplementary planes") have been added, for a total possible character set of about one million. While each character requires at most 4 bytes, the Unicode standard actually defines 3 encoding forms (UTF-8, UTF-16, and UTF-32) that allow the same character to be transmitted in byte, word, or double-word format (i.e., in 8, 16, or 32 bits per code unit). One of the key advantages of how Unicode has been
implemented is the fact that the first 256 characters are identical to those defined by ISO 8859-1 (Latin-
1), which in turn means that the first 128 characters are identical to ASCII. This provides backwards
compatibility with systems and applications which are not Unicode aware. Also, as Unicode grows,
compatibility with earlier versions is guaranteed.
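A brief Python sketch (added here as an illustration, not drawn from the Unicode standard itself) shows the three encoding forms at work: the same characters occupy different numbers of bytes in UTF-8 and UTF-16, while UTF-32 always uses four bytes per character.
for ch in ("a", "ä", "中", "\U00010330"):   # the last is a supplementary-plane character
    print(ch,
          len(ch.encode("utf-8")),          # 1, 2, 3, and 4 bytes respectively
          len(ch.encode("utf-16-be")),      # 2, 2, 2, and 4 bytes (a surrogate pair for the last)
          len(ch.encode("utf-32-be")))      # always 4 bytes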
Before Unicode, 8-bit character encoding with its built-in limit of 256 characters could have the same
character code number represent a different character in different alphabets. In Unicode, each character is
assigned a unique number. This means the use of Unicode can eliminate character set variations across
systems and fonts, avoiding the "alphabet soup" of multiple ISO 8859 character sets.
Theoretically a single, very large, font could contain all characters defined by the Unicode standard. In
practice, Unicode fonts tend to focus on a single language or family of languages. Unicode is designed to
work on any operating system, including hand-held devices. In fact, the Newton MessagePad was an early user of Unicode in its operating system. Some Unicode compatibility has been included in Microsoft Windows since Windows 95 and on the Macintosh since MacOS 8.5. Windows XP (as well as NT 4 and Windows 2000) offers robust native support of Unicode. MacOS X offers somewhat lesser support of Unicode, as do Linux and other Unix systems. Finding applications which support Unicode and enable
true multilingual computing is not always easy, although there are currently a number of word processors
and text editors with Unicode support.
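The contrast between the old "alphabet soup" and unique Unicode numbers can be demonstrated in a few lines of Python (an added sketch): one and the same byte value decodes to different letters under different ISO 8859 parts, whereas each of those letters has its own Unicode code point.
b = bytes([0xE4])                  # a single 8-bit value
print(b.decode("iso8859-1"))       # ä (Latin-1)
print(b.decode("iso8859-5"))       # ф (Cyrillic)
print(b.decode("iso8859-7"))       # δ (Greek)
print(hex(ord("ä")), hex(ord("ф")), hex(ord("δ")))   # 0xe4 0x444 0x3b4 -- three distinct code points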
The display of Unicode-encoded text is dependent on the presence of the fonts needed to display the
characters represented by the Unicode code sequences. The core fonts for Windows are all now Unicode
based and many Microsoft applications such as MS Office come with additional Unicode fonts. A
ubiquitous Windows Unicode font is Arial Unicode MS (downloadable from Microsoft) which features
51,180 glyphs (i.e., characters), including among others Cyrillic, Greek, Armenian, Hebrew, Arabic, Thai,
Bengali, Tibetan, Chinese, Korean, and Japanese. The downside of such a font is that, at 23 MB, its use
can slow down your system. There are also shareware Unicode fonts, such as Code2000 (from James
Kass). It is very easy in Windows to change keyboard layouts to type in different languages, assuming the needed fonts are present, as outlined in Alan Wood’s Unicode Resources pages. On the (classic) MacOS,
most applications that include Unicode support require Apple’s Language Kits. The new MacOS X offers
better Unicode support, as discussed in a series of recent articles on Unicode and MacOS X in the Mac
newsletter, Tidbits. Windows Unicode fonts can be used in MacOS X.
Unicode and the Web
Characters other than standard ASCII have traditionally been displayed on Web pages by encoding them in HTML through the use of so-called escape sequences or, as they are officially called, character entity references, so that an umlauted lower-case a (ä), for example, is encoded as &auml; or its numerical equivalent (&#228;). While this system works fine for major West European languages, for which character escape codes were defined early on, this does not hold true for non-Roman alphabets. Although there have been display possibilities from the early days of the Web for character sets such as Cyrillic and Japanese, browser set-up has been problematic, with no possibility of mixing and matching languages with different alphabets on the same Web page (such as annotating Chinese text in French).
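For those who want to generate such numeric references automatically, the following Python sketch (an added example; the helper name to_ncr is arbitrary) replaces every non-ASCII character in a string with its decimal character reference:
def to_ncr(text):
    # substitute each non-ASCII character with its decimal numeric character reference
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)
print(to_ncr("Käse"))   # K&#228;se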
The situation has improved considerably since Unicode has arrived on the scene together with the HTML
4.0 standard. HTML 4.0 specifies Unicode 3.0 as the default character set. It also calls for support of RFC
2070 ("Internationalization of HTML") which stipulates support for right-to-left languages like Arabic
and Hebrew. One should note that Microsoft’s Internet Explorer has taken this one step further, with
support of top-to-bottom script systems such as Mongolian. Virtually all recent Web browsers support the
8-bit encoding of Unicode known as UTF-8 and are also able to use more than one font to display a
multilingual page. How does Unicode get squeezed into an 8-bit system? Through a complex algorithm that interprets pointers to characters contained in the upper range of "extended ASCII" numerical values (128-255); the process is explained in detail by Roman Czyborra. The UTF-8 encoding
is the magic which allows the existing Web interface to display Unicode.
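As a rough illustration (an added Python sketch, not the author's own example), the umlauted a from above becomes two bytes in UTF-8, both falling in that upper range, with bit patterns that mark a lead byte and a continuation byte:
for b in "ä".encode("utf-8"):      # U+00E4 becomes the two bytes 0xC3 0xA4
    print(b, format(b, "08b"))     # 195 11000011 (lead byte), 164 10100100 (continuation byte)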
Unicode characters can be incorporated into a Web page in the same way as character entities, by using the decimal numeric character reference preceded by an ampersand and hash and followed by a semicolon (e.g., &#228; for ä). Clearly, this kind of manual editing is fine for testing purposes, but for real Unicode
authoring a Unicode aware text editor or HTML authoring environment is needed. Of course, the end user
needs to have the necessary Unicode fonts installed in order to view the page correctly. It is sometimes a
good idea to use styles to specify preferred fonts, as in this example from Alan Wood:
<style type="text/css">
.thai {font-family:Cordia New,Ayuthaya,Tahoma,Arial Unicode MS;}
</style>
It is then possible to reference the Thai text as follows:
Latin text followed by <span class="thai" lang="th">Thai text</span>
and more Latin text.
It should be noted also that UTF-8 should always be specified as the character encoding system with text
containing two or more non-Latin scripts. This can be done with a META tag as follows:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
In addition to Alan Wood’s pages, Dan Tobias has good information on using Unicode on the Web. The
advantage of using Unicode for including multiple languages on the Web is illustrated by Andrew
Cunningham in his discussion of how the Maribyrnong Library (Australia) used Unicode to add Amharic
and Somali to its Web site.
Multilingual Devices
While Unicode provides a necessary means to display the character sets of virtually any language, there
still remain a number of issues that make multilingual computing difficult. This is particularly the case if
one considers potential users outside developed countries, where fundamental issues like illiteracy,
poverty, and lack of infrastructure (power, Internet, phone) combine to erect formidable barriers to users.
Mark Warschauer discusses these issues in his upcoming book on Technology and Social Inclusion (in
press, MIT Press), to which I am indebted for information in this section.
Several recent projects have been started to try to overcome these handicaps and to bring networked
computing to unserved populations, such as the Morphy One from Japan or the Pengachu Project (MIT).
The Brazilian government is supporting deployment of a stripped-down personal computer known as the
"computador popular" or People’s Computer, developed at the Federal University of Minas Gerais, which
is to be made available for the equivalent of US $250 to $300. To save money on software, it uses the
Linux operating system, rather than Microsoft Windows, and relies on flash chip memory rather than a
hard drive. Initially the People’s Computer will go to schools, libraries, and health centers.
The emphasis on community sharing of computing resources is also at the core of India’s "simputer"
("simple inexpensive mobile computer"). The simputer also uses Linux but in a hand-held computer
designed to be used even by those unable to read. Slated to cost about US $200, the simputer has a built-in
Web browser and text-to-speech program which is capable of reading back Web pages in Indian
languages such as Hindi, Kannada, and Tamil. Input is through a stylus and a program called tapatap
which works by having the user connect dots to form characters. One of the interesting innovations of the
simputer is the use of a removable smart card which holds personal data for individuals, allowing a single
device to be shared by many people. Several companies have begun simputer production, including
PicoPeta and Encore Software. It is too early to know if the simputer will be a success. Given its low
pricing, commercial viability will only be possible if a mass market is created.
The principal output of the simputer is designed to be in the form of audio or graphics, thus bringing
networked computing to village populations of India for whom traditional keyboard input and text display
could be problematic. Interestingly, there are also efforts underway in India to develop local software for working in Indian languages, such as the tools available from Chennai Kavigal. This is clearly a necessary
development in other parts of the world whose languages are not currently included in mainstream
software development. Mark Warschauer discusses a set of Hawaiian language Web tools (Leokī) and
points to the benefits such tools have brought in support of the Hawaiian language learning community.
Indeed, by offering connectivity among geographically separated users of a language, the Internet can provide a powerful medium for both initial learning and language maintenance where local speakers or classes are not available.
Resource List
Standards and Organizations
• International Organization for Standardization
• World Wide Web Consortium
• Unicode Consortium
• A short overview of ISO/IEC 10646 and Unicode
• RFC 2070: Internationalization of HTML
• ASCII chart
• Character Entity List (HTML 4)
Character Sets
• W3 Consortium’s pages on internationalization
• Coding the World’s Writing -- from the "Babel" site
• Multilingual Computing: Introduction -- from Middlebury College
• Multilingual Computing and the Problem of Character Encoding and Rendering -- annotated
guide
• Text and Fonts in a Multi-lingual Cross-platform World -- from the Digital Odyssey (UCLA)
• A Brief History of Character Codes in North America, Europe, and East Asia -- by Steven J.
Searle
• ASCII: American Standard Code for Information Infiltration -- by Tom Jennings
• ISO-8859 briefing and resources
• Omniglot -- a guide to writing systems
• The ISO 8859 Alphabet Soup -- good explanation by Roman Czyborra
• Open Language Archives Community
• Ethnologue -- database of over 7000 languages
• The ISO Latin 1 character repertoire - a Description with usage notes
Unicode
• Why do we need Unicode?
• A short overview of ISO/IEC 10646 and Unicode -- by Olle Järnefors
• Character Encodings Concepts -- good overview
• What document(s) define(d) the Unicode standard? -- good overview of successive versions of
Unicode (by Roman Czyborra)
• The Unicode Standard
• Unicode Code charts -- glyphs and codes in PDF format
• Unicode and Multilingual Editors and Word Processors for Windows
• Finding Fonts for Internationalization FAQ
• Keyboard Layouts -- includes Arabic, Armenian, Azerbaijani, Ethiopic, Kazakh keyboards
• Free Recode -- program to perform code conversions
• Tutorial on character code issues -- by Jukka Korpela
• Unicode and Multilingual Web Browsers -- up-to-date list from Alan Wood
• Unicode and Multilingual Programs and Utilities -- information on word processing in Unicode
with links to programs
• Setting up Windows Internet Explorer 5, 5.5 and 6 for Multilingual and Unicode Support
• Unicode fonts for Windows computers -- from Alan Wood
• Unicode and MacOS X -- from Tidbits

Web
• Setting up Windows Netscape Browsers for Multilingual and Unicode Support
• Test pages for Unicode character ranges -- wide variety of languages represented
• UTF-8 Sampler
• Using national and special characters in HTML
• Notes on Internationalization -- older but still informative information on HTML
• On the use of some MS Windows characters in HTML
• English - the universal language on the Internet?
• Babel: Multilingual bookmark page -- page in various character encodings
• Babel: Towards communicating on the Internet in any language...
• Developing your Multilingual Web sites -- from Babel
• Unicode Transformation Formats: UTF-8 & Co. -- wealth of information on how Unicode is
dumbed down to 8-bit encoding
Devices
• Low-Cost Computers for the People
• Simputer
• Info on Encore Software’s Simputer
• PicoPeta -- another initial Simputer manufacturer
• Computador popular vai rodar Linux -- article in Portuguese on the "computador popular"
• Low-cost ’people’s computers’ target developing nations to get poor on-line
• Morphy One Project -- open hardware Palmtop PC from Japan
• Project Pengachu -- wireless Linux project (MIT)
• Chennai Kavigal -- Indian language computing
• Kualono -- information on Leokī
