
Unicode is an international character-encoding system

designed to support the electronic interchange,
processing, and display of written text in the diverse
languages of the modern and classical world. The Unicode
Standard includes letters, digits, diacritics, punctuation
marks, and technical symbols for all the world's principal
written languages, as well as emoji and other symbols,
using a uniform encoding scheme. The standard is
maintained by the Unicode Consortium. The first version
of Unicode was introduced in 1991; the most recent
version contains more than 100,000 characters.
Numerous encoding systems (including ASCII) predate
Unicode. With Unicode (unlike earlier systems), the
unique number assigned to each character remains the
same on any system that supports Unicode. ASCII alone
was not enough to cover all the world's languages, so the
Unicode Consortium introduced this encoding scheme to
overcome that limitation.
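The "unique number for each character" idea can be seen directly in Python, where `ord` and `chr` map between characters and their Unicode code points (a small illustrative sketch, not part of the original text):

```python
# Every character has one fixed code point, the same on any
# system that supports Unicode.
print(ord("A"))      # 65, identical to its ASCII value
print(ord("€"))      # 8364, i.e. U+20AC
print(chr(0x1F600))  # U+1F600 is the emoji 😀
```

The code point is an abstract number; how it is stored in bytes is decided separately by the encoding forms (UTF-8, UTF-16, UTF-32) described below.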

#INTERNAL STORAGE ENCODING
OF CHARACTERS

We know that a computer understands only
binary language (0 and 1). It cannot directly
understand or store alphabets, other numbers,
pictures, symbols, etc. Therefore, we use certain
coding schemes so that it can represent each of
them correctly. We call these codes
alphanumeric codes. The Unicode standard includes
roughly 100,000 characters to represent the
characters of different languages. While ASCII
uses only 1 byte per character, Unicode can use
up to 4 bytes to represent a character. Hence, it
provides a very wide range of encoding. It has
three main forms, namely UTF-8, UTF-16, and UTF-32.
Among them, UTF-8 is used the most; it is also the
default encoding for many programming languages.
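Since a computer stores every character as binary, it can help to see the actual bits. This small Python sketch (an addition for illustration) prints the UTF-8 byte patterns of two characters:

```python
# A character is ultimately stored as bits; here are the UTF-8
# byte patterns for a 1-byte and a 3-byte character.
for ch in ("A", "中"):
    bits = " ".join(f"{b:08b}" for b in ch.encode("utf-8"))
    print(ch, "->", bits)
```

`'A'` fits in a single byte (`01000001`), while `'中'` needs three UTF-8 bytes.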
#UCS (UNIVERSAL CHARACTER
SET)

It is a very common acronym in the Unicode
scheme. It stands for Universal Character Set.
Furthermore, it is the encoding scheme for
storing Unicode text.

UCS-2: It uses two bytes to store each character.

UCS-4: It uses four bytes to store each character.
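Python has no codec named `ucs-2`, but for characters in the Basic Multilingual Plane UTF-16 behaves like UCS-2, and UTF-32 is equivalent to UCS-4; a rough sketch under that assumption:

```python
# UCS-2 ≈ UTF-16 for BMP characters: 2 bytes each.
# UCS-4 ≈ UTF-32: always 4 bytes each.
text = "abc"
print(len(text.encode("utf-16-le")))  # 6 bytes, 2 per character
print(len(text.encode("utf-32-le")))  # 12 bytes, 4 per character
```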

#UTF (UNICODE TRANSFORMATION
FORMAT)

UTF is the most important part of this
encoding scheme. It stands for Unicode
Transformation Format. Moreover, it defines
how the code represents Unicode. It has several
forms, as follows:

UTF-7
This scheme was designed to represent the ASCII
standard, since ASCII uses 7-bit encoding. It
represents ASCII characters in emails and
messages that use this standard.

UTF-8
It is the most commonly used form of encoding.
Furthermore, it can use up to 4 bytes to
represent a character. It uses:

1 byte to represent English letters and symbols.

2 bytes to represent additional Latin and Middle
Eastern letters and symbols.

3 bytes to represent most Asian letters and symbols.

4 bytes for other additional characters.

Moreover, it is compatible with the ASCII standard.

Its uses are as follows:

Many protocols use this scheme.

It is the default standard for XML files.

Some Unix and Linux file systems use it in some
files.

It is used for the internal processing of some
applications.

It is widely used in web development today.

It can also represent emoji, which are today a very
important feature of most apps.
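The 1-to-4-byte pattern described above can be checked directly in Python (an illustrative sketch added here, not part of the original notes):

```python
# UTF-8 is variable-width: 1 to 4 bytes depending on the character.
samples = {"A": 1, "é": 2, "中": 3, "😀": 4}
for ch, expected in samples.items():
    n = len(ch.encode("utf-8"))
    print(ch, "uses", n, "byte(s)")
    assert n == expected
```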

UTF-16

It is an extension of UCS-2 encoding. It uses
2 bytes to represent each of the 65,536 characters
of the Basic Multilingual Plane, and it also supports
4-byte surrogate pairs for additional characters.
Furthermore, it is used for internal processing,
as in Java, Microsoft Windows, etc.
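The 2-byte/4-byte split in UTF-16 is easy to demonstrate (a small added sketch):

```python
# UTF-16 stores BMP characters in 2 bytes; characters beyond
# U+FFFF, such as most emoji, take a 4-byte surrogate pair.
print(len("A".encode("utf-16-le")))   # 2 bytes
print(len("😀".encode("utf-16-le")))  # 4 bytes (surrogate pair)
```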

UTF-32
It is a multibyte encoding scheme. It uses a
fixed 4 bytes to represent every character.

↪️IMPORTANCE OF UNICODE

As it is a universal standard, it allows
writing a single application for various
platforms. This means that we can develop an
application once and run it on various
platforms in different languages. Hence we
don't have to write the code for the same
application again and again, and the
development cost is reduced.

Moreover, a common standard helps prevent data
corruption when text moves between systems, since
Unicode is a common encoding standard for many
different languages and characters. We can also use
it to convert from one coding scheme to another:
since Unicode is a superset of all common encoding
schemes, we can convert a code into Unicode and
then convert it into another coding standard. It is
preferred by many programming languages; for
example, XML tools and applications use this
standard only.
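The "convert via Unicode" step looks like this in Python, where a `str` is the Unicode intermediate (an added sketch; the byte values are standard Latin-1 and UTF-8 encodings of "café"):

```python
# Converting between encodings goes through Unicode:
# decode the source bytes to a str, then encode to the target.
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'
text = latin1_bytes.decode("latin-1")    # back to Unicode text
utf8_bytes = text.encode("utf-8")        # b'caf\xc3\xa9'
print(utf8_bytes)
```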

***ADVANTAGES OF UNICODE
-It is a global standard for encoding.

-It has support for mixed-script computing
environments.

-The encoding is space-efficient and hence
saves memory.

-It is a common scheme for web development.

-It increases the interoperability of data
across platforms.

-It saves time and the development cost of applications.

Unicode was originally designed as a 16-bit system;
it has since grown beyond 16 bits and can support
many more characters than ASCII.

***Disadvantage
Because it has more characters, Unicode
uses a lot more space. It takes at least 2 bytes
to store each character in UTF-16, and Unicode
in general uses more bytes to enumerate its
vastly larger range of alphabetic symbols.

The first 128 characters are the same as in the
ASCII system, making it compatible with ASCII.

There are 6,400 characters set aside for the user
or software (the Private Use Area).

There are still characters which have not been
defined yet, future-proofing the system.

It can store characters from more than one
language.

It can store characters from languages with more
than 250 characters.
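The ASCII compatibility of the first 128 code points means that plain ASCII text produces identical bytes under both encodings, as this short added check shows:

```python
# The first 128 code points coincide with ASCII, so ASCII text
# encodes to the same bytes in ASCII and in UTF-8.
text = "Hello, Unicode!"
assert text.encode("ascii") == text.encode("utf-8")
print(text.encode("utf-8"))
```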
