SS3 Note 2nd Term
Data representation refers to the methods used internally to represent information stored in the
computer system.
Data and instructions cannot be entered into a computer and processed directly in human language.
Any type of data, be it numbers, letters, special symbols, sound or pictures, must first be converted into
machine-readable form, i.e. binary form. For this reason, it is important to understand how a
computer, together with its peripheral devices, handles data in its electronic circuits, on magnetic media
and in optical devices.
Numbers
Text
Sound
The terms bit, nibble, byte and word are widely used when describing computer memory and data size.
Bit: short for binary digit, which can be 0 or 1. It is the basic unit of data or information in digital
computers.
Nibble: a group of 4 bits, i.e. half a byte.
Byte: a group of 8 bits used to represent a character. A byte is the basic unit for measuring memory
size in a computer.
Word: a group of bits, usually two or more bytes, that the computer handles as a single unit. The term
word length is the number of bits in each word. For example, a word can have a length of 16 bits,
32 bits or 64 bits.
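As a rough illustration of these units, the Python sketch below works out how many different values a group of bits can hold (2 raised to the number of bits). The function name values_for_bits is made up for this note.

```python
def values_for_bits(n_bits):
    """Return how many distinct values n_bits binary digits can represent."""
    return 2 ** n_bits

# Sizes in bits of the units described above (word lengths vary by computer).
units = {"bit": 1, "nibble": 4, "byte": 8, "16-bit word": 16, "32-bit word": 32}

for name, size in units.items():
    print(name, "=", size, "bit(s) ->", values_for_bits(size), "possible values")
```

For example, one byte (8 bits) can hold 256 different values, which is why an 8-bit code can represent 256 characters.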
Electronic components, such as a microprocessor, are made up of millions of electronic circuits. The
presence of a high voltage (on) in these circuits is interpreted as ‘1’ while a low voltage (off) is
interpreted as ‘0’. This concept can be compared to switching an electric circuit on and off. When the
switch is closed, the high voltage in the circuit causes the bulb to light (‘1’ state). On the other hand, when
the switch is open, the bulb goes off (‘0’ state). This forms the basis for describing data representation in
digital computers using the binary number system.
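The binary number system mentioned here can be tried out in a few lines of Python. This is only a sketch: the helper names to_binary and from_binary, and the choice of an 8-bit width, are assumptions made for illustration.

```python
def to_binary(number, width=8):
    """Return a non-negative whole number as a string of binary digits."""
    return format(number, "0{}b".format(width))

def from_binary(bits):
    """Return the whole number represented by a string of binary digits."""
    return int(bits, 2)

print(to_binary(13))            # 00001101
print(from_binary("00001101"))  # 13
```

Each 1 in the output corresponds to an "on" state and each 0 to an "off" state.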
The presence of a magnetic field in one direction on magnetic media is interpreted as ‘1’, while a field
in the opposite direction is interpreted as ‘0’. Magnetic technology is mostly used on storage devices
that are coated with special magnetic materials such as iron oxide. Data is written on the media by
arranging the magnetic dipoles of some iron oxide particles to face in one direction and others in the
opposite direction.
In optical devices, the presence of light is interpreted as ‘1’ while its absence is interpreted as ‘0’. Optical
devices use this technology to read or store data. Take the example of a CD-ROM: if the shiny surface is
placed under a powerful microscope, it is seen to have very tiny holes called pits. The areas that do not
have pits are called land. The laser beam reflected from the land is interpreted as ‘1’. The laser entering
a pit is not reflected, and this is interpreted as ‘0’. The reflected pattern of light from the rotating disk
falls on a photoelectric detector that transforms the pattern into digital form.
CHARACTER SET
A character set is a defined list of characters recognized by the computer hardware and software.
BCD
BCD means Binary Coded Decimal. It is a digital encoding for decimal numbers in which each decimal
digit is represented by its own binary sequence. Each digit (0 to 9) is usually represented in four bits.
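A minimal Python sketch of this idea follows. It encodes each decimal digit separately as four bits; the function name to_bcd is made up for this note.

```python
def to_bcd(number):
    """Encode each decimal digit of a non-negative whole number as 4 bits."""
    return " ".join(format(int(digit), "04b") for digit in str(number))

print(to_bcd(259))  # 0010 0101 1001
```

Note that this differs from ordinary binary: 259 in plain binary is 100000011, whereas in BCD the digits 2, 5 and 9 are each encoded on their own.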
EBCDIC
Extended Binary Coded Decimal Interchange Code is a character encoding set used on IBM mainframes.
It uses 8 bits per character, giving 256 possible codes for representing both numbers and text.
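To see that EBCDIC assigns different codes from ASCII, the sketch below encodes the same text both ways. EBCDIC has several variants; the use of Python's "cp037" codec (one common IBM code page) is an assumption made for illustration.

```python
text = "A1"

# "cp037" is one EBCDIC code page; other EBCDIC variants may differ.
ebcdic_bytes = text.encode("cp037")
ascii_bytes = text.encode("ascii")

print("EBCDIC:", ebcdic_bytes.hex())  # c1f1
print("ASCII: ", ascii_bytes.hex())   # 4131
```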
ASCII
American Standard Code for Information Interchange consists of 128 symbols (coded 0 to 127), which are
the characters on a standard keyboard plus a few extra.
ASCII Table
Dec = Decimal Value, Char = Character
Dec  Char            Dec  Char  Dec  Char  Dec  Char
8    BS (backspace)  40   (     72   H     104  h
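The table entries can be checked with Python's built-in ord() and chr() functions, which convert between a character and its code. This is a small sketch for checking the values, not part of the original table.

```python
# ord() gives the decimal code of a character; chr() does the reverse.
for character in ["(", "H", "h"]:
    print(repr(character), "->", ord(character))

print(ord("\b"))              # 8, the backspace control character
print(chr(72))                # H
print(ord("h") - ord("H"))    # 32, the gap between lower and upper case
```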
Unicode is a universal standard for character encoding. ASCII did not have enough characters to
cover all languages, so Unicode was introduced to overcome this limitation.
The Unicode Consortium introduced this encoding scheme.
The standard includes 149,186 characters (as of version 15.0) to represent characters of different languages.
While ASCII uses only 1 byte per character, Unicode encodings use up to 4 bytes per character. Hence,
Unicode provides a very wide variety of characters. Its main encoding forms are UTF-8, UTF-16 and UTF-32.
Among them, UTF-8 is the most widely used; it is also the default encoding in many programming languages.
UCS
UCS is a very common acronym in the Unicode scheme. It stands for Universal Character
Set. It is an encoding scheme for storing Unicode text.
UCS-2: It uses two bytes to store each character.
UCS-4: It uses four bytes to store each character.
UTF
UTF is the most important part of this encoding scheme. It stands for Unicode Transformation
Format. It defines how Unicode characters are represented as bytes. The common forms are as follows:
UTF-7
This scheme represents Unicode text using only 7-bit ASCII characters, since ASCII uses 7-bit encoding.
It was designed for emails and messages sent over systems that use the ASCII standard.
UTF-8
It is the most commonly used form of encoding. It uses between 1 and 4 bytes to represent
each character. It uses:
1 byte to represent English letters and symbols.
2 bytes to represent additional Latin and Middle Eastern letters and symbols.
3 bytes to represent Asian letters and symbols.
4 bytes for other additional characters.
Moreover, it is compatible with the ASCII standard.
Its uses are as follows:
Many protocols use this scheme.
It is the default standard for XML files.
Some file systems, such as those of Unix and Linux, use it for file names.
It is used for the internal processing of some applications.
It is widely used in web development today.
It can also represent emojis, which today are a very important feature of most apps.
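The 1 to 4 byte rule above can be checked with a short Python sketch. The sample characters and the helper name utf8_length are chosen only for illustration.

```python
def utf8_length(character):
    """Return how many bytes UTF-8 needs to store one character."""
    return len(character.encode("utf-8"))

# Sample characters written as escape codes so the file stays plain ASCII.
samples = {
    "English letter A": "A",
    "Latin letter e with acute accent": "\u00e9",
    "Chinese character for 'middle'": "\u4e2d",
    "Grinning face emoji": "\U0001F600",
}

for description, character in samples.items():
    print(description, "->", utf8_length(character), "byte(s)")
```

The English letter needs 1 byte, the accented Latin letter 2, the Chinese character 3 and the emoji 4.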
UTF-16
It is an extension of UCS-2 encoding. It uses 2 bytes to represent each of the first 65,536 characters
and 4 bytes for additional characters. It is used for internal processing, for example in Java
and Microsoft Windows.
UTF-32
It is a fixed-length multibyte encoding scheme. It always uses 4 bytes to represent each character.
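To compare the three forms side by side, the sketch below measures the size of the same characters in UTF-8, UTF-16 and UTF-32. The "-le" (little-endian) codec names are used so that Python does not add a byte order mark to the result; the helper name encoded_size is made up for this note.

```python
def encoded_size(text, encoding):
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))

# A = English letter, \u4e2d = Chinese character, \U0001F600 = emoji.
for character in ["A", "\u4e2d", "\U0001F600"]:
    sizes = [encoded_size(character, name)
             for name in ("utf-8", "utf-16-le", "utf-32-le")]
    print("U+{:04X}".format(ord(character)), "UTF-8/16/32 bytes:", sizes)
```

UTF-32 always needs 4 bytes, UTF-16 needs 2 or 4, and UTF-8 needs 1 to 4.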
Importance of Unicode
As it is a universal standard, it allows a single application to be written for various
platforms. This means that we can develop an application once and run it on various
platforms in different languages. Hence we do not have to write the code for the same
application again and again, and the development cost is reduced.
Moreover, it reduces the data corruption that occurs when text is read with the wrong encoding.
It is a common encoding standard for many different languages and characters.
We can use it to convert from one coding scheme to another. Since Unicode is a superset
of the other character sets, we can convert a code into Unicode and then convert it
into another coding standard.
It is preferred by many programming languages and tools. For example, XML tools and applications
use this standard by default.
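The conversion idea can be sketched in Python, where decoding bytes produces Unicode text and encoding turns it back into bytes. The choice of Latin-1 as the starting encoding and the sample word are assumptions made for illustration.

```python
# The word "café" stored in the Latin-1 encoding (1 byte per character).
latin1_bytes = b"caf\xe9"

# Step 1: decode the bytes into Unicode text.
text = latin1_bytes.decode("latin-1")

# Step 2: encode the Unicode text into another coding scheme.
utf8_bytes = text.encode("utf-8")

print(len(latin1_bytes), "bytes in Latin-1")  # 4
print(len(utf8_bytes), "bytes in UTF-8")      # 5
```

Unicode acts as the common middle step, so no direct Latin-1 to UTF-8 table is needed.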
Advantages of Unicode
It is a global standard for encoding.
It supports mixed-script computing environments.
Encodings such as UTF-8 are space-efficient and hence save memory.
It is a common scheme for web development.
It increases the interoperability of data across platforms.
Saves time and development cost of applications.
Unicode compared with ASCII:
Unicode: People use this scheme all over the world.
ASCII: It has only a limited set of characters; hence, it cannot be used all over the world.
Unicode: The Unicode characters include all the characters of the ASCII encoding. Therefore we can
say that it is a superset of ASCII.
ASCII: Each of its characters has an equivalent code in Unicode.
Data ethics refers to, and adheres to, the principles and values on which human rights and personal data
protection laws are based.
Keeping data safe is very important for many reasons. There can be very confidential details that people
want to keep safe. Data can be corrupted or deleted through either an accidental or a malicious act. There
are many sources of security breaches:
Malware:
This is software used to gain access to or damage a computer without the knowledge of the owner. There
are various types of malware, including spyware, key loggers, true viruses, worms, or any type of
malicious code that:
Infiltrates a computer.
Disrupts operations.
Steals sensitive information.
Allows unauthorized access to system resources.
Slows computer or web browser speeds.
Creates problems connecting to networks.
Results in frequent freezing or crashing.
Hacking:
Hacking is the act of gaining illegal access to a computer system. Hacking can lead to identity theft and
the theft of confidential data. Data can be deleted, changed or even corrupted.
Viruses:
Viruses are programs or pieces of program code that can replicate themselves with the intention of deleting
or corrupting files, or causing the computer to malfunction. A virus can delete or corrupt files and data.
It can also cause the device to crash and stop responding.
Phishing:
Phishing is carried out by a person who sends out a legitimate-looking email. As soon as the
recipient clicks on the link in it, they are sent to a fake website. The creator of the email can then
access personal data, and this can lead to fraud or identity theft.
Pharming:
Pharming uses code installed on a user's hard drive or on a web server which redirects the user to a
fake website without the user knowing. The creator can get access to personal data, which can lead to
fraud or identity theft.
Wardriving:
The act of locating and using wireless internet connections illegally; it only requires a laptop (or other
portable device), a wireless network card and an antenna to pick up wireless signals. This can lead to
the user's internet time being stolen, and it makes it very easy to steal a user's password and personal
details.
Spyware/Key-Logging software:
Software that gathers information by monitoring key presses on the user’s keyboard; the information is
then sent back to the person who sent the software. This gives access to all the data entered using the
keyboard on the user’s computer. The software is also able to install other spyware, read cookie data and
change the user’s default web browser.
Cookies:
A cookie is a packet of information sent by a web server to a web browser. Cookies are generated each
time the user visits the website.
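As a rough illustration of what such a packet looks like, the sketch below uses Python's standard http.cookies module to build the header line a web server could send to a browser. The cookie name session_id and its value are made up for this example.

```python
from http.cookies import SimpleCookie

# Build a cookie the way a web server might before replying to a browser.
cookie = SimpleCookie()
cookie["session_id"] = "abc123"   # hypothetical name and value

# This is the header line sent to the browser along with the web page.
header = cookie.output()
print(header)  # Set-Cookie: session_id=abc123
```

The browser stores the name and value and sends them back on later visits, which is why cookie data is of interest to spyware.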
A National ICT Policy is a policy put into place by governments and stakeholders who are committed to
the process of bringing digital technology to all individuals and communities so that they can have
access to information. If it is poorly implemented, it can lead to security breaches.