Notes - Data Representation
Zain Merchant
Data Types
Built-in Data Types
Built-in data types are those data types that are pre-defined by the programming
language. Most languages have native data types that are created for ease of execution of
data. Such data types are called built-in data types. These data types can be used
directly in the program without the hassle of creating them. Every programming language
has its own specific set of built-in data types.
Built-in data types are those data types that can be directly used by the programmer to
declare and store different variables in a program.
Data type | Description                        | Pseudocode | Python
char      | Single alphanumerical character    | CHAR       | (none)
date      | Value to represent a date          | DATE       | class datetime
integer   | Whole number, positive or negative | INTEGER    | int
User-defined data types
What is a User-defined data type
• A data type constructed by a programmer // not a primitive data type
• A data type that references at least one other data type (the data types can be
primitive or user-defined)
• The programmer needs to specify a new data type that meets the requirements of the
application / program
Non-composite data types
A user-defined data type is a data type which the programmer has designed for use.
e.g. The following pseudocode declares an enumerated data type for months:
TYPE TMonth = (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)
// The TMonth data type is declared
DECLARE ThisMonth : TMonth
ThisMonth ← May
// The variable ThisMonth is declared and assigned the value May
Note: Values in an enumerated data type are not string values; do not enclose them with
quotation marks.
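As a rough Python counterpart of the TMonth pseudocode, the sketch below uses the standard `enum` module. The variable name `this_month` is only an illustrative translation of ThisMonth, and the numbering of the members from 1 is a Python default, not something the pseudocode specifies.

```python
from enum import Enum

# Python counterpart of the TMonth enumerated type.
TMonth = Enum("TMonth", "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec")

this_month = TMonth.May          # an enumeration member, not the string "May"

print(this_month.name)           # the member's name
print(this_month.value)          # its position in the list, counted from 1
print(this_month == "May")       # False: members are not string values
```

The last line illustrates the note above: an enumerated value and a string with the same spelling are different things.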
Pointer Data type
A pointer data type references memory locations. Thus, it has to relate to the type of
data that it is pointing to.
e.g. This pseudocode assigns an address to a pointer and then uses the pointer.
IntegerPointer ← @MyNum
// This assigns the address of MyNum to the pointer variable IntegerPointer
YourNum ← IntegerPointer^
// This accesses the data stored at the address which IntegerPointer points to.
// This is known as dereferencing.
Pointers are typically used to construct complex data structures, such as linked lists and
binary trees. These data structures are discussed later on.
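Python does not expose pointers in everyday code, but the standard `ctypes` module can mimic the address-of and dereference steps from the pseudocode. This is only a sketch: the starting value 42 is an assumption made for the example, and the snake_case names are translations of the pseudocode identifiers.

```python
import ctypes

my_num = ctypes.c_int(42)                    # MyNum: an integer in a C-style memory cell

integer_pointer = ctypes.pointer(my_num)     # IntegerPointer ← @MyNum
your_num = integer_pointer.contents.value    # YourNum ← IntegerPointer^ (dereferencing)

# Writing through the pointer changes the variable it points to.
integer_pointer.contents.value = 7
print(your_num, my_num.value)
```

Because the pointer is typed (`c_int`), it "relates to the type of data that it is pointing to", as described above.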
What is a pointer data type
• A user-defined non-composite data type used to reference a memory location.
Composite data type
A data type is known as a composite data type when it represents a number of similar or
different data under a single declaration of variable, i.e. a data type that has multiple
values grouped together.
Record Data type
A record data type stores a collection of information regarding a common subject,
similar to a record in a database. It is constructed from several fields, which each have
their own data types; thus, the record data type is a composite data type.
e.g. The following pseudocode defines a record type for a student record:
TYPE TStudentRecord
// The record consists of several fields
DECLARE FirstName : STRING
// Each field has its own data type
DECLARE LastName : STRING
DECLARE Absences : INTEGER
DECLARE Class : STRING
ENDTYPE
DECLARE Student1 : TStudentRecord
// A variable of the record type is declared before its fields are used
Student1.FirstName ← "Zain"
// Fields can be accessed using dot notation
Student1.LastName ← "Merchant"
Student1.Absences ← 0
Student1.Class ← "A2 Level"
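One possible Python rendering of the student record uses a `dataclass`. The field names are illustrative translations of the pseudocode fields; `class_name` replaces Class because `class` is a reserved word in Python.

```python
from dataclasses import dataclass

@dataclass
class StudentRecord:
    first_name: str
    last_name: str
    absences: int
    class_name: str   # "Class" in the pseudocode

student1 = StudentRecord("Zain", "Merchant", 0, "A2 Level")
student1.absences += 1          # fields are accessed with dot notation
print(student1)
```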
Set Data type
A set data type is a composite data type that allows a programmer to apply set theory
operations to data in a program.
• Union
• Difference
• Intersection
• Including an element
• Excluding an element
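The five operations listed above map directly onto Python's built-in `set` type. The sets `a` and `b` and their contents are made up for the example.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b            # every element in a or b
difference = a - b       # elements in a but not in b
intersection = a & b     # elements in both a and b

a.add(10)                # including an element
a.discard(1)             # excluding an element

print(union, difference, intersection, a)
```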
Object / Class Data type
An object data type is a composite data type used in object-oriented programming to
define classes.
Essentially, objects are just records with functions that act on the data that they contain.
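To illustrate the idea of "records with functions", here is a minimal sketch of a class. The class name `Student` and its two methods are hypothetical and chosen only for this example.

```python
class Student:
    def __init__(self, first_name, last_name):
        # The data part: the same kind of fields a record would hold.
        self.first_name = first_name
        self.last_name = last_name
        self.absences = 0

    def record_absence(self):
        # A function that acts on the data the object contains.
        self.absences += 1

    def full_name(self):
        return f"{self.first_name} {self.last_name}"

student = Student("Zain", "Merchant")
student.record_absence()
print(student.full_name(), student.absences)
```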
File organization and access
File organization
A file is a collection of records. Each record is a collection of fields. Every field consists of
a value.
Serial Files
A serial file is a collection of records with no defined order. Records enter the file in
chronological order. All records have a defined format so that they can be input and
output correctly.
A text file can be considered an example of a serial file: a series of characters is input, in
chronological order, to produce a file.
A common use of serial files is for real-time processing. Records can be entered in real
time, as quickly as possible, because they do not need to be sorted. This makes serial files
efficient.
• It is cheap
• It cannot support modern high speed requirements for quick record access.
Sequential Files
A sequential file stores records in order of a key field. In order for it to be possible to sort
records by key field, this field needs to be unique and sequential but does not need to be
consecutive.
In a sequential file, a particular record can be found by reading all of the key fields until
you reach the one you are looking for.
Because the records in a file are sorted in a particular order, better file searching methods
like the binary search technique can be used to reduce the time used for searching a file.
• The binary search technique reduces record search time because each comparison
halves the number of records that remain to be searched.
• Sequential records cannot support modern technologies that require fast access to
stored records.
• The requirement that all records be of the same size is sometimes difficult to enforce.
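The binary search mentioned above can be sketched as follows, assuming the records of the sequential file have already been read into a list of `(key, data)` pairs sorted by key. The record contents and the function name `binary_search` are invented for the example.

```python
def binary_search(records, key):
    """Search records sorted by key; return (record or None, comparisons made)."""
    low, high = 0, len(records) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        mid_key = records[mid][0]
        if mid_key == key:
            return records[mid], comparisons
        if mid_key < key:
            low = mid + 1      # discard the lower half
        else:
            high = mid - 1     # discard the upper half
    return None, comparisons

# 100 records with keys 10, 20, ..., 1000: unique and ordered, but not consecutive.
records = [(k, f"record {k}") for k in range(10, 1010, 10)]
print(binary_search(records, 730))
```

With 100 records, no search needs more than 7 comparisons, whereas reading the key fields one by one could need up to 100.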
• Sequential files are stored with ordered records and stored in the order of the key field
• In serial files, new records are added in the next available space / records are
appended to the file
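The difference between the two organisations can be sketched with two in-memory lists standing in for the files. The keys and names are made up, and `bisect.insort` is simply one convenient way to keep a Python list in key order.

```python
from bisect import insort

serial_file = []        # records kept in arrival (chronological) order
sequential_file = []    # records kept in key-field order

for record in [(42, "Dana"), (7, "Ali"), (19, "Chen")]:
    serial_file.append(record)          # appended in the next available space
    insort(sequential_file, record)     # placed where its key field belongs

print(serial_file)
print(sequential_file)
```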
Random Files
The random file organisation method physically stores records of data in a file in any
available position. The location of any record in the file is found by using a hashing
algorithm on the key field of a record.
Hashing
Hashing is the process of transforming the key value of a record to yield an address
location where the record is stored. A hash function generates the record address by
performing some simple operations on the key or parts of the key.
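A very simple hash function of this kind takes the remainder after dividing a numeric key by the number of record positions. The file size of 11 and the name `hash_address` are assumptions for this sketch; the notes do not prescribe a particular function.

```python
FILE_SIZE = 11   # number of record positions in the file (assumed for the example)

def hash_address(key: int) -> int:
    """Map a numeric key field to a record address in the range 0..FILE_SIZE-1."""
    return key % FILE_SIZE

print(hash_address(1234))
print(hash_address(25), hash_address(14))   # two different keys, same address
```

The second line shows that two different keys can produce the same address, which is why a method for handling such clashes is needed.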
When two records hash to the same address (a collision), the new record can be stored using either:
1. An open hash, where the record is stored in the next free space.
2. A closed hash, where an overflow area is set up and the record is stored in
the next free space in the overflow area.
When reading a record from a file using direct access, the address of the location to read
from is calculated using the hashing algorithm and the key field of the record stored there
is read. But, before using that record, the key field must be checked against the original
key field to ensure that they match. If the key fields do not match, then the following
records need to be read until a match is found (open hash) or the overflow area needs to
be searched for a match (closed hash).
• Record IDs equal: record is found
• Record IDs not equal: search overflow area / next record
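The open hash procedure described above (store in the next free space, then check key fields when reading) could be sketched like this. The file size, the modulo hash, and the function names `store` and `read` are all assumptions for illustration; the closed hash variant with an overflow area is not shown.

```python
FILE_SIZE = 11
slots = [None] * FILE_SIZE    # each slot holds a (key, data) record or None

def store(key, data):
    """Store a record at its hashed address, or in the next free space after it."""
    address = key % FILE_SIZE
    for step in range(FILE_SIZE):
        probe = (address + step) % FILE_SIZE
        if slots[probe] is None:
            slots[probe] = (key, data)
            return probe
    raise RuntimeError("file is full")

def read(key):
    """Hash the key, then compare key fields until a match or an empty slot."""
    address = key % FILE_SIZE
    for step in range(FILE_SIZE):
        probe = (address + step) % FILE_SIZE
        record = slots[probe]
        if record is None:
            return None            # empty slot reached: no such record
        if record[0] == key:       # key fields match: record found
            return record[1]
    return None

store(25, "first")     # hashes to address 3
store(14, "second")    # also hashes to 3, so it goes in the next free space
print(read(14), read(36))
```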
File Access
There are two ways to access a specific record within a file: sequential access and direct
access. Serial and sequential files can be accessed using sequential access and random
files can be accessed using direct access.
Sequential access is where each record in the file is read, one by one, until the desired
record is found.
Direct access is where a hashing algorithm is used to jump to a specific record in the file.
Direct access can also be achieved with a sequential file. A separate index file is created which
has two fields per record. The first field has the key field value and the second field has a
value for the position of this key field value in the main file.
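The index idea can be sketched with an in-memory byte stream standing in for the main file and a dictionary standing in for the index file. The 16-byte record length, the record layout, and the names are assumptions made for this example.

```python
import io

RECORD_SIZE = 16   # assumed fixed record length in bytes

main_file = io.BytesIO()   # stands in for the main sequential file
index = {}                 # stands in for the index file: key value -> position

for key, name in [(101, "Ali"), (205, "Chen"), (310, "Dana")]:
    index[key] = main_file.tell()     # position of this record in the main file
    main_file.write(f"{key}:{name}".ljust(RECORD_SIZE).encode("ascii"))

def read_record(key):
    """Jump straight to a record using the index instead of reading every record."""
    main_file.seek(index[key])
    return main_file.read(RECORD_SIZE).decode("ascii").rstrip()

print(index)
print(read_record(205))
```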
However, in a direct-access file, data can be deleted or edited in place: there is no need
for a new file.
A sequential file is suitable for applications when multiple records are required from one
search of the file. An example could be a family history file where a search could be used
for all records with a particular family name.
Floating-point numbers, representation and manipulation
Floating point numbers
Representation
Floating-point notation is a way of representing very small or very large numbers.
A floating-point number consists of three parts: the sign bit, the mantissa, and the
exponent. To find the value of the number, we use $\pm M \times 2^{E}$, where ± is determined by the
most significant bit, M is the mantissa, and E is the exponent.
As the mantissa or exponent could be negative, two’s complement is usually used for
both, but this should always be stated in the question. This means that the leftmost digit
is 1 if negative and 0 if positive. The inclusion of binary places does not change this.
Worked example
Calculate the denary value of the floating point binary number 0010101010110, where 9 bits
are used for the mantissa followed by 4 bits for the exponent, both in two’s complement.
Step 1: split the binary number into the mantissa (0.01010101) and the exponent (0110).
Step 2: convert the exponent to denary. 0110 = 6.
Step 3: move the binary point in the mantissa six places to the right. 0010101.01
Step 4: the mantissa is now 0010101.01; converting to denary gives (16 + 4 + 1 + 0.25) =
21.25
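The steps of this worked example can be checked with a short sketch. The function names `twos_complement` and `decode` are invented here; the code assumes, as in the example, that the mantissa comes first, both parts are in two's complement, and the binary point sits after the first mantissa bit.

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value

def decode(bits: str, mantissa_bits: int) -> float:
    """Return the denary value of a mantissa-then-exponent bit string."""
    mantissa = twos_complement(bits[:mantissa_bits]) / 2 ** (mantissa_bits - 1)
    exponent = twos_complement(bits[mantissa_bits:])
    return mantissa * 2.0 ** exponent

print(decode("0010101010110", 9))   # 21.25, matching the worked example
```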
Normalisation
Normalisation is the process of maximising the precision of a number when storing it in a
given number of bits. To do this for a positive binary number involves removing any
leading zeros. To do the same for a negative binary number involves removing any
leading ones. This means that a normalised floating point number must always start as
either 0.1 or 1.0.
To normalise a floating point binary number, you simply move the binary point in the
mantissa right or left until the above is true. The exponent is then decreased (if the
binary point is moved right) or increased (if moved left) by the number of places moved.
Worked example
The floating point binary number 000101010101 has an 8-bit mantissa followed by a 4-bit
exponent, both in two’s complement. Find the normalised version of this number.
Step 1: split the binary number into the mantissa (0.0010101) and the exponent (0101).
Step 2: move the binary point in the mantissa right until it starts as 0.1 or 1.0. Moving it two
places ensures that the mantissa is now 0.10101.
Step 3: however, the mantissa is now only 6 digits in length, so add two zeros to the
right-hand side. This keeps it at 8 digits (as required by the question) but does not change the
value. The mantissa is now 0.1010100.
Step 4: as the binary point has been moved two places right, the exponent must be
decreased by 2. The exponent is currently 0101 (5 in denary), so subtracting 2 from this gives
0011 (3).
Step 5: join together the new mantissa and exponent in the format given in the question. In
this case, the normalised value is 010101000011.
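The normalisation steps above can be sketched as a loop that shifts the mantissa one place at a time. The function name `normalise` is hypothetical, and the sketch only covers the case in this example, where redundant leading bits are removed by moving the binary point right.

```python
def normalise(mantissa: str, exponent: str):
    """Normalise a two's complement mantissa/exponent pair given as bit strings."""
    exp = int(exponent, 2)
    if exponent[0] == "1":
        exp -= 1 << len(exponent)
    # Not yet normalised while the first two bits are equal (0.0... or 1.1...).
    # A mantissa of all zeros cannot be normalised, so it is left alone.
    while mantissa[0] == mantissa[1] and "1" in mantissa:
        mantissa = mantissa[0] + mantissa[2:] + "0"   # move the point one place right
        exp -= 1                                      # and decrease the exponent
    limit = 1 << (len(exponent) - 1)
    if not -limit <= exp < limit:
        raise ValueError("exponent no longer fits in the available bits")
    exp_bits = format(exp % (1 << len(exponent)), f"0{len(exponent)}b")
    return mantissa, exp_bits

print(normalise("00010101", "0101"))   # ('01010100', '0011')
```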
Worked example
Give the denary value 27.75 as a normalised floating point binary number, using 8 bits for
the mantissa and 4 bits for the exponent, both in two’s complement.
Step 1: write 27.75 as a fixed point binary number using two’s complement = 011011.11.
Step 2: move the binary point five places to the left so that the mantissa is normalised
(giving 0.1101111).
Step 3: the binary point was moved five places left, so increase the exponent by 5. (The
exponent is currently 0, so 0 + 5 = 5.) This is 0101 in two’s complement binary.
Step 4: join together the new mantissa and exponent in the format given in the question.
In this case, the normalised value is 011011110101.
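A possible way to automate this conversion for positive values is sketched below. The function name `encode` is invented, negative numbers are deliberately not handled, bits that do not fit in the mantissa are simply truncated, and no check is made that the exponent fits in its bits.

```python
from fractions import Fraction

def encode(value, mantissa_bits=8, exponent_bits=4):
    """Encode a positive number as normalised mantissa and exponent bit strings."""
    m = Fraction(value)
    if m <= 0:
        raise ValueError("this sketch handles positive values only")
    exp = 0
    while m >= 1:                 # move the binary point left
        m /= 2
        exp += 1
    while m < Fraction(1, 2):     # move the binary point right
        m *= 2
        exp -= 1
    # Keep mantissa_bits - 1 fraction bits; any further bits are truncated.
    scaled = int(m * 2 ** (mantissa_bits - 1))
    mantissa = format(scaled, f"0{mantissa_bits}b")
    exponent = format(exp % 2 ** exponent_bits, f"0{exponent_bits}b")
    return mantissa, exponent

print(encode(27.75))   # ('01101111', '0101')
```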
Worked example
Consider the conversion of 8.63. The first step is the same but now the .63 has to be
converted by the ‘multiply by two and record whole number parts’ method.
.63 × 2 = 1.26 so 1 is stored to give the fraction .1
.26 × 2 = .52 so 0 is stored to give the fraction .10
.52 × 2 = 1.04 so 1 is stored to give the fraction .101
.04 × 2 = .08 so 0 is stored to give the fraction .1010
At this stage it can be seen that multiplying .08 by 2 successively is going to give a lot of
zeros in the binary fraction before another 1 is added, so the process can be stopped. .63
has been approximated as .625. The final representation becomes 0100010100 for the
mantissa and 0100 for the exponent.
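The ‘multiply by two and record whole number parts’ method can be sketched as a small loop. The function name `fraction_to_binary` is invented, and `Fraction` is used so that the decimal .63 is handled exactly rather than as a binary float.

```python
from fractions import Fraction

def fraction_to_binary(fraction, places):
    """Convert a fraction in [0, 1) to `places` binary digits by repeated doubling."""
    f = Fraction(fraction)
    bits = ""
    for _ in range(places):
        f *= 2
        if f >= 1:          # whole number part is 1: record it and remove it
            bits += "1"
            f -= 1
        else:               # whole number part is 0
            bits += "0"
    return bits

print(fraction_to_binary("0.63", 4))   # 1010, as in the worked example
print(fraction_to_binary("0.63", 8))   # the next 1 only appears at the eighth place
```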
Range and precision
When representing real numbers using floating point binary, the number of bits used for
the mantissa and exponent affects both the range and precision of the number stored. If
the size of the exponent is increased, then larger numbers can be stored. However, this
will leave less room for the mantissa, so the number will have less precision. Conversely, a
larger mantissa will mean a more precise representation but at the expense of a smaller
exponent and so a smaller range.
Worked example
Calculate the largest number that can be stored using an 8-bit binary number if the
number uses 3 bits for the mantissa and 5 bits for the exponent.
Step 1: with 3 bits for the mantissa and 5 bits for the exponent, then the largest number that can
be represented is 01101111.
Step 2: the mantissa is therefore 0.11, with the binary point needing to be moved 15 places to the
right (the exponent 01111 = 15).
Step 3: convert to denary. 0110000000000000 = 8192 + 16384 = 24576. This means that the
largest denary number that can be stored in this format is 24 576.
• Any increase in the number of bits for the mantissa means fewer bits available for the
exponent // Any decrease in the number of bits for the mantissa means more bits
available for the exponent
• More bits used for the mantissa will result in better precision
• More bits used for the exponent will result in a larger range of numbers
Worked example
Calculate the largest number that can be stored using an 8-bit binary number if the
number uses 5 bits for the mantissa and 3 bits for the exponent.
Step 1: with 5 bits for the mantissa and 3 bits for the exponent, the largest number that
can be represented is 01111011.
Step 2: the mantissa is therefore 0.1111, with the binary point needing to be moved three
places to the right (011 = 3). This means that the largest denary number that can be stored
in this format is 7.5.
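Both largest-number worked examples follow the same pattern: the largest mantissa is 0 followed by all ones, and the largest exponent is 0 followed by all ones. A sketch of that calculation, with the hypothetical function name `largest_positive`, is:

```python
def largest_positive(mantissa_bits, exponent_bits):
    """Largest value: mantissa 0111...1 and exponent 0111...1 (two's complement)."""
    max_mantissa = 1 - 2 ** -(mantissa_bits - 1)    # e.g. 0.11 = 0.75 for 3 bits
    max_exponent = 2 ** (exponent_bits - 1) - 1     # e.g. 01111 = 15 for 5 bits
    return max_mantissa * 2 ** max_exponent

print(largest_positive(3, 5))   # 24576.0
print(largest_positive(5, 3))   # 7.5
```

Swapping two bits from the mantissa to the exponent raises the largest value from 7.5 to 24 576, at the cost of two bits of precision.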
The following values relate to an 8-bit mantissa and an 8-bit exponent (using two’s
complement):
Why binary numbers are stored in normalised form
• To store the maximum range of numbers in the minimum number of bytes / bits
• Maximising the number of significant bits // maximising the (potential) precision /
accuracy of the number for the given number of bits (enables very large / small numbers
to be stored with accuracy.)
Errors
Because of the trade-off between range and precision, the representation of some
floating point binary numbers may not always be as precise as we would like. For
example, if the number 76.65625 were to be stored, the full representation would be
0100110010101 for the mantissa (13 bits) and 0111 for the exponent (4 bits). If, however, only
10 bits were made available for the mantissa, then this would effectively lose the three
rightmost bits, leaving us with a representation of 76.5. This is close, but not as
precise as before.
This error can be described as an absolute or relative error. The absolute error is the
difference in value between the original number and the representation. This is found by
subtracting one from the other. The relative error is the absolute error expressed as a
percentage of the true value.
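As a sketch of these two definitions, the code below reuses the earlier conversion in which 8.63 was stored as 8.625. The function names are invented, and `Decimal` is used so that the denary values are handled exactly.

```python
from decimal import Decimal

def absolute_error(true_value, stored_value):
    """Difference between the original number and its representation."""
    return abs(true_value - stored_value)

def relative_error_percent(true_value, stored_value):
    """Absolute error expressed as a percentage of the true value."""
    return absolute_error(true_value, stored_value) / abs(true_value) * 100

true_value = Decimal("8.63")
stored_value = Decimal("8.625")

print(absolute_error(true_value, stored_value))                      # 0.005
print(round(relative_error_percent(true_value, stored_value), 3))   # about 0.058 %
```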
Underflow
Underflow is where the number is too small to be represented using the floating-point
system.
e.g. In a system with 8 bits for the mantissa and 4 bits for the exponent, the lowest
possible exponent is 1000, or -8 in denary. If the system is normalised, the smallest
positive mantissa value is 0 1000000. Thus, the smallest positive number in this system
is 0 1000000 1000, which is equal to 1/512. If a calculation in this system resulted in a
number which was lower than 1/512, there would be an underflow error, because the
number is too small to be stored.
Overflow
Overflow is similar to underflow, but it occurs when a number is too large to be stored in
the system.
e.g. In a system with an 8-bit mantissa and a 4-bit exponent, the largest possible number
that can be represented is 0 1111111 0111, which is equal to 127. If a calculation produced a
number higher than 127, there would be an overflow error and the number could not be
stored.
Overflow and underflow can both occur with negative values that are too large or too
small.
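The two limits from the underflow and overflow examples (1/512 and 127 for an 8-bit mantissa and 4-bit exponent) can be turned into a simple range check. The function name `classify` is hypothetical, and the sketch only considers positive results.

```python
from fractions import Fraction

SMALLEST_POSITIVE = Fraction(1, 2) * Fraction(1, 2 ** 8)   # 0 1000000 1000 = 1/512
LARGEST_POSITIVE = Fraction(127, 128) * 2 ** 7             # 0 1111111 0111 = 127

def classify(value):
    """Say whether a positive result fits the 8-bit mantissa, 4-bit exponent system."""
    v = Fraction(value)
    if v <= 0:
        raise ValueError("this sketch handles positive values only")
    if v < SMALLEST_POSITIVE:
        return "underflow"
    if v > LARGEST_POSITIVE:
        return "overflow"
    return "in range"

print(classify(Fraction(1, 1024)), classify(200), classify(27.75))
```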
Rounding error
A rounding error is where a number cannot be represented exactly, and needs to be
approximated.
e.g. The number 1/3 can only be represented in binary using recurring bits (0.0101 recurring). The
floating-point format does not allow for recurring bits as there is only a finite amount of
memory in the system. Thus, it needs to be rounded, so it will be represented as 0
1010101 1111.
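The size of this rounding error can be worked out exactly. The sketch below decodes the stored pattern 0 1010101 1111 (mantissa 0.1010101, exponent -1) by hand with `Fraction`; the variable names are chosen for the example.

```python
from fractions import Fraction

# 0 1010101 1111: mantissa 0.1010101, exponent 1111 (-1 in two's complement)
mantissa = Fraction(int("01010101", 2), 2 ** 7)   # 85/128
stored = mantissa * Fraction(1, 2)                # mantissa × 2^-1
rounding_error = Fraction(1, 3) - stored

print(stored, float(stored))   # 85/256 0.33203125
print(rounding_error)          # 1/768
```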