Notes - Data Representation

The document discusses data types in programming, distinguishing between built-in and user-defined data types, including enumerated and pointer types. It also covers file organization methods such as serial, sequential, and random access files, detailing their advantages and disadvantages. Additionally, it explains how to access, delete, and edit data within these file types, emphasizing the importance of choosing the appropriate file organization based on application needs.
Copyright © All Rights Reserved

Data Representation

User-defined data types

Zain Merchant
Data Types
Built-in Data Types
Built-in data types are those data types that are predefined by the programming language. Most languages have native data types that are provided so that common kinds of data are easy to store and process. Such data types are called built-in data types. These data types can be used directly in the program without the need to create them first. Every programming language has its own specific set of built-in data types.

Built-in data types are those data types that can be used directly by the programmer to declare variables and store different values in a program.

Data type   Description                                        Pseudocode   Python
boolean     Logical values, True (1) and False (0)             BOOLEAN      bool
char        Single alphanumeric character                      CHAR         none (a one-character str is used)
date        Value to represent a date                          DATE         datetime.date class
integer     Whole number, positive or negative                 INTEGER      int
real        Positive or negative number with a decimal point   REAL         float
string      Sequence of alphanumeric characters                STRING       str
User-defined data types

What is a user-defined data type?
• A data type constructed by a programmer // not a primitive data type

• A data type that references at least one other data type (the data types can be primitive or user-defined)

Purpose of user-defined data types

• To create a new data type (from existing data types)

• To allow data types not available in a programming language to be constructed // to extend the flexibility of the programming language

• The programmer needs to specify a new data type that meets the requirements of the application / program

Non-composite data types
A user-defined data type is a data type which the programmer has designed for use within a program, as opposed to a built-in data type. A non-composite type is defined without reference to another data type, whereas a composite data type is built from other data types.

Enumerated data type

An enumerated data type is a non-composite data type defined from an ordered list of values. Variables can be declared with this data type and assigned one of the values in the list.

e.g. The following pseudocode declares an enumerated data type for months:

TYPE TMonth = (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)
// The TMonth data type is declared

DECLARE ThisMonth : TMonth
// A variable is declared with the TMonth enumerated data type

ThisMonth ← May
// The variable ThisMonth is assigned the value May

ThisMonth > Feb
// This is True, as May is later in the list than Feb

Note: Values in an enumerated data type are not string values; do not enclose them in quotation marks.
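As an illustration only, the TMonth example can be mirrored in Python with the standard `enum` module. `IntEnum` is chosen here as an assumed closest equivalent because plain `Enum` members do not support `<` and `>`; the functional API numbers the members from 1 in the order they are listed.

```python
from enum import IntEnum

# TYPE TMonth = (Jan, Feb, ..., Dec): members are numbered 1..12 in list order
TMonth = IntEnum("TMonth", "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec")

this_month = TMonth.May            # ThisMonth ← May (a member, not the string "May")
print(this_month > TMonth.Feb)     # True: May comes later in the list than Feb
print(this_month.name, this_month.value)
```

The comparison works on the position of each value in the list, which is exactly what the pseudocode `ThisMonth > Feb` relies on.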

Pointer data type

A pointer data type is used to reference memory locations. It must therefore be declared with the type of data stored at the location it points to.

e.g. This pseudocode declares a pointer data type, declares a pointer variable and uses the pointer.

TYPE TMyPointer = ^INTEGER
// This declares TMyPointer as a pointer data type which points to integers

DECLARE IntegerPointer : TMyPointer
// This declares a variable of the TMyPointer data type

DECLARE MyNum, YourNum : INTEGER
MyNum ← 23
// This declares two new integer variables and assigns a value to one of them

IntegerPointer ← @MyNum
// This assigns the address of MyNum to the pointer variable IntegerPointer

YourNum ← IntegerPointer^
// This accesses the data stored at the address which IntegerPointer points to. This is known as dereferencing.

Pointers are typically used to construct complex data structures, such as linked lists and binary trees. These data structures are discussed later on.
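Python has no pointer type in the pseudocode sense, but the standard `ctypes` module can imitate the same steps. This is a sketch for illustration; the variable names simply follow the pseudocode and are not part of any library.

```python
import ctypes

TMyPointer = ctypes.POINTER(ctypes.c_int)   # TYPE TMyPointer = ^INTEGER

my_num = ctypes.c_int(23)                    # DECLARE MyNum : INTEGER; MyNum ← 23
integer_pointer = TMyPointer(my_num)         # IntegerPointer ← @MyNum (holds MyNum's address)

your_num = integer_pointer[0]                # YourNum ← IntegerPointer^ (dereferencing)
print(your_num)

# Writing through the pointer changes the variable it points to
integer_pointer[0] = 99
print(my_num.value)
```

The last two lines show why a pointer must know the type it points to: it reads and writes an integer-sized value at that address.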


What is an enumerated data type?

• A (user-defined, non-composite) data type with an ordered list of possible values.

What is a pointer data type?
• A user-defined non-composite data type used to reference a memory location.

Other non-composite data types

Integer, Boolean, Real, Char, String

Composite data type
A data type is known as a composite data type when it represents a number of similar or different data items under a single variable declaration, i.e. a data type that has multiple values grouped together.

Record data type
A record data type stores a collection of information regarding a common subject, similar to a record in a database. It is constructed from several fields, each of which has its own data type; thus, the record data type is a composite data type.

e.g. The following pseudocode defines a record type for a student record:

TYPE TStudentRecord
    DECLARE FirstName : STRING
    DECLARE LastName : STRING
    DECLARE Absences : INTEGER
    DECLARE Class : STRING
ENDTYPE
// The record consists of several fields, each with its own data type

DECLARE Student1 : TStudentRecord
// The variable is declared as a record

Student1.FirstName ← "Zain"
// Fields can be accessed using dot notation
Student1.LastName ← "Merchant"
Student1.Absences ← 0
Student1.Class ← "A2 Level"
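A rough Python counterpart of TStudentRecord is a `dataclass`, sketched below. The snake_case field names are an assumed translation of the pseudocode names, and `Class` becomes `class_name` because `class` is a reserved word in Python.

```python
from dataclasses import dataclass

@dataclass
class TStudentRecord:
    # Each field has its own data type, as in the pseudocode record
    first_name: str
    last_name: str
    absences: int
    class_name: str  # "Class" renamed: "class" is a Python keyword

student1 = TStudentRecord(first_name="Zain", last_name="Merchant",
                          absences=0, class_name="A2 Level")
print(student1.first_name)   # fields are accessed with dot notation
student1.absences += 1       # fields can be updated individually
```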

Set data type

A set data type is a composite data type that allows a programmer to apply set theory operations to data in a program.

These operations typically include:

• Union

• Difference

• Intersection

• Including an element

• Excluding an element

• Checking whether an element is in a set

TYPE LetterSet = SET OF CHAR
// This declares LetterSet as a set data type whose elements are of type CHAR

DEFINE Vowels ('A', 'E', 'I', 'O', 'U') : LetterSet
// Vowels is defined as a LetterSet containing the values listed
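Python has no `SET OF CHAR` declaration, but its built-in `set` type supports every operation listed above. In this sketch the letters are one-character strings, and `first_five` is a hypothetical second set introduced only to show the two-set operations.

```python
vowels = {"A", "E", "I", "O", "U"}       # DEFINE Vowels ('A', 'E', 'I', 'O', 'U') : LetterSet
first_five = {"A", "B", "C", "D", "E"}   # hypothetical second set

union = vowels | first_five              # elements in either set
difference = vowels - first_five         # elements in vowels but not in first_five
intersection = vowels & first_five       # elements in both sets

vowels.add("Y")                          # including an element
vowels.discard("Y")                      # excluding an element (no error if absent)
print("E" in vowels)                     # checking whether an element is in the set
```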

Object / Class data type

An object data type is a composite data type used in object-oriented programming to define classes.

Essentially, objects are just records with functions that act on the data that they contain.
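To make the "record plus functions" idea concrete, here is a minimal hypothetical `Student` class: its attributes play the role of record fields, and its methods (`record_absence` and `full_name`, both invented for this sketch) act on that data.

```python
class Student:
    """A record-like object: data attributes plus methods that act on them."""

    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name = first_name   # data, like record fields
        self.last_name = last_name
        self.absences = 0

    def record_absence(self) -> None:
        # A method changes the object's own data
        self.absences += 1

    def full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"

student = Student("Zain", "Merchant")
student.record_absence()
print(student.full_name(), student.absences)
```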

Other composite data types

Array, Dictionary, List, Linked List

File organization and access
File organization
A file is a collection of records. Each record is a collection of fields. Every field consists of a value.

Serial files
A serial file is a collection of records that are not stored in any key order: records enter the file in chronological order. All records have a defined format so that they can be input and output correctly.

A text file can be considered an example of a serial file: a series of characters is input, in chronological order, to produce a file.

A common use of serial files is to record data as it is generated in real time, for example a log of transactions. Records can be added as quickly as they arrive because they do not need to be sorted. This makes writing to serial files efficient.

Advantages of serial organization

• It is simple

• It is cheap

Disadvantages of serial organization

• It is cumbersome to access, because all preceding records have to be read before the one being searched for is retrieved.

• Space on the storage medium is wasted in the form of inter-record gaps.

• It cannot support modern requirements for high-speed access to individual records.
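A minimal sketch of serial organisation, assuming a CSV text file where the first field of each row is the key (the file name and record layout are hypothetical): new records are always appended, and finding one means reading from the start until the key matches.

```python
import csv
import os
import tempfile
from typing import List, Optional

def append_record(path: str, record: List[str]) -> None:
    # Serial organisation: a new record always goes after the last one
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(record)

def find_record(path: str, key: str) -> Optional[List[str]]:
    # Records are read one by one from the start until the key matches
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if row[0] == key:
                return row
    return None

# Usage: transactions are appended in the order they happen, not in key order
log_path = os.path.join(tempfile.mkdtemp(), "transactions.csv")
append_record(log_path, ["T003", "Sale", "12.50"])
append_record(log_path, ["T001", "Refund", "4.00"])
print(find_record(log_path, "T001"))
```

Finding `T001` requires reading past `T003` first, which is the access cost described in the disadvantages list.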

Sequential files
A sequential file stores records in order of a key field. For it to be possible to sort records by key field, this field needs to be unique and sequential, but it does not need to be consecutive.

In a sequential file, a particular record can be found by reading the key fields in order until you reach the one you are looking for.

Because the records in a file are sorted in a particular order, better searching methods such as the binary search technique can be used to reduce the time taken to search the file.

Advantages of sequential organization

• The sorting makes it easy to access records.

• The binary search technique can be used to reduce record search time, because each comparison halves the number of records that remain to be searched.

Disadvantages of sequential organization

• The sorting does not remove the need to access other records as the search looks for a particular record.

• Sequential files cannot support modern technologies that require fast access to stored records.

• The requirement that all records be of the same size is sometimes difficult to enforce.
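The binary search advantage can be sketched in Python as below, assuming the sequential file has been read into a list of `(key, data)` tuples already sorted by key. The function also counts comparisons, to show how few are needed compared with reading every record.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]

def binary_search(records: List[Record], key: int) -> Tuple[Optional[Record], int]:
    """Search records sorted by key field; return (record or None, comparisons made)."""
    low, high = 0, len(records) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        mid_key = records[mid][0]
        if mid_key == key:
            return records[mid], comparisons
        if mid_key < key:
            low = mid + 1      # the key can only be in the upper half
        else:
            high = mid - 1     # the key can only be in the lower half
    return None, comparisons

# Keys are unique and in order but not consecutive (100, 110, 120, ...)
records = [(k, f"Student {k}") for k in range(100, 1100, 10)]
print(binary_search(records, 570))
```

With 100 records, at most 7 comparisons are needed, whereas a record-by-record search may need 100.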


Comparison of serial and sequential files

• In both serial and sequential files, records are stored one after the other and need to be accessed one after the other

• Serial files are stored in chronological order

• Sequential files are stored with ordered records, in the order of the key field

• In serial files, new records are added in the next available space / records are appended to the file

• In sequential files, new records are inserted in the correct position.
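The difference in how new records are added can be sketched with in-memory lists standing in for the two files (an assumption for illustration; the records are hypothetical `(key, data)` tuples). The standard `bisect.insort` function inserts each record at its correct position by key.

```python
import bisect

arrivals = [(1042, "Patel"), (1007, "Khan"), (1025, "Lee")]  # hypothetical records

serial_file = []
sequential_file = []
for record in arrivals:
    serial_file.append(record)              # serial: next available space
    bisect.insort(sequential_file, record)  # sequential: correct position by key

print(serial_file)
print(sequential_file)
```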

Random Files
The random file organisation method physically stores records of data in a file in any available position. The location of any record in the file is found by using a hashing algorithm on the key field of the record.

Hashing
Hashing is the process of transforming the key value of a record to yield the address of the location where the record is stored. A hash function generates the record address by performing some simple operations on the key or parts of the key.

Storing data in a random access file

• The key field is hashed to give the address / home location

• Check whether a record is already stored at the address / home location

• If nothing is stored there, store the new record

• If another record is already stored there, search the overflow area / next record

• Continue until free space is found or the whole area has been searched

• If no space is found, output an error message
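The hashing and storing steps above can be sketched in Python as follows. The hash function (key modulo the number of slots) is a hypothetical choice, the file is modelled as a list of slots, and a collision is resolved by moving to the next record, wrapping round to the start of the area (also an assumption).

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[int, str]]

def hash_address(key: int, num_slots: int) -> int:
    # Hypothetical hash function: remainder when the key is divided by the number of slots
    return key % num_slots

def store_record(table: List[Slot], key: int, data: str) -> int:
    """Store (key, data) at its home location or the next free slot; return the address used."""
    num_slots = len(table)
    home = hash_address(key, num_slots)
    for step in range(num_slots):               # at most one pass over the whole area
        address = (home + step) % num_slots     # wrap round to the start (assumption)
        if table[address] is None:              # free space found
            table[address] = (key, data)
            return address
    raise RuntimeError("No free space: record cannot be stored")

table: List[Slot] = [None] * 10
print(store_record(table, 23, "Record A"))   # home location 3
print(store_record(table, 33, "Record B"))   # 3 is taken, so the next free slot, 4
```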


Collisions in a random access file

A collision occurs when two different key values hash to the same address. There are two ways of dealing with this:

1. An open hash, where the record is stored in the next free space.

2. A closed hash, where an overflow area is set up and the record is stored in the next free space in the overflow area.

(These are the terms used in this course; many other texts call storing the record in the next free space "open addressing".)

When reading a record from a file using direct access, the address of the location to read from is calculated using the hashing algorithm, and the key field of the record stored there is read. Before using that record, its key field must be checked against the key field being searched for to ensure that they match. If the key fields do not match, then the following records need to be read until a match is found (open hash), or the overflow area needs to be searched for a match (closed hash).
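The closed hash described above (an overflow area) can be sketched as a small hypothetical class. The main area and overflow area are modelled as lists, and the hash function (key modulo the main area size) is an assumption; reading checks the key field at the home location before searching the overflow area.

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[int, str]]

class OverflowHashFile:
    """Closed hash (this course's term): collisions go to a separate overflow area."""

    def __init__(self, num_slots: int, overflow_size: int) -> None:
        self.main: List[Slot] = [None] * num_slots
        self.overflow: List[Slot] = [None] * overflow_size

    def _home(self, key: int) -> int:
        return key % len(self.main)           # hypothetical hash function

    def store(self, key: int, data: str) -> None:
        home = self._home(key)
        if self.main[home] is None:
            self.main[home] = (key, data)
            return
        for i, slot in enumerate(self.overflow):   # next free space in the overflow area
            if slot is None:
                self.overflow[i] = (key, data)
                return
        raise RuntimeError("Overflow area full")

    def read(self, key: int) -> Optional[str]:
        home_record = self.main[self._home(key)]
        if home_record is None:
            return None                        # nothing stored: record not found
        if home_record[0] == key:              # key fields match
            return home_record[1]
        for slot in self.overflow:             # key fields differ: search the overflow area
            if slot is not None and slot[0] == key:
                return slot[1]
        return None

hash_file = OverflowHashFile(num_slots=10, overflow_size=5)
hash_file.store(23, "Record A")
hash_file.store(33, "Record B")   # collides with 23, so it goes to the overflow area
print(hash_file.read(33))
```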

Searching data in a random access file

• The key field (e.g. a computer ID) is hashed to give the address / home location

• It is compared with the ID stored at the address / home location

• If nothing is stored there, output the message 'record not found'

• If the record IDs are equal, the record is found

• If the record IDs are not equal, search the overflow area / next record

• Continue until the record is found or the whole area has been searched

• If no record is found, output an error message
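The search steps for an open hash (reading the next record after a mismatch) can be sketched as below. The table is filled by hand for the example, and the hash function (key modulo the number of slots) is a hypothetical choice that must match the one used when storing.

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[int, str]]

def search_record(table: List[Slot], key: int) -> Optional[str]:
    """Open hash search: start at the home location and move on to the next record."""
    num_slots = len(table)
    home = key % num_slots                        # hypothetical hash function
    for step in range(num_slots):                 # until found or whole area searched
        slot = table[(home + step) % num_slots]
        if slot is None:
            return None                           # nothing stored: record not found
        if slot[0] == key:                        # record IDs equal: record found
            return slot[1]
    return None                                   # whole area searched: not found

# A small table filled by hand: 33 collided with 23 and was stored in the next slot
table: List[Slot] = [None] * 10
table[3] = (23, "Record A")
table[4] = (33, "Record B")
print(search_record(table, 33))
print(search_record(table, 43) or "record not found")
```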

File Access
There are two ways to access a specific record within a file: sequential access and direct access. Serial and sequential files can be accessed using sequential access, and random files can be accessed using direct access.

Sequential access is where each record in the file is read, one by one, until the desired record is found.

Direct access is where the program jumps straight to a specific record in the file; for a random file, a hashing algorithm is used to find its position.

Direct access can also be achieved with a sequential file. A separate index file is created which has two fields per record. The first field holds the key field value and the second field holds the position of the record with this key field value in the main file.
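A minimal sketch of the index-file idea, assuming the main sequential file is modelled as a list and a Python `dict` stands in for the index file (in a real system the position might be a record number or byte offset).

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str]

# Main sequential file, stored in key order (modelled as a list)
main_file: List[Record] = [(1007, "Khan"), (1025, "Lee"), (1042, "Patel")]

# Index file: key field value -> position of that record in the main file
index_file: Dict[int, int] = {key: position for position, (key, _) in enumerate(main_file)}

def direct_read(key: int) -> Optional[Record]:
    position = index_file.get(key)   # look the key up in the index
    if position is None:
        return None
    return main_file[position]       # jump straight to that position

print(index_file)
print(direct_read(1025))
```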

Deleting & Editing Data

In a sequentially accessed file, deleting and editing data requires the creation of a new file. Records are copied from the old file to the new file until the record that needs deleting or editing is reached. That record is then either skipped (deletion) or written in its amended form (editing), and the remaining records are copied across.

However, in a direct-access file, data can be deleted or edited in place: there is no need for a new file.
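The copy-to-a-new-file process for a sequential file can be sketched as follows, with lists of `(key, data)` tuples standing in for the old and new files (an assumption; `rewrite_file` is a hypothetical helper name).

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]

def rewrite_file(old_file: List[Record], key: int,
                 amended: Optional[Record] = None) -> List[Record]:
    """Copy old_file to a new file, deleting the record with `key`,
    or replacing it with `amended` if one is given."""
    new_file: List[Record] = []
    for record in old_file:
        if record[0] != key:
            new_file.append(record)     # unchanged records are copied across
        elif amended is not None:
            new_file.append(amended)    # editing: write the amended record
        # otherwise the record is skipped, i.e. deleted
    return new_file

old_file = [(1007, "Khan"), (1025, "Lee"), (1042, "Patel")]
print(rewrite_file(old_file, 1025))                    # deletion
print(rewrite_file(old_file, 1025, (1025, "Leigh")))   # editing
```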

Choice of file organisation

Serial file organisation is well suited to batch processing or for backing up data on magnetic tape.

A direct access file is used if rapid access to an individual record in a large file is required.

An example would be a system with many users. In this case, the file that is used to check passwords when users log in should be direct-access.

A sequential file is suitable for applications when multiple records are required from one search of the file. An example could be a family history file where a search could be used for all records with a particular family name.

Floating-point numbers, representation and manipulation
Floating point numbers
Representation
Floating-point notation is a way of representing very small or very large numbers using the same number of bits. It is similar to scientific notation.

A floating-point number consists of three parts: the sign bit, the mantissa, and the exponent. To find the value of the number, we use $\pm M \times 2^{E}$, where ± is determined by the most significant bit, M is the mantissa, and E is the exponent.

As the mantissa or exponent could be negative, two's complement is usually used for both, but this should always be stated in the question. This means that the leftmost digit is 1 if the value is negative and 0 if it is positive. The inclusion of binary places does not change this.

Worked example
Calculate the denary value of the floating point binary number 0010101010110, where 9 bits are used for the mantissa followed by 4 bits for the exponent, both in two's complement.

Step 1: split the binary number into the mantissa (0.01010101) and the exponent (0110).

Step 2: calculate the denary value of the exponent. 0110 = 6

Step 3: move the binary point in the mantissa six places to the right. 0010101.01

Step 4: the mantissa is now 0010101.01; converting to denary gives 16 + 4 + 1 + 0.25 = 21.25
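The worked example can be checked with a short Python sketch. It assumes, as in the example, that the binary point sits immediately after the mantissa's sign bit and that both parts are in two's complement; `Fraction` is used so the result is exact rather than a rounded float. The function names are hypothetical.

```python
from fractions import Fraction

def twos_complement(bits: str) -> int:
    """Value of a bit string read as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":                # a leading 1 means negative
        value -= 1 << len(bits)
    return value

def decode(bits: str, mantissa_bits: int, exponent_bits: int) -> Fraction:
    """Mantissa (binary point after the sign bit) multiplied by 2 to the power of the exponent."""
    mantissa_str = bits[:mantissa_bits]
    exponent_str = bits[mantissa_bits:mantissa_bits + exponent_bits]
    mantissa = Fraction(twos_complement(mantissa_str), 2 ** (mantissa_bits - 1))
    exponent = twos_complement(exponent_str)
    return mantissa * Fraction(2) ** exponent

print(float(decode("0010101010110", 9, 4)))   # the worked example above
```

Dividing the mantissa's integer value by 2 to the power of (mantissa bits minus 1) is the same as placing the binary point after the sign bit.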

Normalisation
Normalisation is the process of maximising the precision of a number when storing it in a given number of bits. For a positive binary number this involves removing any leading zeros; for a negative binary number it involves removing any leading ones. This means that a normalised floating point number must always start as either 0.1 or 1.0.

To normalise a floating point binary number, move the binary point in the mantissa right or left until the above is true. The exponent is then decreased (if the binary point is moved right) or increased (if moved left) by the number of places moved.

Worked example
The floating point binary number 000101010101 has an 8-bit mantissa followed by a 4-bit exponent, both in two's complement. Find the normalised version of this number.

Step 1: split the binary number into the mantissa (0.0010101) and the exponent (0101).

Step 2: move the binary point in the mantissa right until it starts as 0.1 or 1.0. Moving it two places makes the mantissa 0.10101.

Step 3: however, the mantissa is now only 6 digits in length, so add two zeros to the right-hand side. This keeps it at 8 digits (as required by the question) but does not change the value. The mantissa is now 0.1010100.

Step 4: as the binary point has been moved two places right, the exponent must be decreased by 2. The exponent is currently 0101 (5 in denary), so subtracting 2 from this gives 0011 (3).

Step 5: join together the new mantissa and exponent in the format given in the question. In this case, the normalised value is 010101000011.
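The normalisation procedure can be sketched in Python on bit strings. This is an illustrative sketch with hypothetical function names: it shifts the mantissa left (moving the binary point right) while its first two bits are equal, decreasing the exponent each time, and raises an error if the exponent no longer fits.

```python
def twos_complement(bits: str) -> int:
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value

def to_bits(value: int, width: int) -> str:
    """Two's complement bit pattern of value in `width` bits."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise OverflowError(f"{value} does not fit in {width} bits")
    return format(value & ((1 << width) - 1), f"0{width}b")

def normalise(bits: str, mantissa_bits: int, exponent_bits: int) -> str:
    mantissa = bits[:mantissa_bits]
    exponent = twos_complement(bits[mantissa_bits:mantissa_bits + exponent_bits])
    if "1" not in mantissa:
        return bits                      # zero cannot be normalised
    # A normalised mantissa starts 01 (0.1...) or 10 (1.0...)
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"   # move the binary point one place right
        exponent -= 1                   # so decrease the exponent by one
    return mantissa + to_bits(exponent, exponent_bits)

print(normalise("000101010101", 8, 4))   # the worked example above
```

The same loop handles negative mantissas, where it removes leading ones instead of leading zeros.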

Worked example

Give the denary value 27.75 as a normalised floating point binary number, using 8 bits for the mantissa and 4 bits for the exponent, both in two's complement.

Step 1: write 27.75 as a fixed point binary number using two's complement = 011011.11.

Step 2: move the binary point five places to the left so that the mantissa is normalised (giving 0.1101111).

Step 3: the binary point was moved five places left, so increase the exponent by 5. (The exponent is currently 0, so 0 + 5 = 5.) This is 0101 in two's complement binary.

Step 4: join together the new mantissa and exponent in the format given in the question. In this case, the normalised value is 011011110101.
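A sketch of the same conversion in Python, restricted to positive values for brevity (an assumption of this sketch, not of the method). The value is halved or doubled until it lies between 0.5 and 1 (a mantissa of the form 0.1...), the exponent is adjusted to match, and any bits that do not fit in the mantissa are truncated. The denary value is passed as a string so `Fraction` stores it exactly.

```python
import math
from fractions import Fraction

def to_bits(value: int, width: int) -> str:
    """Two's complement bit pattern of value in `width` bits."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise OverflowError(f"{value} does not fit in {width} bits")
    return format(value & ((1 << width) - 1), f"0{width}b")

def encode_positive(value: str, mantissa_bits: int, exponent_bits: int) -> str:
    """Normalised floating point bits for a positive denary value (extra bits truncated)."""
    m = Fraction(value)
    if m <= 0:
        raise ValueError("this sketch handles positive values only")
    exponent = 0
    while m >= 1:                 # move the binary point left...
        m /= 2
        exponent += 1             # ...and increase the exponent
    while m < Fraction(1, 2):     # move the binary point right...
        m *= 2
        exponent -= 1             # ...and decrease the exponent
    # m is now 0.1xxxx in binary; keep mantissa_bits - 1 bits after the point
    mantissa = math.floor(m * 2 ** (mantissa_bits - 1))
    return to_bits(mantissa, mantissa_bits) + to_bits(exponent, exponent_bits)

print(encode_positive("27.75", 8, 4))   # the worked example above
```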

Worked example
Consider the conversion of 8.63. The first step is the same (8 = 1000 in binary), but now the .63 has to be converted by the 'multiply by two and record whole number parts' method:

.63 × 2 = 1.26, so 1 is stored to give the fraction .1
.26 × 2 = 0.52, so 0 is stored to give the fraction .10
.52 × 2 = 1.04, so 1 is stored to give the fraction .101
.04 × 2 = 0.08, so 0 is stored to give the fraction .1010

At this stage it can be seen that multiplying .08 by 2 successively is going to give a lot of zeros in the binary fraction before another 1 is added, so the process can be stopped. .63 has been approximated as .625, so 8.63 is stored as 1000.101 (8.625). The final representation becomes 0100010100 for the mantissa and 0100 for the exponent (a 10-bit mantissa and a 4-bit exponent).
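The 'multiply by two and record whole number parts' method can be sketched as a loop. The function name and the `max_bits` cut-off are hypothetical; the cut-off plays the role of deciding when to stop, as in the example. `Fraction` keeps the arithmetic exact, so no float rounding creeps in.

```python
from fractions import Fraction

def fraction_to_binary(denary_fraction: str, max_bits: int) -> str:
    """Binary digits after the point, found by repeatedly multiplying by two."""
    f = Fraction(denary_fraction)
    digits = ""
    for _ in range(max_bits):
        f *= 2
        whole = int(f >= 1)       # the whole-number part is the next binary digit
        digits += str(whole)
        f -= whole                # keep only the fractional part
        if f == 0:
            break                 # the fraction has been converted exactly
    return digits

print(fraction_to_binary("0.63", 4))   # the four steps in the example
print(fraction_to_binary("0.63", 8))   # continuing shows the run of zeros
```

Continuing to 8 bits gives 10100001, confirming that several zeros follow .1010 before the next 1.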

Range and precision
When representing real numbers using floating point binary, the number of bits used for the mantissa and exponent affects both the range and precision of the number stored. If the size of the exponent is increased, then larger numbers can be stored. However, this will leave less room for the mantissa, so the number will have less precision. Conversely, a larger mantissa will mean a more precise representation but at the expense of a smaller exponent and so a smaller range.

Worked example
Calculate the largest number that can be stored using an 8-bit binary number if the number uses 3 bits for the mantissa and 5 bits for the exponent.

Step 1: with 3 bits for the mantissa and 5 bits for the exponent, the largest number that can be represented is 01101111.

Step 2: the mantissa is therefore 0.11, with the binary point needing to be moved 15 places to the right (the exponent 01111 = 15).

Step 3: convert to denary. 0110000000000000 = 16384 + 8192 = 24576. This means that the largest denary number that can be stored in this format is 24576.

Mantissa-exponent trade-off

• The trade-off is between range and precision

• Any increase in the number of bits for the mantissa means fewer bits available for the exponent // Any decrease in the number of bits for the mantissa means more bits available for the exponent

• More bits used for the mantissa will result in better precision

• More bits used for the exponent will result in a larger range of numbers

Worked example

Calculate the largest number that can be stored using an 8-bit binary number if the number uses 5 bits for the mantissa and 3 bits for the exponent.

Step 1: with 5 bits for the mantissa and 3 bits for the exponent, the largest number that can be represented is 01111011.

Step 2: the mantissa is therefore 0.1111, with the binary point needing to be moved three places to the right (011 = 3).

Step 3: convert to denary. 0111.1 = 4 + 2 + 1 + 0.5 = 7.5. This means that the largest denary number that can be stored in this format is 7.5.

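The following values relate to an 8-bit mantissa and an 8-bit exponent (using two's complement and normalised mantissas):

The maximum positive number which can be stored is: mantissa 0.1111111, exponent 01111111 (127), i.e. $(1 - 2^{-7}) \times 2^{127}$

The smallest positive number which can be stored is: mantissa 0.1000000, exponent 10000000 (-128), i.e. $2^{-1} \times 2^{-128} = 2^{-129}$

The smallest magnitude negative number which can be stored is: mantissa 1.0111111, exponent 10000000 (-128), i.e. $-(2^{-1} + 2^{-7}) \times 2^{-128}$

The largest magnitude negative number which can be stored is: mantissa 1.0000000, exponent 01111111 (127), i.e. $-1 \times 2^{127} = -2^{127}$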
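The limits named above follow a pattern for any split of bits, which this Python sketch computes exactly with `Fraction`. It assumes normalised two's complement mantissas with the binary point after the sign bit; `limits` is a hypothetical helper, and it reproduces the answers to both worked examples on range.

```python
from fractions import Fraction

def limits(mantissa_bits: int, exponent_bits: int) -> dict:
    """Extreme normalised values for a two's complement mantissa and exponent."""
    e_max = 2 ** (exponent_bits - 1) - 1           # e.g. 01111111 = 127
    e_min = -(2 ** (exponent_bits - 1))            # e.g. 10000000 = -128
    step = Fraction(1, 2 ** (mantissa_bits - 1))   # value of the last mantissa bit
    two = Fraction(2)
    return {
        "max_positive": (1 - step) * two ** e_max,                           # 0.111...1
        "min_positive": Fraction(1, 2) * two ** e_min,                       # 0.100...0
        "min_magnitude_negative": -(Fraction(1, 2) + step) * two ** e_min,   # 1.011...1
        "max_magnitude_negative": -1 * two ** e_max,                         # 1.000...0
    }

print(limits(3, 5)["max_positive"])   # 3-bit mantissa, 5-bit exponent
print(limits(5, 3)["max_positive"])   # 5-bit mantissa, 3-bit exponent
```

Comparing the two calls shows the trade-off directly: more exponent bits give a far larger maximum, while more mantissa bits give a finer step between neighbouring values.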

Why binary numbers are stored in normalised form
• To store the maximum range of numbers in the minimum number of bytes / bits

• Normalisation minimises the number of leading zeros / ones represented

• It maximises the number of significant bits // maximises the (potential) precision / accuracy of the number for the given number of bits (enabling very large / small numbers to be stored with accuracy)

• It avoids the same number having multiple representations.

Errors
Because of the trade-off between range and precision, the representation of some floating point binary numbers may not always be as precise as we would like. For example, if the number 76.65625 were to be stored, the full representation would be 0100110010101 for the mantissa (13 bits) and 0111 for the exponent (4 bits). If, however, only 11 bits were made available for the mantissa, then this would effectively lose the two rightmost (least significant) bits, leaving a mantissa of 01001100101 and a representation of 76.625. This is very close, but not as precise as before.

This error can be described as an absolute or relative error. The absolute error is the difference in value between the original number and the representation; this is found by subtracting one from the other. The relative error is the absolute error expressed as a percentage of the true value.
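Applying both definitions to the 76.65625 example gives the following sketch (helper names are hypothetical). `Fraction` keeps the arithmetic exact, so the relative error comes out as exactly 100/2453 per cent, roughly 0.041%.

```python
from fractions import Fraction

def absolute_error(true_value: str, stored_value: str) -> Fraction:
    # Difference between the original number and its representation
    return abs(Fraction(true_value) - Fraction(stored_value))

def relative_error_percent(true_value: str, stored_value: str) -> Fraction:
    # Absolute error as a percentage of the true value
    return absolute_error(true_value, stored_value) / abs(Fraction(true_value)) * 100

abs_err = absolute_error("76.65625", "76.625")
rel_err = relative_error_percent("76.65625", "76.625")
print(float(abs_err))
print(round(float(rel_err), 4))
```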

Underflow
Underflow is where a number is too small (too close to zero) to be represented using the floating-point system.

e.g. In a system with 8 bits for the mantissa and 4 bits for the exponent, the lowest possible exponent is 1000, or -8 in denary. If the system is normalised, the smallest positive mantissa value is 0 1000000 (0.5). Thus, the smallest positive number in this system is 0 1000000 1000, which is equal to 0.5 × 2^-8 = 1/512. If a calculation in this system resulted in a positive number lower than 1/512, there would be an underflow error, because the number is too small to be stored.

Overflow
Overflow is similar to underflow, but it occurs when a number is too large to be stored in the system.

e.g. In a system with an 8-bit mantissa and a 4-bit exponent, the largest possible number that can be represented is 0 1111111 0111, which is equal to 0.1111111 × 2^7 = 127. If a calculation produced a number higher than 127, there would be an overflow error and the number could not be stored.

Overflow and underflow can also occur with negative values whose magnitude is too large or too small.
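A check for these errors can be sketched for the 8-bit mantissa / 4-bit exponent system, using the two limits stated in the examples (127 and 1/512). This hypothetical helper only handles positive results; negative results would need the matching negative limits.

```python
from fractions import Fraction

# Limits of the 8-bit mantissa / 4-bit exponent system described above
LARGEST_POSITIVE = Fraction(127)       # 0 1111111 0111
SMALLEST_POSITIVE = Fraction(1, 512)   # 0 1000000 1000

def check_positive_result(value: Fraction) -> str:
    """Classify a positive calculation result against the system's limits."""
    if value > LARGEST_POSITIVE:
        return "overflow"
    if 0 < value < SMALLEST_POSITIVE:
        return "underflow"
    return "representable range"   # it may still need rounding to fit the mantissa

print(check_positive_result(Fraction(100) * 2))      # 200 is too large
print(check_positive_result(Fraction(1, 256) / 4))   # 1/1024 is too small
print(check_positive_result(Fraction(21, 4)))        # 5.25 fits
```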

Rounding error
A rounding error is where a number cannot be represented exactly and needs to be approximated.

e.g. The number 1/3 can only be represented in binary using recurring bits (0.010101...). The floating-point format does not allow for recurring bits, as there is only a finite amount of memory in the system. Thus, it needs to be rounded, so with an 8-bit mantissa and a 4-bit exponent it will be represented as 0 1010101 1111 (0.1010101 × 2^-1 = 85/256 ≈ 0.33203).
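The size of this rounding error can be computed exactly with `Fraction`, as in the sketch below. The last line shows the same kind of effect with Python's own floats, which on typical platforms are IEEE 754 double precision and also cannot store 0.1, 0.2 or 0.3 exactly.

```python
from fractions import Fraction

one_third = Fraction(1, 3)
stored = Fraction(0b1010101, 2 ** 7) * Fraction(2) ** -1   # 0.1010101 x 2^-1 (0 1010101 1111)

print(stored, float(stored))   # the value actually stored
print(one_third - stored)      # the rounding error

# Python's own floats show the same effect
print(0.1 + 0.2 == 0.3)
```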
