0% found this document useful (0 votes)
6 views14 pages

01 Data Representation

The document covers data representation in programming, focusing on user-defined data types, file organization, and floating-point number representation. It explains the necessity of user-defined types, composite and non-composite data types, various file organization methods, and the implications of floating-point approximations and rounding errors. Additionally, it discusses normalization, precision versus range, and potential issues like overflow and underflow in floating-point arithmetic.

Uploaded by

nmirza
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views14 pages

01 Data Representation

The document covers data representation in programming, focusing on user-defined data types, file organization, and floating-point number representation. It explains the necessity of user-defined types, composite and non-composite data types, various file organization methods, and the implications of floating-point approximations and rounding errors. Additionally, it discusses normalization, precision versus range, and potential issues like overflow and underflow in floating-point arithmetic.

Uploaded by

nmirza
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

01 Data Representation

Candidates should be able to:


Show understanding of why user-defined types are necessary

Define and use non-composite types

Define and use composite data types

Choose and design an appropriate user-defined data type for a given problem

Show understanding of the methods of file organization and select an


appropriate method of file organization and file access for a given problem

Show understanding of methods of file access

Show understanding of hashing algorithms

Describe the format of binary floating-point real numbers

Convert binary floating-point real numbers into denary and vice versa

Normalize floating-point numbers

Show understanding of the consequences of a binary representation only


being an approximation to the real number it represents (in certain cases)

Show understanding that binary representations can give rise to rounding


errors

Definitions
The programming language defines the range of possible
Built-in data type values that can be assigned to and the operations that can be
applied to a variable

A data type for which the programmer has included the


User-defined data type
definition in the program

01 Data Representation 1
Composite data type A data type that is derived from other data types

Non-Composite data
A data type defined without referencing another data type
type

A data-type which provides an ordered list of values that a


Enumerated data type
variable of this type can take on

Pointer data type Used to reference a memory location

A data type that allows storing a finite number of different


Sets
values that have no order

Class Includes variables of given data types and methods

Serial file organization Records stored in the order they were added in

Sequential file Physically stores record and ordered according to their key
organization field value

Random file organization Stores records of data in a file in any available position

Each record in the file is read, one by one, until the desired
Sequential Access
record is found

Jumps to a specific record in the file without accessing other


Random Access
records

A mathematical formula used to perform a calculation on the


Hashing Algorithm
key field of the record

Normalized two’s
First and second bits of mantissa must be different for a
complement binary
floating point number to be normalized
number

Number following a calculation is too big to be represented in


Overflow
the given format

Underflow Number is too small to be represented in the given format

1.1 User-defined data types


1.1.1 Data Types
Built-in data types

01 Data Representation 2
The programming language defines the range of possible values that can be
assigned to and the operations that can be applied to a variable

User-defined data types


A data type for which the programmer has included the definition in the
program

It is used to create a new data type

Allows to extend the flexibility of the programming language

Non-Composite data types


A data type defined without referencing another data type

Example: Enumerated data type, Pointer

Enumerated data type

A data-type which provides an ordered list of values that a variable of this


type can take on

Example:
TYPE SchoolDay = (Monday, Tuesday, Wednesday, Thursday, Friday)

Note: Since this is a different data type than STRING, quotation marks are not
used to represent them

Pointer data type

Used to reference a memory location

Example:

DECLARE IntPointer : ^INTEGER


DECLARE MyVar : INTEGER

DECLARE MyVar2 : INTEGER

MyVar ← 57
IntPointer ← @MyVar

MyVar2 ← IntPointer^

IntPointer^ ← 100

01 Data Representation 3
@ symbol before a variable identifier gives the address of the variable

^ is put before the data type to define a pointer data type

^ is placed after a pointer variable. It is an identifier that dereferences the


pointer variable

Composite data types


A data type that is derived from other data types

They are used to extend functionality of programming language

Example: Records, Sets, Classes

Sets

A data type that allows storing a finite number of different values that have no
order

Example:
TYPE Sletter = SET OF CHAR
DEFINE vowel = (’a’, ‘e’, ‘i’, ‘o’, ‘u’) : Sletter

Classes

Includes variables of given data types and methods

Used in Object Oriented Programming, discussed in later chapters

01 Data Representation 4
Skill Check 1

1. Describe the purpose of a user-defined data type. [2]

2. Define, using pseudocode, the following enumerated data types:

a. SchoolDay to hold data about the days students are usually in school.
[1]

b. WeekEnd to hold data about the days that are not school days. [1]

3. Define, using pseudocode, the composite data type ClubMeet. This will
hold data about club members that includes:

First name and last name

The two days they attend:

One on a school day

One not on a school day

Use the enumerated types you created in part(b). [4]

Solution

1. To create a new data type to extend the flexibility of the programming


language

2. TYPE SchoolDay = (Monday, Tuesday, Wednesday, Thursday


TYPE WeekEnd = (Saturday, Sunday)

3. TYPE ClubMeet
DECLARE FirstName : STRING
DECLARE LastName : STRING
DECLARE Schoolday : SchoolDay
DECLARE Weekend : WeekEnd
ENDTYPE

01 Data Representation 5
1.2 File organization and access
1.2.1 File Organization
Serial File Organization
Records stored in the order they were added in

New records appended to end of file

Accessed only sequentially

Reorganization not needed when a new record is added

Sequential File Organization


Physically stores record and ordered according to their key field value

Accessed both sequentially and by direct access with an index file

High hit rate (everyone needs a statement)

Justification:

Suitable for batch processing

All customers need statement

There’s a unique key field

Organized by unique key field

Random File Organization


Stores records of data in a file in any available position

The location of any record in the file is found by using a hashing algorithm on
the key field of a record

Low waiting time

Low hit rate (only one record will match your account number/username)

Justification:

Real time access

01 Data Representation 6
No need to search records.

1.2.2 File Access


Sequential Access
Each record in the file is read, one by one, until the desired record is found

Efficient when every record in the file needs to be processed

Every record is searched until a record is found, or whole file has been
searched and not found, or if key field of current record being checked is
greater than the key field of record being searched in sequential file
organization

Direct Access
Jumps to a specific record in the file without accessing other records

Required when only an individual record from a file needs to be processed

For sequential file, an index of all the key fields is kept and used to look up the
address of the file location where a given record is stored

For random access files, a hashing algorithm is used on the key field to
calculate the address of the file location where a given record is stored

1.2.3 Hashing Algorithms


A mathematical formula used to perform a calculation on the key field of the
record

The result of the calculation gives the address where the record should be
found

To write a record:

Key field is hashed to produce a location address

If location is free, add the data there

Otherwise, use an overflow method to find a free location

If no free location, data cannot be stored

01 Data Representation 7
To read a record:

Key field is hashed to produce a location address

The record at the address is checked to see if it matches the desired


record

If it does not match, then the following records need to be read until a
match is found (if open hash is used), or the overflow area needs to be
searched for a match (if closed hash is used)

Skill Check 2

1. Compare sequential and serial methods of file organization. [4]

2. State the most suitable method of file access when a record is


referenced by a unique address on a disk-type storage medium. [1]

3. State the most suitable method of file access when a bank stores its
data records in ascending order of account number. [1]

Solution

1. In both serial and sequential files, records are stored one after the
other and need to be accessed one after the other. Serial files are
stored in chronological order, whereas sequential files are stored
with ordered records and stored in the order of the key field. In serial
files, new records are added in the next available space. In
sequential files, new records are inserted in the correct position.

2. Direct Access

3. Sequential Access

1.3 Floating point numbers, representation


and manipulation
Allows for representation of fractional values in binary number system

The number is the form M × 2E 

01 Data Representation 8
M is the mantissa and E is the exponent

1.3.1 Converting binary floating-point numbers into


denary
For the mantissa, start with -1 as the first bit, 0.5 as the second bit, and every
next bit being half of the previous one

Add up all the mantissa values where a 1 bit appears to get the value of M

For the exponent consider the normal 8-bit binary number with the right most
bit representing 1

Add up the exponent values where a 1 bit appears to get the value of E

Use M×2E to get the denary output

For example:

0.1011010 00000100
1 1 1 1 45
0.1011010= 2

+ 8

+ 16

+ 64

= 64 

Exponent = 4

Hence, the number is 45


64
​ × 24 = 11.25

1.3.2 Converting denary numbers into binary floating-


point numbers
Take a denary number and convert to normal two’s complement binary with
the decimal point and replace missing bits to the right with 0s

Move the binary point after the first digit

Represent the exponent by the number of places the decimal place is shifted
left to

01 Data Representation 9
For example:
4.5 →
0100.1000→ 0.1001000= Mantissa
00000011= Exponent

1.3.3 Potential rounding errors and approximations


Most numbers cannot be represented exactly in binary representation, hence
an approximate value is represented

This approximation becomes more accurate with greater number of bits


allowed to represent the number

Repeated calculation and using of previously rounded values can lead to an


inevitable rounding error

The inaccuracy may become significant enough to see

Normalization
First and second bits of mantissa must be different for a floating point number
to be normalized

Reasons:

To store maximum range of numbers with the smallest number of bits

To maximize the precision of the number for the given number of bits

To prevent multiple representations of the same number

To minimize the number of leading zeroes/ones

To normalize, simply make the mantissa start with 1.0 (for a negative number)
or 0.1 (for a negative number) by shifting bits left

For example:

0.0011100 00000101
Shift bits left to get 0.1110000

Since it was shifted left, the value was increased for the relevant bit,
hence reduce the exponent by the number of left shifts, which is 2 in

01 Data Representation 10
this case

Reducing exponent by 2 gives 00000011

Hence, the normalized form is: 0.111000000000011

1.3.4 Precision versus range


There is a trade-off between range and precision

More bits in mantissa means greater precision and vice versa

More bits in exponent means a greater range and vice versa

For a fixed number of bits, increasing one results in the decrease of other
and hence the trade-off

For a binary number with an 8-bit mantissa and an 8-bit exponent (using two’s
complement):

The maximum positive number which can be stored is:

127
0111111101111111= 128
​ × 2127 
The smallest positive number which can be stored is:

1
0100000010000000= 2
​ × 2−128 
The smallest magnitude negative number which can be stored is:

65
1011111110000000= 128

× 2−128 
The largest magnitude negative number which can be stored is:

1000000001111111= −1 × 2127 

1.3.5 Floating-point problems


Overflow: Number following a calculation is too big to be represented in the
given format

01 Data Representation 11
Underflow: Number is too small to be represented in the given format

Unable to stored the number zero in normalized form

01 Data Representation 12
Skill Check 3

1. Numbers are stored in a computer using floating-point representation


with:

12 bits for the mantissa

4 bits for the exponent

two’s complement form for both the mantissa and exponent

a) Write the normalized floating-point representation of the following


unsigned binary number using this system. [2]

1011100.011001

b) State the consequence of storing the binary number in part(a)(i) as a


floating-point number in this system. Justify your answer. [2]

2. Explain the reason why binary numbers are stored in normalized form.
[3]

Solution

1.

a. Mantissa: 010111000110
Exponent:
0111
b. The accuracy of the number would be reduced because the least
significant bits of the original number have been lost.

2. To store the maximum range of numbers in the minimum number of


bits. Normalization minimizes the number of significant bits enabling
very large numbers to be stored with accuracy. Avoids the possibility
of many numbers having multiple representations.

Points To Note

01 Data Representation 13
Examples of non-composite user-defined data types include enumerated and
pointer data types

Record, set and class are examples of composite user-defined data types

File organization allow for serial, sequential or direct access

Floating-point representation for a real number allows a wider range of values


to be represented

A normalized floating-point representation achieves the best precision for the


value stored

Stored floating-point values rarely give an accurate representation of the


denary equivalent

01 Data Representation 14

You might also like