Data Representation
When object-oriented programming is not being used, a programmer may choose not to use
any user-defined data types. However, for a large program, their use makes the program less
error-prone and easier to understand. User-defined types are also less restrictive, because the
programmer can define exactly what the problem requires. Built-in data types are used in the
same way in any program. However, there can't be a built-in record type, because each
different problem needs its own definition of a record.
Composite user-defined data types have a definition with a reference to at least one other type.
==Record Data type:== a data type that contains a fixed number of components that can be of
different types. It allows the programmer to collect together values with different data types
when these form a coherent whole. It could be used for the implementation of a data structure.
TYPE
<main identifier>
ENDTYPE
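As a rough illustration outside the pseudocode used in these notes, a record can be sketched in Python with a dataclass. The `Student` type and its three fields are hypothetical, chosen only to show components of different types collected under one identifier.

```python
from dataclasses import dataclass

@dataclass
class Student:
    # Hypothetical record: a fixed number of fields of different types
    name: str
    age: int
    average_mark: float

s = Student(name="Ada", age=17, average_mark=82.5)
s.age = 18  # each field is accessed by name through the record variable
```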
MUHAMMAD WASEEM SABRI
==Set Data type:== allows a program to create sets and to apply the mathematical set operations:
• Union
• Difference
• Intersection
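The three operations listed above can be tried out with Python's built-in `set` type. This is only a sketch; the two sets `evens` and `small` are made-up example data.

```python
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

union = evens | small          # members of either set
difference = evens - small     # members of evens that are not in small
intersection = evens & small   # members of both sets
```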
used-they're all user-defined data types. Then for each class, the objects must be defined.
Non-composite user-defined data types don’t involve a reference to another type. When a
programmer uses a simple built-in type, the only requirement is for an identifier to be named
with a defined type. Non-composite user-defined types, by contrast, have to be explicitly
defined before an identifier of that type can be created.
==Enumerated Data type:== a list of possible data values. The values defined here have an
implied order, which allows comparisons to be made. Therefore value2 is greater than
value1 (they're not string values and can't be quoted).
TYPE
<Datatype> = (<value1>,<value2>,<value3>…)
ENDTYPE
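A comparable idea in Python is `IntEnum`, where each name is given a number that fixes its place in the order. The `Season` type below is a hypothetical example, not part of the pseudocode definition above.

```python
from enum import IntEnum

class Season(IntEnum):
    # Hypothetical enumerated type; the numbers make the implied order explicit
    SPRING = 1
    SUMMER = 2
    AUTUMN = 3
    WINTER = 4

# Values are names, not quoted strings, and can be compared by position
later = Season.SUMMER > Season.SPRING
```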
==Pointer Data type:== used to reference a memory location. It may be used to construct
dynamically varying data structures. The pointer definition has to relate to the type of the
variable that is being pointed to (a pointer doesn’t hold a data value but a reference/address
to the data).
TYPE
ENDTYPE
A special use of a pointer variable is to access the value stored at the address pointed to.
The contents of a file of any type are stored using a defined binary code that allows the file to
be used in the way intended. But, for storing data to be used by a computer program, there are
two types of file: text files and binary files.
A text file contains data stored according to a defined character code such as ASCII or
Unicode.
A binary file is a file designed for storing data to be used by a computer program (0's and 1's). It
stores data in its internal representation (for example, an integer value might be stored in 2
bytes in two's complement representation so that negative numbers can be represented), and
the file is created using a specific program. Its organization is based on records (a collection of
fields containing data).
==Serial files:== contain records that have no defined order. A text file may be a serial file
where the file has repeating lines, each terminated by end-of-line character(s). There's no
end-of-record character. A record in a serial file must have a defined format to allow data to be
input and output correctly. To access a specific record, the program has to read through every
record until it is found.
File access: records are read successively, one by one, until the required data is found, so
access is very slow.
Uses:
Batch processing
Banks record transactions involving customer accounts every time there is a transaction
==Sequential files:== have records that are ordered. They are suited to long-term storage of
data and so can be considered an alternative to a database. For a sequential file to be ordered,
a key field is required whose values are unique and sequential, so that a record can be located
easily. A database file is more efficient than a text file because of data integrity, privacy and
reduced data redundancy: a change in one file updates any other files affected.
Primary keys in a DBMS (database management system) need to be unique but not ordered,
unlike the key field of a sequential file, which needs to be both ordered and unique. A
particular record is found by sequentially reading the value of the key field until the required
value is found.
File access:
Successively read the value in the key field until the required key is found.
To edit/delete data:
Create a new version of the file. Data is copied from the old file to the new file until the record
is reached which needs editing or deleting. For deleting, reading and copying of the old file
continue from the next record. If a record has been edited, the new version is written to the
new file and the remaining records are copied to the new file.
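The copy-to-a-new-file procedure can be sketched in Python as follows. The record format (one `key,data` line per record), the function name `rewrite_sequential` and the sample data are all assumptions made for this illustration.

```python
import os
import tempfile

def rewrite_sequential(old_path, new_path, key, new_data=None):
    """Copy old_path to new_path, editing or deleting the record with `key`.

    Assumed format: one record per line, "key,data", ordered by key.
    new_data=None means the record is deleted.
    """
    with open(old_path) as old, open(new_path, "w") as new:
        for line in old:
            record_key, _, _ = line.rstrip("\n").partition(",")
            if record_key == key:
                if new_data is not None:
                    new.write(f"{key},{new_data}\n")  # edited version
                # for a deletion nothing is written; copying resumes
                # with the next record
            else:
                new.write(line)  # unchanged record copied across

folder = tempfile.mkdtemp()
old_file = os.path.join(folder, "old.txt")
new_file = os.path.join(folder, "new.txt")
with open(old_file, "w") as f:
    f.write("001,Ali\n002,Bea\n003,Cy\n")

rewrite_sequential(old_file, new_file, "002", new_data="Bob")
with open(new_file) as f:
    result = f.read()
```

Calling the same function without `new_data` produces a new file in which record `002` is simply missing.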
==Direct access/random access files:== access isn't defined by a sequential reading of the
file (random). It's well suited to larger files, which take longer to access sequentially. Data in
a direct access file is stored in an identifiable record, which can be found by an initial
direct access to a nearby record followed by a limited serial search. The position chosen for a
record must be calculated using data in the record, so that the same calculation can be carried
out when there is a subsequent search for the data. One method is a hashing algorithm, which
takes the key field as an input and outputs a value for the position of the record relative to the
start of the file. To access a record, the key is hashed to a specific location. The algorithm also
takes into account the potential maximum length of the file, which is the number of records the
file will store.
eg: If the key field is numeric, divide by a suitably large number and use the remainder to find a
position. But the positions won't be unique. If a hash position is calculated that duplicates one
already calculated for a different key, the next position in the file is used. This is why a search
will involve direct access possibly followed by a limited serial search. That's why it's considered
File access:
The value in the key field is submitted to the hashing algorithm, which provides the same
value for the position in the file that it provided when the data was input. The search goes to
that hashed position and then, if necessary, performs a short linear search because of possible
collisions.
To edit/delete data:
Only create a new file if the current file is full. A deleted record can have a flag set so that in a
subsequent reading process the record is skipped over. This allows it to be overwritten.
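A minimal sketch of this scheme in Python is given below. A list stands in for the record positions of the file. The file size of 7, the wrap-around to the start of the file, and the function names `hash_position`, `insert`, `find` and `delete` are assumptions made for the illustration.

```python
FILE_SIZE = 7                      # assumed maximum number of records
EMPTY, DELETED = None, "DELETED"   # DELETED is the flag for a removed record
slots = [EMPTY] * FILE_SIZE        # stands in for positions in the file

def hash_position(key):
    # The remainder after division gives a position relative to the start
    return key % FILE_SIZE

def insert(key, data):
    pos = hash_position(key)
    for _ in range(FILE_SIZE):
        if slots[pos] is EMPTY or slots[pos] == DELETED:
            slots[pos] = (key, data)   # a flagged slot may be overwritten
            return pos
        pos = (pos + 1) % FILE_SIZE    # collision: try the next position
    raise OverflowError("file is full; a new file is needed")

def find(key):
    pos = hash_position(key)           # direct access to the hashed position
    for _ in range(FILE_SIZE):         # then a limited serial search
        slot = slots[pos]
        if slot is EMPTY:
            return None                # key was never stored
        if slot != DELETED and slot[0] == key:
            return pos
        pos = (pos + 1) % FILE_SIZE    # skip flagged or non-matching records
    return None

def delete(key):
    pos = find(key)
    if pos is not None:
        slots[pos] = DELETED           # set the flag rather than moving data

insert(10, "A")   # 10 % 7 = 3
insert(17, "B")   # 17 % 7 = 3 as well, so it is stored at position 4
delete(10)
```

After the deletion, `find(17)` still reaches position 4 because the flagged slot at position 3 is skipped rather than treated as empty.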
Uses:
Most suited for when a program needs a file in which individual data items might be read,
updated or deleted.
How often do transactions take place, how often does one need to add data?
representation
digits.
Exponent: The power to which the base is raised in order to accurately represent the
number.
Base: The number of values the number system allows a digit to take. 2 in the case of floating-
point representation.
The floating-point representation stores a value for the mantissa and a value for the exponent.
A defined number of bits is used for what is called the significand/mantissa, ±M. The remaining
bits are for the exponent, E. The radix, R, is not stored in the representation as it has an implied
value of 2 (the digits being 0's and 1's). For example, a real number could be stored using 8 bits:
four bits for the mantissa and four bits for the exponent, each using two's complement
representation. The exponent is stored as a signed integer. The mantissa has to be stored as a
fixed-point real value.
The binary point can be at the beginning, after the first bit (immediately after the sign bit), or
before the last bit. The former produces smaller spacing between the values that can be
represented and is preferred. Floating-point representation also has a greater range than
fixed-point representation.
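To make the 8-bit layout concrete, the following Python sketch decodes a four-bit mantissa and a four-bit exponent, both in two's complement, with the binary point immediately after the mantissa's sign bit. The function names `twos_complement` and `decode` are hypothetical helpers written for this illustration.

```python
def twos_complement(bits):
    """Interpret a string of 0s and 1s as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)   # leading 1 means the value is negative
    return value

def decode(mantissa_bits, exponent_bits):
    # The binary point sits immediately after the mantissa's sign bit,
    # so the mantissa integer is scaled down by 2^(number of bits - 1).
    mantissa = twos_complement(mantissa_bits) / (1 << (len(mantissa_bits) - 1))
    exponent = twos_complement(exponent_bits)
    return mantissa * 2 ** exponent

largest = decode("0111", "0111")   # 0.875 * 2^7 = 112.0
```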
Converting a denary value expressed as a real number into a floating-point binary
representation
The bits of the fractional part represent a half, a quarter, an eighth… Only values that are an
exact sum of these fractions can be converted accurately. So you convert by multiplying by two
and recording the whole-number part each time.
For example: 8.63, 0.63 * 2 = 1.26 therefore .1 -> 0.26 * 2 = 0.52 and .10 -> 0.52 * 2 = 1.04 and
.101 and you keep going until the required number of bits is reached.
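The repeated multiplication by two can be written as a short loop. This is a sketch only; the function name `fraction_to_bits` and its parameters are made up for the example.

```python
def fraction_to_bits(fraction, n_bits):
    """Convert a denary fraction (0 <= fraction < 1) to n_bits binary places."""
    bits = ""
    for _ in range(n_bits):
        fraction *= 2
        if fraction >= 1:
            bits += "1"       # whole-number part is 1: record it and remove it
            fraction -= 1
        else:
            bits += "0"
    return bits

first_three = fraction_to_bits(0.63, 3)   # "101", as in the worked example
```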
3. Convert the fractional part. You start by combining the two parts, which gives an exponent
value of zero. Then shift the binary point to the beginning, which gives a higher exponent
value. Depending on the number of bits, add extra 0's at the end of the mantissa.
4. Adjust the position of the binary point and change the exponent accordingly to achieve a
normalized form.
Therefore: 8.75 -> 1000 -> 01000 -> .11 -> 01000.11 -> 0.100011 (mantissa, exponent 4) -> 0100011000
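The whole conversion for 8.75 can be checked with the sketch below. It assumes a 10-bit mantissa and a 4-bit exponent, handles only positive values of 0.5 or more, and truncates rather than rounds. The function name `to_floating_point` is hypothetical.

```python
def to_floating_point(value, mantissa_bits=10, exponent_bits=4):
    """Return (mantissa, exponent) bit strings for a positive denary value.

    Sketch only: expects value >= 0.5 and does not check that the
    exponent fits in exponent_bits.
    """
    exponent = 0
    while value >= 1:          # move the binary point left one place at a time
        value /= 2
        exponent += 1
    mantissa = "0"             # sign bit for a positive number
    for _ in range(mantissa_bits - 1):
        value *= 2
        bit = int(value)       # the whole-number part is the next bit
        mantissa += str(bit)
        value -= bit
    return mantissa, format(exponent, f"0{exponent_bits}b")

mantissa, exponent = to_floating_point(8.75)
# mantissa is "0100011000" and exponent is "0100" (denary 4)
```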
When implementing the floating point representation, a decision has to be made regarding the
total number of bits to be used and how many for the mantissa and exponent.
Usually, the choice of the total number of bits is provided as an option when the program
is written. However, the split between the two parts will have been determined by the
floating-point processor.
If there were a choice, it is worth noting that increasing the number of bits for the
mantissa gives better precision but leaves fewer bits for the exponent, reducing the range of
possible values, and vice versa. For maximum precision, it is necessary to normalize
a floating-point number.
Optimum precision is only achieved when full use is made of the bits in the mantissa, which
means using the largest possible magnitude for the value represented by the mantissa.
Also, the two most significant bits must be different: 01 for positives and 10 for negatives.
-they both equal 2 but the most precise is the second one, with the higher bits in the mantissa.
-For negatives.
When the number is represented with the highest magnitude for the mantissa, the two most
significant bits are different, and the number is said to be in a normalized representation.
How a number could be normalized: for a positive number, the bits in the mantissa are shifted
left until the most significant bits are 0 followed by 1. For each shift left, the value of the
exponent is reduced by 1. The same process of shifting is used for a negative number until the
most significant bits are 1 followed by 0. In this case, no attention is paid to the fact that bits
are falling off the most significant end of the mantissa. Thus normalization is shifting bits to
the left until the two most significant bits are different.
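The shifting procedure can be sketched in Python with the mantissa held as a bit string. The function name `normalise` is hypothetical, and the exponent is kept as an ordinary integer rather than a bit pattern to keep the example short.

```python
def normalise(mantissa, exponent):
    """Shift the mantissa left until its two leading bits differ.

    mantissa is a two's complement bit string with the binary point
    after the sign bit; exponent is a Python int.
    """
    if set(mantissa) == {"0"}:
        raise ValueError("zero has no normalised form")
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"   # leading bit falls off, 0 enters on the right
        exponent -= 1                   # each left shift reduces the exponent by 1
    return mantissa, exponent

positive = normalise("0001", 5)   # 0.125 * 2^5 becomes 0.5 * 2^3
negative = normalise("1110", 0)   # -0.25 * 2^0 becomes -1 * 2^-2
```

Both results represent the same value as their inputs (4 and -0.25 respectively); only the split between mantissa and exponent changes.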
1. The conversion of real denary values to binary mostly needs a degree of approximation,
followed by the restriction of the number of bits used to store the mantissa. These rounding
errors can become significant after multiple calculations. The only way of preventing a serious
problem is to increase the precision by using more bits for the mantissa.
2. With four bits each for the mantissa and the exponent, the highest value that can be
represented is 112, so the range is limited. A result larger than the highest value produces an
overflow condition. If a result is smaller than the smallest value that can be stored, there is an
underflow error condition. Such a very small number can be turned into zero, but this carries
risks.
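Both problems can be observed with Python's own floating-point type, which normally uses a 64-bit representation rather than the small formats discussed in these notes. The variable names below are chosen for the illustration only.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1                     # 0.1 has no exact binary representation

exact = (total == 1.0)               # False: the rounding errors have accumulated
close = math.isclose(total, 1.0)     # True: compare using a tolerance instead

overflowed = 1e308 * 10              # too large to store: becomes infinity
underflowed = 5e-324 / 2             # too small to store: becomes zero
```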
eg: One use of floating-point numbers is in extended mathematical procedures involving
repeated calculations, such as weather forecasting, which uses a mathematical model of the
atmosphere.