
Data Representation

The document discusses user-defined data types in programming, including composite types like records and sets, and non-composite types like enumerated and pointer data types. It also covers file organization and access methods, including serial, sequential, and direct access files, along with considerations for floating-point representation of real numbers and the challenges associated with it. The importance of normalization and precision in floating-point representation is emphasized, highlighting potential issues such as rounding errors and overflow/underflow conditions.

Uploaded by

waseem sabri

MUHAMMAD WASEEM SABRI

Data representation

User-defined data types

When object-oriented programming is not being used, a programmer may choose not to use any user-defined data types. However, for a large program, their use makes the program less error-prone and easier to understand. They are also less restrictive, because the programmer writes the definition to suit the problem. Built-in data types are the same for every program; there cannot be a built-in record type, because each problem needs its own definition of a record.

i. Composite data types:

Composite user-defined data types have a definition with a reference to at least one other type.

==Record Data type:== a data type that contains a fixed number of components, which can be of different types. It allows the programmer to collect together values with different data types when these form a coherent whole. It can also be used to implement a data structure in which one or more of the variables defined are pointer variables.

 TYPE

 <main identifier>

 DECLARE <subidentifier1> : <built in data type>

 DECLARE <subidentifier2> : <built in data type>

 ENDTYPE



 <main identifier>.<sub identifier(x)> ← <value>
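As a rough illustration of the record pseudocode in a real language, a Python dataclass gives a fixed set of named fields of different types. The `StudentRecord` name and its fields are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Fields of different built-in types grouped into one coherent whole
    name: str
    year_group: int
    average_mark: float


# Equivalent of: <main identifier>.<sub identifier> <- <value>
student = StudentRecord(name="Amina", year_group=12, average_mark=71.5)
student.average_mark = 74.0
print(student)
```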

 ==Set Data type:== allows a program to create sets and to apply the mathematical operations

defined in set theory. Operations like:

• Union

• Difference

• Intersection

• Include an element in the set

• Exclude an element from the set

• Check whether an element is in a set
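The operations listed above map directly onto Python's built-in `set` type. The following is a minimal sketch; the variable names and sample values are made up for the example.

```python
vowels = {"a", "e", "i", "o", "u"}
word_letters = set("audio")          # the letters a, u, d, i, o

print(vowels | word_letters)         # union
print(vowels - word_letters)         # difference
print(vowels & word_letters)         # intersection

word_letters.add("s")                # include an element in the set
word_letters.discard("d")            # exclude an element from the set
print("e" in word_letters)           # check whether an element is in the set
```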

==Objects and Classes:== in object-oriented programming, a program defines the classes to be used; these are all user-defined data types. Then, for each class, the objects must be defined.

ii. Non-Composite data types:

Non-composite user-defined data types don't involve a reference to another type. When a programmer uses a simple built-in type, the only requirement is for an identifier to be named with a defined type. A user-defined type, unlike built-in types such as string, integer or real, has to be explicitly defined before an identifier of that type can be created.

==Enumerated Data type:== a list of possible data values. The values have an implied order, so comparisons can be made: value2 is greater than value1. The values are not strings and must not be quoted. The list is countable, so the number of values is finite.

 TYPE

 <Datatype> = (<value1>,<value2>,<value3>…)

 ENDTYPE

 DECLARE <identifier> : <datatype>
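One way to see the implied ordering in practice is Python's `IntEnum`, sketched below. The `Season` type and its members are hypothetical; the pseudocode above does not name any particular values.

```python
from enum import IntEnum


class Season(IntEnum):
    # Values are listed in order; IntEnum makes them comparable
    SPRING = 1
    SUMMER = 2
    AUTUMN = 3
    WINTER = 4


this_season = Season.SUMMER
print(this_season > Season.SPRING)   # later values compare as greater
print(len(Season))                   # the list of values is finite
```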

==Pointer Data type:== used to reference a memory location. It may be used to construct dynamically varying data structures. The pointer definition has to relate to the type of the variable being pointed to; a pointer does not hold a data value itself but a reference (address) to the data.

 TYPE

 <Datatype> = ^<type name>

 ENDTYPE

 DECLARE <identifier> : <datatype>

 <assignment value> ← <identifier>^

A special use of a pointer variable is to access the value stored at the address pointed to. The pointer variable is then said to be dereferenced.
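Python has no explicit pointer type, so the sketch below uses object references as a stand-in to show how a pointer field lets a data structure grow at run time. The `Node` class is an assumption made for illustration, not something defined in these notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None    # plays the role of a pointer field


# Build a list that grows at run time: 3 -> 2 -> 1
head: Optional[Node] = None
for v in (1, 2, 3):
    head = Node(value=v, next=head)

# Following a reference and reading the value is the analogue of <identifier>^
values = []
current = head
while current is not None:
    values.append(current.value)
    current = current.next
print(values)
```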


File organization and access



The contents of a file of any type are stored using a defined binary code that allows the file to be used in the way intended. For storing data to be used by a computer program, however, there are only two defined file types: a text file or a binary file.

A text file contains data stored according to a defined character code such as ASCII or Unicode. A text file can be created using a text editor.

A binary file is a file designed for storing data to be used by a computer program. It stores data in its internal representation (for example, an integer value might be stored in 2 bytes in two's complement representation, so that negative numbers can be represented) and is created using a specific program. Its organization is based on records (a collection of fields containing data values): file → records → fields → values
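To make the difference concrete, the sketch below stores the same integer once as text and once in a 2-byte two's complement internal form, using Python's standard `struct` module. The value -5 and the big-endian byte order are arbitrary choices for the example.

```python
import struct

value = -5

# Text form: the characters '-' and '5' in a character code
as_text = str(value).encode("ascii")

# Binary form: 2 bytes, big-endian, two's complement ('>h' = signed short)
as_binary = struct.pack(">h", value)

print(as_text)                             # b'-5'
print(as_binary.hex())                     # fffb
print(struct.unpack(">h", as_binary)[0])   # -5
```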

Methods of file organization

==Serial files:== contain records that have no defined order; each new record is simply added to the end of the file. A text file may be a serial file where the file has repeating lines, each terminated by end-of-line character(s). There is no end-of-record character, so a record in a serial file must have a defined format to allow data to be input and output correctly. To access a specific record, the program has to read through every record until the required one is found.

File access: successively read record by record until the required data is found, so access is very slow. Uses:

 Batch processing

 Backing up data on magnetic tape

 Banks record transactions involving customer accounts every time there is a transaction
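Serial access can be sketched as a loop that reads from the first record onwards. The file contents below are invented, and an in-memory `io.StringIO` object stands in for a real file so the example is self-contained.

```python
import io

# Hypothetical serial file: one transaction per line, in arrival order
serial_file = io.StringIO(
    "1007,deposit,250\n"
    "1003,withdrawal,40\n"
    "1010,deposit,15\n"
)


def find_record(file, account):
    """Read record by record from the start until the account is found."""
    file.seek(0)
    reads = 0
    for line in file:
        reads += 1
        fields = line.rstrip("\n").split(",")
        if fields[0] == account:
            return fields, reads
    return None, reads


record, reads = find_record(serial_file, "1010")
print(record, reads)
```

The number of reads grows with the position of the record in the file, which is why serial access is slow for large files.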

==Sequential files:== contain records that are ordered. They are suited to long-term storage of data and so can be considered an alternative to a database. For a sequential file to be ordered, a key field is required whose values are unique and sequential, so that a record can be located by its key. A sequential database file is considered more efficient than a text file because of data integrity, privacy and reduced data redundancy: a change in one file updates any other files affected. Primary keys in a DBMS (database management system) need to be unique but not ordered, whereas the key field of a sequential file needs to be both ordered and unique. A particular record is found by sequentially reading the value of the key field until the required value is found.

File access:

Successively read the value in the key field until the required key is found.

To edit/delete data:

Create a new version of the file. Records are copied from the old file to the new file until the record that needs editing or deleting is reached. For a deletion, that record is skipped and copying continues from the next record. For an edit, the new version of the record is written to the new file and the remaining records are then copied across.
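The copy-to-a-new-file procedure can be sketched as follows. Lists of `(key, data)` tuples stand in for the old and new files, and the function name and sample records are hypothetical.

```python
def rewrite_sequential(old_records, key, new_record=None):
    """Copy records to a new file, editing or deleting the one with `key`.

    old_records: list of (key, data) tuples ordered by key.
    new_record:  replacement tuple for an edit, or None for a delete.
    """
    new_file = []
    for record in old_records:
        if record[0] == key:
            if new_record is not None:
                new_file.append(new_record)   # edit: write the new version
            # delete: write nothing, carry on with the next record
        else:
            new_file.append(record)           # copy unchanged
    return new_file


old_file = [(101, "Ali"), (104, "Bea"), (109, "Chen")]
print(rewrite_sequential(old_file, 104))                  # delete 104
print(rewrite_sequential(old_file, 104, (104, "Beth")))   # edit 104
```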

==Direct access/random access files:== access isn't defined by a sequential reading of the file (hence "random"). This organization is well suited to large files, which take a long time to search sequentially. Data in a direct access file is stored in identifiable records; a record is found by direct access to a position at or near the record, followed if necessary by a limited serial search. The position chosen for a record must be calculated from data in the record, so that the same calculation can be carried out later when the record is searched for. One method is a hashing algorithm, which takes the key field as input and outputs a value for the position of the record relative to the start of the file. To access a record, its key is hashed to a specific location. The algorithm also has to take into account the potential maximum length of the file, that is, the number of records the file will store.

eg: If the key field is numeric, divide it by a suitably large number and use the remainder as the position. This does not guarantee unique positions: if the hash position calculated for a key duplicates one already calculated for a different key (a collision), the next free position in the file is used. This is why a search involves direct access possibly followed by a limited serial search, and why the method is considered partly direct and partly serial.

File access:

The value in the key field is submitted to the hashing algorithm, which returns the same position in the file that it returned when the record was stored. The program goes to that hashed position and, because of possible collisions, may then carry out a short linear search from there. This gives the fastest access of the three methods.

To edit/delete data:

Only create a new file if the current file is full. A deleted record can have a flag set so that in a subsequent reading process the record is skipped over. This also allows its position to be overwritten by a later record.

Uses:

Most suited for when a program needs a file in which individual data items might be read,

updated or deleted.
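The hashing scheme described in this section can be sketched with a fixed-size list standing in for the file. The file size of 11, the remainder-based hash and the `"deleted"` flag value are all assumptions made for this example, not details given in the notes.

```python
FILE_SIZE = 11          # assumed maximum number of records
EMPTY, DELETED = None, "deleted"
slots = [EMPTY] * FILE_SIZE


def hash_position(key):
    # Divide the numeric key and use the remainder as the position
    return key % FILE_SIZE


def store(key, data):
    pos = hash_position(key)
    for _ in range(FILE_SIZE):
        if slots[pos] is EMPTY or slots[pos] == DELETED:
            slots[pos] = (key, data)
            return pos
        pos = (pos + 1) % FILE_SIZE      # collision: try the next position
    raise RuntimeError("file is full; a new, larger file is needed")


def find(key):
    pos = hash_position(key)
    for _ in range(FILE_SIZE):
        if slots[pos] is EMPTY:
            return None                  # never stored
        if slots[pos] != DELETED and slots[pos][0] == key:
            return pos
        pos = (pos + 1) % FILE_SIZE      # limited serial search
    return None


def delete(key):
    pos = find(key)
    if pos is not None:
        slots[pos] = DELETED             # flag it; reads skip it, writes reuse it


print(store(15, "A"), store(26, "B"), find(26))
delete(15)
print(find(26), find(15))
```

Keys 15 and 26 both hash to position 4, so the second is placed in position 5 and is still found after the first has been flagged as deleted.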

Factors that determine the file organization to use:

 How often do transactions take place, how often does one need to add data?

 How often does it need to be accessed, edited, or deleted?

Real numbers and normalized floating-point representation

 Real number: A number that contains a fractional part.

 Floating-point representation: The approximate representation of a real number using binary

digits.

Format: Number = ±Mantissa × Base^Exponent

Mantissa: The part of the number that holds its significant digits.

Exponent: The power to which the base is raised in order to represent the number.

Base: The number of values the number system allows a digit to take. This is 2 in the case of floating-point representation.

The floating-point representation stores a value for the mantissa and a value for the exponent. A defined number of bits is used for the significand (mantissa), ±M; the remaining bits are used for the exponent, E. The radix, R, is not stored in the representation because it has an implied value of 2 (binary digits are 0 and 1). For example, a real number could be stored using 8 bits: four bits for the mantissa and four bits for the exponent, each in two's complement representation. The exponent is stored as a signed integer. The mantissa has to be stored as a fixed-point real value. The binary point can be placed at the beginning, immediately after the sign bit, or just before the last bit. The former gives smaller spacing between the values that can be represented and is preferred. Floating-point representation also offers a greater range than fixed-point representation.

Converting a denary value expressed as a real number into a floating-point binary representation: most fractional parts do not convert to a precise representation, because binary fractional places represent a half, a quarter, an eighth and so on. Only fractions that are sums of these values (such as .5, .25 or .75) convert exactly. The fractional part is converted by repeatedly multiplying by two and recording the whole number part each time.

For example, for 8.63: 0.63 * 2 = 1.26, giving .1; 0.26 * 2 = 0.52, giving .10; 0.52 * 2 = 1.04, giving .101; and so on until the required number of bits is reached.
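The multiply-by-two method can be sketched as a short loop. The function name `fraction_to_binary` is made up for this example, and the result is truncated (not rounded) after the requested number of bits.

```python
def fraction_to_binary(fraction, bits):
    """Return the first `bits` binary places of a fraction in [0, 1)."""
    digits = ""
    for _ in range(bits):
        fraction *= 2
        whole = int(fraction)        # whole number part is the next bit
        digits += str(whole)
        fraction -= whole
    return digits


print(fraction_to_binary(0.63, 3))   # first three places, as in the text
print(fraction_to_binary(0.75, 3))   # exact: a half plus a quarter
```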

The method for converting a positive value is:

1. Convert the whole number part.

2. Add the sign bit 0.

3. Convert the fractional part and combine it with the whole number part. At this stage the exponent has the value zero. Move the binary point to the beginning, immediately after the sign bit; each place the point moves left increases the exponent by 1. Depending on the number of bits available, add extra 0s at the end of the mantissa and at the beginning of the exponent.

4. Adjust the position of the binary point and change the exponent accordingly to achieve a normalized form.

Therefore: 8.75 -> 1000 -> 01000 -> .11 -> 01000.11 -> 0.100011 (mantissa, with exponent 4) -> 0100011000 0100 (10 bits for M and 4 bits for E).
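The conversion of 8.75 can be checked with a small sketch. It relies on `math.frexp`, which returns a fraction in the range 0.5 to 1 and a power of two, which is the same normalized form with the binary point after the sign bit. The function name, the default bit widths and the choice to truncate extra precision are assumptions for this example.

```python
import math


def encode_positive(value, mantissa_bits=10, exponent_bits=4):
    """Encode a positive real as normalized mantissa and exponent bit strings.

    Assumes a two's complement mantissa with the binary point after the
    sign bit, and a two's complement exponent. Extra precision is truncated.
    """
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    # value = fraction * 2**exponent with 0.5 <= fraction < 1
    fraction, exponent = math.frexp(value)
    limit = 2 ** (exponent_bits - 1)
    if not -limit <= exponent < limit:
        raise OverflowError("exponent does not fit")
    scaled = int(fraction * 2 ** (mantissa_bits - 1))   # bits after the sign bit
    mantissa = format(scaled, f"0{mantissa_bits}b")     # leading 0 is the sign bit
    exp = format(exponent % 2 ** exponent_bits, f"0{exponent_bits}b")
    return mantissa, exp


print(encode_positive(8.75))   # ('0100011000', '0100')
```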

 For negatives, use 2's complement.

When implementing floating-point representation, a decision has to be made regarding the total number of bits to be used and how many of them are for the mantissa and how many for the exponent.

Usually, the choice of the total number of bits is offered as an option when the program is written; however, the split between the two parts will have been determined by the floating-point processor.

If there were a choice, note that increasing the number of bits for the mantissa gives better precision but leaves fewer bits for the exponent, reducing the range of possible values, and vice versa. For maximum precision, it is necessary to normalize a floating-point number.

Optimum precision is achieved only when full use is made of the bits in the mantissa, that is, by using the largest possible magnitude for the value represented by the mantissa.

Also, the two most significant bits must be different: 01 for positives and 10 for negatives.

For positives: both of the following equal 2, but the second is the more precise form, because the significant bits sit higher in the mantissa.

0.125 * 2^4 = 2    0 001 0100

0.5 * 2^2 = 2    0 100 0010

For negatives: both of the following equal -4, and the second is the normalized form.

-0.25 * 2^4 = -4    1 110 0100

-1.0 * 2^2 = -4    1 000 0010

When a number is represented with the highest possible magnitude for the mantissa, the two most significant bits are different, and the number is said to be in normalized form. To normalize a positive number, the bits in the mantissa are shifted left until the most significant bits are 0 followed by 1. For each shift left, the value of the exponent is reduced by 1. The same process of shifting is used for a negative number until the most significant bits are 1 followed by 0. In this case, no attention is paid to the fact that bits are falling off the most significant end of the mantissa. Normalization is therefore the shifting of bits to the left until the two most significant bits are different.
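The shifting procedure can be sketched on bit strings as follows. The function name is made up, the mantissa is assumed to be at least two bits long, and an all-zero mantissa is returned unchanged because it can never be normalized.

```python
def normalize(mantissa, exponent):
    """Shift the mantissa left until its two leading bits differ.

    `mantissa` is a two's complement bit string; `exponent` is an int.
    A zero mantissa cannot be normalized and is returned unchanged.
    """
    if "1" not in mantissa:
        return mantissa, exponent
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"    # shift left, fill with 0 on the right
        exponent -= 1                    # each shift left lowers the exponent by 1
    return mantissa, exponent


print(normalize("0001", 4))   # ('0100', 2)
print(normalize("1110", 4))   # ('1000', 2)
```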

Problems with using floating point numbers:

1. The conversion of a real denary value to binary usually needs a degree of approximation, and the value is then restricted further by the number of bits used to store the mantissa. These rounding errors can become significant after multiple calculations. The only way of preventing a serious problem is to increase the precision by using more bits for the mantissa. Programming languages therefore offer options to work in double or quadruple precision.

2. The range is limited. In the 8-bit example (four bits each for the mantissa and the exponent), the highest value that can be represented is 0.875 * 2^7 = 112; a result larger than the maximum produces an overflow condition. If a result is smaller in magnitude than the smallest value that can be stored, there is an underflow condition. Such a very small number can be converted to zero, but this carries risks, for example if the value is later used in a multiplication or division.
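Both problems can be observed directly in Python, whose `float` type is normally an IEEE 754 double-precision value (this is an assumption about the platform, though it holds on essentially all current systems). The sketch below is only a demonstration, not a remedy.

```python
import sys

# 0.1 has no exact binary representation, so the error accumulates
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)          # False
print(abs(total - 1.0))      # small but non-zero

# Overflow and underflow in double precision
largest = sys.float_info.max
print(largest * 2)                    # inf
print(sys.float_info.min / 2 ** 60)   # 0.0
```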

eg: One use of floating-point numbers is in extended mathematical procedures involving repeated calculations, such as weather forecasting, which uses a mathematical model of the atmosphere.
