Unit-II Part II: Data Types

This document discusses different data types including primitive data types, character string types, user-defined ordinal types, and array types. It covers topics such as enumeration types, subrange types, string descriptors, array indexing syntax, subscript types, range checking, and binding and allocation of static versus dynamic arrays.

Uploaded by

Imran
UNIT-II Part II

Data Types
Unit-2 Topics
Introduction
Primitive data types
Character string types
User-defined ordinal types
Array types
Associative arrays
Record types
Union types
Pointer and reference types
Type Checking
Strong Typing
Type Equivalence
Introduction
A data type defines a collection of data objects and
a set of predefined operations on those objects
Evolution of data types:
FORTRAN I (1957)
▪ Only INTEGER, REAL, and array types
Ada (1983)
▪ The programmer can create a user-defined type for every
category of variables in the problem space and have the
system enforce the types
A descriptor is the collection of the attributes of a
variable

Primitive data types
Primitive types are not defined in terms of other types
Integer
Almost always an exact reflection of the hardware
There may be as many as eight different integer types in a
language
Floating Point
Model real numbers, but only as approximations
Issues are precision and range
Languages for scientific use support at least two floating-
point types, sometimes more
Usually exactly like the hardware, but not always
Primitive data types
Decimal
For business applications (dollars and cents)
Store a fixed number of decimal digits (BCD)
Advantage is accuracy
Disadvantages: limited range and wasted memory
Boolean
Could be implemented as bits, but typically one byte
per Boolean
Advantage is readability
Character
Stored as numeric codings (e.g., ASCII, Unicode)
Character string types
Values are sequences of characters
Design issue
Is the string type primitive or an array of characters?
▪ Making the string type primitive is not costly and is more convenient
▪ Pascal, C, and C++ strings are arrays of characters
▪ Fortran 90, Ada, and Basic are closer to primitive, with intrinsic string
operations
Typical operations
▪ Assignment
▪ Comparison (=, >, etc.)
▪ Catenation
▪ Substring reference
▪ Pattern matching

Character string types
Design issue
Should strings have static or dynamic length?
▪ Ada has the following string types
• String: static length
• Bounded_String: limited dynamic length, up to a declared maximum
• Unbounded_String: unlimited dynamic length
▪ It is common for a language to have the first two
Implementation issues
▪ Static length only requires a compile-time descriptor
▪ Limited dynamic length may need a run-time descriptor for the
current length
• Instead, C and C++ terminate strings with the null character
▪ Dynamic length needs a run-time descriptor
• Allocation / deallocation is the biggest implementation problem
Character string descriptors

[Figures: compile-time descriptor for static strings; run-time descriptor for limited dynamic strings]

Ordinal types
An ordinal type is one in which the range of possible
values can be easily associated with a subset of the
positive integers
Examples of typical predefined ordinal types
Integer
Character
Boolean
We will consider the following user-defined ordinal
types
Enumeration type
Subrange type
Enumeration type
An enumeration type is one in which the user
enumerates all of the possible values
Values are symbolic constants (identifiers)
Example (Ada)

type Days is (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);

for today in Tuesday .. Thursday loop
   null;  -- placeholder: the loop body is not shown on the slide
end loop;

Enumeration type
Design Issues
What operations are allowed for enumeration types?
▪ Ada has attribute operations
• Days'First gives the first day
• Days'Last gives the last day
• Days'Pos( today ) gives the Integer position in the enum list (counting from 0)
• Days'Val( 3 ) gives the enum value associated with position 3
• Days'Pred( today ) gives the predecessor of today
• Days'Succ( today ) gives the successor of today
Should comparison operations =, <, <=, etc. be allowed?
Should a symbolic constant be allowed to be in more than
one type definition (overloading)?
Is coercion performed to or from enumeration values?
Enumeration choices
Pascal
Cannot overload enumeration constants
Enums can be used for array subscripts and case selectors
Enums can be compared
No operations for input or output
C and C++
Can be used like Pascal, but . . .
Enum values are coerced to integers, as in "int n = today"
C also allows "today++"; C++ rejects it because integers are not coerced back to enum values
Operations for input and output as integers
Ada
Can be used as in Pascal, but . . .
Enums may be overloaded
▪ Context must make use clear or use special notation
No coercion and allowed ranges are checked
Operations exist for input and output of enumeration values in text form
C#
No implicit coercion; conversions to and from integers need explicit casts
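The difference between coerced and uncoerced enumerations can be sketched in C++. This is an illustrative example that is not part of the original slides; the names Day, Color, day_position, color_position and successor are hypothetical. It assumes C++11, where an unscoped enum converts to int implicitly and an enum class does not. The range check that Ada performs for Days'Succ has to be written by hand here.

```cpp
#include <cassert>
#include <type_traits>

// Unscoped (C-style) enumeration: values are implicitly coerced to int.
enum Day { Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday };

// Scoped enumeration (C++11): no implicit conversion to or from int.
enum class Color { Red, Green, Blue };

static_assert(std::is_convertible<Day, int>::value, "Day coerces to int");
static_assert(!std::is_convertible<Color, int>::value, "Color does not");

// Relies on the implicit coercion, as in "int n = today".
int day_position(Day d) { return d; }

// A scoped enum needs an explicit cast to expose its integer value.
int color_position(Color c) { return static_cast<int>(c); }

// Hand-written range check: C++ does not verify that an integer cast to an
// enum names a declared value, so the last day is rejected explicitly.
Day successor(Day d) {
    assert(d != Saturday);              // Saturday has no successor
    return static_cast<Day>(d + 1);     // int -> enum requires a cast
}
```

For example, day_position(Tuesday) yields 2 and successor(Friday) yields Saturday, while "int n = Color::Red;" would not compile.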
Enumeration type
Evaluation
Aid to readability
▪ Names are easily recognized whereas coded values are not
▪ E.g. – no need to code a color as a number
Aid to reliability
▪ Compiler can check
• Operations on enums
– E.g. – don't allow colors to be added
• Ranges of allowed values
– E.g. – Ada detects the error in day := Days'Succ( Saturday )

Implementation
Enumeration types are implemented as integers
Subrange type
A subrange type is an ordered contiguous
subsequence of an ordinal type
Examples (Ada)
subtype Positive is Integer range 1 .. Integer'Last;
subtype Natural is Integer range 0 .. Integer'Last;
subtype Index is Integer range -100 .. 100;

for next in Index loop
   null;  -- placeholder: the loop body is not shown on the slide
end loop;

type Days is (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);
subtype Weekdays is Days range Monday .. Friday;

for today in Weekdays loop
   null;  -- placeholder: the loop body is not shown on the slide
end loop;
Subrange type
Evaluation
Aid to readability
▪ E.g. – Can distinguish between a weekday and a day
Aid to reliability
▪ Restricted ranges aid error detection
▪ E.g. – Saturday is not a valid weekday
Implementation
Subrange types are just the parent types with check
code (inserted by the compiler) to restrict assignments
to subrange values
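The idea of "parent type plus inserted check code" can be imitated by hand in C++, which has no built-in subrange types. The sketch below is an assumption-laden illustration, not how an Ada compiler is implemented: Subrange, checked and Index are hypothetical names, and the check that Ada would insert automatically is written explicitly.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical Subrange<Lo, Hi>: an int restricted to Lo..Hi. The checked()
// call stands in for the code a compiler would insert on each assignment.
template <int Lo, int Hi>
class Subrange {
    static_assert(Lo <= Hi, "empty range");
public:
    Subrange(int v) : value_(checked(v)) {}
    Subrange& operator=(int v) { value_ = checked(v); return *this; }

    // Reads need no check: the stored value is always inside the range.
    operator int() const {
        assert(Lo <= value_ && value_ <= Hi);   // class invariant
        return value_;
    }

private:
    static int checked(int v) {
        if (v < Lo || v > Hi) throw std::out_of_range("value outside subrange");
        return v;
    }
    int value_;
};

// Mirrors: subtype Index is Integer range -100 .. 100;
using Index = Subrange<-100, 100>;
```

With this sketch, "Index i = 5; i = 100;" succeeds, while "i = 101;" throws std::out_of_range and leaves i unchanged.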
Arrays
An array is an aggregate of indexed data elements of the
same type
Two types are involved
▪ Element type
▪ Index type
Each individual element is identified by an index to its position in
the aggregate
Design Issues
What types are legal for subscripts?
Are subscripting expressions in element references range
checked?
When does binding occur for subscript ranges?
When does allocation take place?
What is the maximum number of subscripts?
Can array objects be initialized?
Are any kinds of slices allowed?
Arrays
Index syntax
FORTRAN, PL/I, and Ada use parentheses
▪ Ada intentionally uses parentheses to make an array
reference look like a function call
n := a( 23 );
Most other languages use brackets [ ]
Indexing is a storage mapping from the array
indices to elements
This mapping requires a run-time calculation to
reference memory

Array storage mapping example
Storage mapping for 2-dim array b
Row-wise allocation is used
Access code for b[ i, j ] requires
2 adds and 2 multiplies
w is the size of each cell in bytes
loc( b[ i, j ] )
= loc( b )
+ w * ( (# elements in previous rows)
+ (# previous elements in row i) )
= loc( b ) + w * ( i * (# columns) + j )
= loc( b ) + 4 * ( 5 * i + j )     (with w = 4 and 5 columns)
[Figure: array b drawn as a grid with columns j = 0 .. 4 and rows i = 0 .. 6, with one cell marked x]

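The row-wise mapping can be checked with a few lines of C++. This is a minimal sketch, not from the slides: row_major_offset and matches_builtin_layout are hypothetical names, and the 7 x 5 shape is an assumption taken from the figure's row and column labels.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of b[i][j] from loc(b) for a row-major array with `cols`
// columns and cells of `w` bytes: w * (i * cols + j).
std::size_t row_major_offset(std::size_t i, std::size_t j,
                             std::size_t cols, std::size_t w) {
    assert(j < cols);                  // j must be a valid column index
    return w * (i * cols + j);
}

// Cross-check against a built-in C++ array, which is also stored row by row.
// Valid for i in 0..6 and j in 0..4 (the assumed 7 x 5 shape).
bool matches_builtin_layout(std::size_t i, std::size_t j) {
    static int b[7][5];
    const char* base = reinterpret_cast<const char*>(&b[0][0]);
    const char* cell = reinterpret_cast<const char*>(&b[i][j]);
    return static_cast<std::size_t>(cell - base)
           == row_major_offset(i, j, 5, sizeof(int));
}
```

With w = 4 and 5 columns, the element at i = 4, j = 2 sits 4 * (5 * 4 + 2) = 88 bytes past loc( b ).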
Arrays
Subscript types
FORTRAN, C, and Java
▪ Integer only
Pascal and Ada
▪ Any ordinal type
• Integer, Boolean, Character, enum

Range checking
Java, ML, C# check the range of all subscripts
C, C++, Perl, Fortran do not
Ada checks by default, but the check can be disabled with a
compiler pragma
Array binding and allocation
We consider the following categories of arrays
Static array
Fixed stack-dynamic array
Stack-dynamic array
Fixed heap-dynamic array
Heap-dynamic array
These are based on when the subscript ranges are
bound and when storage is allocated

Array binding and allocation
Static arrays
Range of subscripts and storage bindings are static
▪ e.g. FORTRAN 77, some arrays in Ada, C/C++ static arrays
Advantage
▪ Execution efficiency
▪ No run-time overhead for allocation or deallocation
Fixed stack-dynamic arrays
The range of subscripts is statically bound
Storage is bound at elaboration time
▪ e.g. – most local variable arrays
Advantage: space efficiency

Array binding and allocation
Stack-dynamic arrays
The index range and storage allocation are dynamic,
but fixed from then on for the variable's lifetime
Advantage: flexibility
▪ Size need not be known until the array is about to
be used
E.g. – Ada declare blocks
n := <expression>;
declare
   a : array (1..n) of Float;
begin
   null;  -- placeholder: the statements that use a are not shown on the slide
end;

Array binding and allocation
Fixed heap-dynamic arrays
Like stack-dynamic arrays except . . .
▪ Storage is allocated on the heap
▪ Index range binding and storage allocation are initiated by program
request rather than subprogram elaboration
E.g. – all Java arrays
Heap-dynamic arrays
The subscript range and storage bindings are dynamic and may
subsequently be changed
Supported by Smalltalk (e.g. – OrderedCollection), APL,
Perl, JavaScript, FORTRAN 90, and the C# ArrayList class
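Several of these categories can be seen side by side in C++. The following is a rough sketch under the assumption that std::vector is an acceptable stand-in for a heap-dynamic array; static_table and grow_after_allocation are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Static array: subscript range and storage are bound before run time.
static int static_table[100];

// n is only known at run time. Returns the final size of the growable array.
std::size_t grow_after_allocation(std::size_t n) {
    // Fixed stack-dynamic: range is static, storage is bound at elaboration.
    int local[10] = {0};

    // Fixed heap-dynamic: size chosen on program request, then fixed.
    std::unique_ptr<int[]> fixed_heap(new int[n]());
    if (n > 0) fixed_heap[n - 1] = local[0] + static_table[0];

    // Heap-dynamic: the subscript range may change after allocation.
    std::vector<int> growable(n, 0);
    growable.push_back(42);            // range grows from n to n + 1 elements
    assert(growable.size() == n + 1);
    return growable.size();
}
```

Standard C++ has no direct counterpart of the stack-dynamic category (an array sized at elaboration time on the stack), which is why it is missing from the sketch.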
Arrays
Number of subscripts
FORTRAN I allowed up to three
FORTRAN 77 allows up to seven
Other languages - no limit
Array initialization
Some languages permit initialization of arrays
Fortran
Integer List( 3 )
Data List / 21, 67, 9 /
C
int list [ ] = { 21, 67, 9 };

Ada “aggregates”
list : array( 1 .. 3 ) of Integer := ( 21, 67, 9 );
list : array( 1 .. 100 ) of Integer := ( 10 => 21, 20 => 67, 30 => 9, others => 0 );
list : array( 1..10, 1..3 ) of Integer := (1 => (1,2,3), 10 => (4,5,6), others => (0, 0,0));
Array operations
An array operation operates on an array or a part of
an array as a unit
Ada operations
Assignment
Catenation (1-dim only)
Equality (=) and inequality (/=)
APL
Most powerful array-processing language ever devised
Many array operations

Slices
A slice is some substructure of an array
It is nothing more than a referencing mechanism
Slices are only useful in languages that have array
operations
Fortran slices [figure not reproduced]
Ada slices
a : array (1..100) of Float;
a( 1..50 ) := a( 51..100 );

Associative arrays
An associative array is an unordered collection of data
elements that are indexed by an equal number of values
called keys
Also called a . . .
Map
Key-value table
Dictionary
Perl example
An associative array is called a hash in Perl
Names of hash variables begin with %
Aggregate literals are delimited by parentheses
▪ E.g. – %temps = ("Monday" => 77, "Tuesday" => 79, …);
Subscripting is done using braces and keys; a single element is a scalar, so its name begins with $
▪ E.g. – $temps{ "Wednesday" } = 83;
Elements can be removed with delete
▪ E.g. – delete $temps{ "Tuesday" };
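For comparison, the same operations can be sketched in C++ with std::unordered_map, one of several possible container choices. The function name build_temps is hypothetical, and only the two entries actually shown in the Perl literal are used.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Builds the table from the Perl example, then applies the same two updates.
std::unordered_map<std::string, int> build_temps() {
    // Aggregate literal: a list of key/value pairs.
    std::unordered_map<std::string, int> temps = {{"Monday", 77}, {"Tuesday", 79}};

    temps["Wednesday"] = 83;   // subscripting by key inserts or overwrites
    temps.erase("Tuesday");    // counterpart of Perl's delete
    assert(temps.count("Tuesday") == 0);
    return temps;
}
```

As in Perl, the collection is unordered: iterating over it gives no guarantee about the order in which keys appear.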
Records
A record is an aggregate of
named data elements of
possibly diverse types
[Figure: a compile-time descriptor for a record]
The offset is from the record base
address
Design Issues
What is the form of references?
What unit operations are defined?

Records
Called the struct data type in C, C++, and C#
A class defines a record in Java and Smalltalk
Record declarations
COBOL uses level numbers to show nested records
Other languages use a recursive definition
Field references
COBOL
▪ <fieldName> OF <recordName2> OF <recordName1>
Other languages use dot notation
▪ <recordName1>.<recordName2>.<fieldName>

Records
Fully qualified field references must include all
nested record names
Elliptical references allow leaving out record
names as long as the reference is unambiguous
Pascal provides a with clause to abbreviate
references

Record Operations
Assignment
Allowed in Pascal, Ada, and C if the types are identical
In Ada, the RHS can be a record aggregate constant
COBOL uses “MOVE CORRESPONDING”
▪ Moves all fields in the source record to fields with the same
names in the destination record
Initialization
Allowed in Ada, using an aggregate constant
In Java, done by the constructor
Comparison
Ada has tests for equality (=) and inequality (/=)
Arrays vs. records
Access to array elements with dynamic subscripts is slower than
access to record fields
Each record field is accessed with a fixed offset from
the record base address
Array subscripts require run-time calculation

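The contrast between a fixed field offset and a computed element offset can be made concrete in C++. This is only a sketch: the Employee record and the helper names are hypothetical, and actual offsets depend on the platform's alignment rules.

```cpp
#include <cassert>
#include <cstddef>

struct Employee {        // hypothetical record used only for illustration
    int    id;
    double salary;
    char   grade;
};

// Record field: the offset from the record base is a compile-time constant.
constexpr std::size_t salary_offset = offsetof(Employee, salary);

// Array element: the offset depends on a subscript known only at run time.
std::size_t element_offset(std::size_t index, std::size_t element_size) {
    assert(element_size > 0);
    return index * element_size;
}

// Confirms that the constant offset really locates the field in an object.
bool field_offset_matches() {
    Employee e = {1, 2.0, 'A'};
    const char* base  = reinterpret_cast<const char*>(&e);
    const char* field = reinterpret_cast<const char*>(&e.salary);
    return static_cast<std::size_t>(field - base) == salary_offset;
}
```

When an array subscript is itself a compile-time constant, a compiler can fold the multiplication away, so the cost difference applies mainly to dynamic subscripts.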
Union types
A union is a type whose variables are allowed to store
different type values at different times during execution
Design issue for unions
How should type checking be done?
Examples
Fortran has EQUIVALENCE
▪ No type checking
C and C++ have free unions
▪ Not part of structs
▪ Complete freedom from type checking
Pascal embeds unions in records
▪ Design leads to ineffective type checking

Discriminated unions
Algol 68 and Ada use discriminated unions
This provides secure type checking
Ada
Ada embeds discriminated unions in records
One record field is called a discriminant or tag
The discriminant in the example on the following
slide is Form

Ada example
The discriminant field Form may not be changed in isolation
It may only be changed by assigning to the entire record
This prevents the record fields from becoming inconsistent
type Shape is ( Circle, Triangle, Rectangle );
type Colors is ( Red, Green, Blue );
type Figure( Form : Shape ) is record
   Filled : Boolean;
   Color : Colors;
   case Form is
      when Circle =>
         Diameter : Float;
      when Triangle =>
         LeftSide : Integer;
         RightSide : Integer;
         Angle : Float;
      when Rectangle =>
         Height : Integer;
         Width : Integer;
   end case;
end record;
Ada example
Assignment using a record aggregate
Fig : Figure;
Fig := ( Filled => true, Color => Blue, Form => Rectangle, Height => 12, Width => 3 );

Layout of record fields


Fields Diameter, LeftSide, RightSide, Angle, Height
and Width share the same bytes
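
A discriminated union can be approximated in C++ by hiding a free union behind a class that checks the tag on every access. The sketch below is a simplified, hypothetical analogue of the Ada Figure type (only two shapes, no Filled or Color fields), assuming C++11; it is not a translation of the Ada semantics.

```cpp
#include <cassert>
#include <stdexcept>

class Figure {
public:
    enum class Shape { Circle, Rectangle };

    // The tag and the variant fields are always set together.
    static Figure circle(float diameter) {
        Figure f(Shape::Circle);
        f.data_.diameter = diameter;
        return f;
    }
    static Figure rectangle(int height, int width) {
        Figure f(Shape::Rectangle);
        f.data_.rect.height = height;
        f.data_.rect.width = width;
        return f;
    }

    Shape form() const { return form_; }

    // Every field access checks the discriminant first.
    float diameter() const { require(Shape::Circle); return data_.diameter; }
    int height() const { require(Shape::Rectangle); return data_.rect.height; }
    int width() const { require(Shape::Rectangle); return data_.rect.width; }

private:
    explicit Figure(Shape s) : form_(s) {}
    void require(Shape expected) const {
        if (form_ != expected) throw std::logic_error("variant field not active");
    }

    Shape form_;                       // the discriminant (tag)
    union Data {                       // variant fields share the same bytes
        float diameter;
        struct { int height; int width; } rect;
    } data_;
};

// Usage: the only way to change the form is to assign a whole new record.
inline Figure reshape_example() {
    Figure fig = Figure::rectangle(12, 3);
    assert(fig.form() == Figure::Shape::Rectangle);
    fig = Figure::circle(2.5f);        // whole-record assignment
    return fig;
}
```

Because form_ is private and has no setter, the tag cannot be changed in isolation, which is the property the Ada design enforces in the language itself.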

Pointer types
Pointer type values consist of memory addresses
and the special value nil (or null)
Pointers are used for
Indirect addressing
Management of heap-dynamic variables
▪ These are anonymous variables

Pointer operations
Assignment operation
Sets a pointer to a useful address
Dereferencing operation
Interprets the pointer variable as representing the
object at the memory address contained in the pointer
variable
Thus, it applies one level of indirect addressing
Deallocation
Returns the heap-dynamic storage referred to by a
pointer to the system for reallocation
Problems with pointers
Dangling pointers
A dangling pointer refers to a heap-dynamic variable that
has been deallocated
To create a dangling pointer in Pascal with explicit
deallocation . . .
▪ Allocate a heap-dynamic variable pointed to by p
▪ Make an alias for the pointer: q := p
▪ Explicitly deallocate the heap-dynamic variable: dispose( p );
▪ Now q contains a dangling pointer

Problems with pointers
Lost heap-dynamic variables
A lost heap-dynamic variable is no longer referenced
by any program pointer and is inaccessible
To create a lost heap-dynamic variable . . .
▪ Allocate a heap-dynamic variable pointed to by p
▪ Replace the pointer in p by a reference to some other heap-dynamic
variable: p := q
▪ Now the first heap-dynamic variable is inaccessible
The process of losing heap-dynamic variables is called
memory leakage

Pointers in C and C++
Pointers in C and C++ are similar to addresses in
assembly language
Pointers may point virtually anywhere in memory
Pointer arithmetic is possible
Programmer is responsible for avoiding problems
of dangling pointers and lost heap-dynamic
variables

Pointers in C and C++
Dereferencing is explicitly specified with the * operator
In C++, reference variables act as constant pointers and are declared
with &
References are always implicitly dereferenced
Used for parameter passing
▪ pass-by-reference

int count;      /* defines count as an int variable */
int *ptr;       /* defines ptr as a pointer to an int variable */
int sum = 0;    /* initialized so that the read through ptr below is defined */
ptr = &sum;     /* operator & produces the address of sum */
count = *ptr;   /* operator * dereferences ptr and produces the value in sum */
ptr = ptr + 3;  /* advances the address by 3 * sizeof(int) bytes (12 with 4-byte ints); illustration only, the result does not point to a valid int */
int &ref = sum; /* C++ only: ref is a constant pointer that creates an alias for sum */
ref = 23;       /* assigns 23 to sum (implicitly dereferenced) */
Pointers in Ada
Called access types
Used only for heap-dynamic variables
No pointer arithmetic
All access variables are initialized to null
This also provides reliability
Heap-dynamic variables may (implementation option) be
implicitly deallocated at the end of the scope of a pointer type
Partially alleviates the problem of lost heap-dynamic variables
Has an explicit deallocator: Unchecked_Deallocation
Dangling pointer problem is possible

Pointers in Java
These are called reference types
Refer to heap-dynamic objects exclusively
No pointer arithmetic
All reference variables are initialized to null
No explicit deallocation
This prevents the dangling pointer problem
All objects are implicitly deallocated by garbage collection
Garbage collection prevents the lost heap-dynamic variable
problem
Reference variables are implicitly dereferenced whenever
the dot notation is used, as in p.link

Dangling pointer problem
The problem of dangling pointers can be resolved
using . . .
Tombstones
Locks and keys

Tombstones
Tombstone
An extra heap cell that is a pointer to the heap-dynamic variable
The actual pointer variable points only at a tombstone
When a heap-dynamic variable is deallocated, the tombstone remains
but is set to null
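The tombstone scheme can be simulated in a few lines of C++. This is a toy sketch with hypothetical names (Tombstone, TombstonePtr, allocate, deallocate, dereference); it assumes tombstones are never reclaimed, which is one of the known costs of the technique.

```cpp
#include <cassert>
#include <stdexcept>

// The tombstone is an extra heap cell holding the only direct pointer to
// the heap-dynamic variable.
struct Tombstone {
    int* target;
};

// Program pointers refer to the tombstone, never directly to the variable.
struct TombstonePtr {
    Tombstone* stone;
};

TombstonePtr allocate(int value) {
    return TombstonePtr{ new Tombstone{ new int(value) } };
}

// Frees the variable; the tombstone stays behind and is set to null.
void deallocate(TombstonePtr p) {
    assert(p.stone != nullptr);
    delete p.stone->target;
    p.stone->target = nullptr;
}

// Every dereference goes through the tombstone, so a dangling pointer is
// detected instead of silently reading freed storage.
int& dereference(TombstonePtr p) {
    if (p.stone->target == nullptr) throw std::runtime_error("dangling pointer");
    return *p.stone->target;
}
```

If q is a copy of p and deallocate(p) is called, dereference(q) throws, because both pointers share the same tombstone. The extra level of indirection on every access is the time cost of the scheme.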
Locks and keys
The locks-and-keys technique represents pointer values
as a key-address pair
Each heap-dynamic variable is represented as storage for the
data plus a cell for the lock
When a heap-dynamic variable is allocated, a lock value is
created and a copy is placed in both . . .
The lock cell within the heap-dynamic variable
The key cell of the pointer
When a heap-dynamic variable is deallocated, its lock
value is cleared to an illegal value
Every dereference must compare the key value in the
pointer to the lock in the heap-dynamic variable; a mismatch is a run-time error
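A toy C++ model of locks and keys is shown below. All names (Cell, KeyedPtr, allocate, deallocate, dereference) are hypothetical, 0 is assumed to be the illegal lock value, and the sketch deliberately keeps the storage alive after deallocation so that the lock can still be inspected; a real allocator would recycle the cell with a fresh lock.

```cpp
#include <cassert>
#include <stdexcept>

// Heap-dynamic variable: storage for the data plus a cell for the lock.
struct Cell {
    unsigned lock;   // 0 means "cleared" (an illegal lock value)
    int data;
};

// Pointer value: a key-address pair.
struct KeyedPtr {
    unsigned key;
    Cell* address;
};

KeyedPtr allocate(int value) {
    static unsigned next_lock = 1;             // fresh lock per allocation
    Cell* cell = new Cell{ next_lock, value };
    return KeyedPtr{ next_lock++, cell };      // key is a copy of the lock
}

// Clears the lock; copies of the pointer keep their now-stale key.
void deallocate(KeyedPtr p) {
    assert(p.address != nullptr);
    p.address->lock = 0;
}

// Every dereference compares the pointer's key with the variable's lock.
int& dereference(KeyedPtr p) {
    if (p.key != p.address->lock) throw std::runtime_error("dangling pointer");
    return p.address->data;
}
```

After deallocate(p), any alias q of p still carries the old key, so dereference(q) fails the comparison and reports the dangling access.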
Heap management
Takes deallocation of heap-dynamic variables out
of the hands of programmers
Two popular solutions
Reference counters
▪ Eager approach: reclamation is incremental and is done when inaccessible cells are created
Garbage collection
▪ Lazy approach: reclamation occurs when available heap space runs out

Reference counters
The reference counter solution maintains a counter in
every heap cell
The counter stores the number of pointers currently pointing at
the cell
Whenever a pointer is changed . . .
The counter in the old target is decremented
The counter in the new target is incremented
When a counter decrements to zero, the heap-dynamic
variable is returned to the list of available space
Disadvantages
Space required by the reference counters
Time overhead
Complications for cells in circular linked lists
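C++'s std::shared_ptr is one widely available reference-counting implementation, and it shows both the counter updates and the circular-list complication. In this sketch the Node type and both function names are hypothetical; the cycle is broken by hand at the end so that the example itself does not leak.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> next;   // owning link: contributes to the count
};

// Shows the counters changing as a pointer is copied and then retargeted.
long count_after_retarget() {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = a;             // first node now has count 2
    assert(a.use_count() == 2);
    b = std::make_shared<Node>();            // old target decremented to 1
    return a.use_count();
}

// Two nodes that point at each other keep each other's count above zero
// even after every program pointer to them is gone.
bool cycle_keeps_nodes_alive() {
    std::weak_ptr<Node> observer;            // non-owning: does not count
    {
        std::shared_ptr<Node> first = std::make_shared<Node>();
        std::shared_ptr<Node> second = std::make_shared<Node>();
        first->next = second;
        second->next = first;                // circular list
        observer = first;
    }
    bool alive = !observer.expired();        // still allocated: the leak
    if (std::shared_ptr<Node> first = observer.lock()) {
        first->next.reset();                 // break the cycle to clean up
    }
    return alive;
}
```

In practice the usual remedy is to make one link in the cycle a std::weak_ptr, so that it does not contribute to the count.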
Garbage collection
When heap storage is exhausted, perform garbage
collection as follows
Every heap cell has an extra bit used by the garbage
collection algorithm
All bits are initially cleared (assumed to be garbage)
Starting with all program pointers, recursively follow all
pointers and mark any heap-dynamic variable that can
be reached
All variables that remain unmarked are then returned to
the list of available heap cells
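
The marking and reclamation steps can be sketched over a simulated heap. This is a simplified model, not a real collector: cells are entries in a std::vector, pointers are indices, and the names Cell, mark and collect are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Cell {
    bool marked = false;               // the extra bit used by the collector
    bool in_use = false;               // false once returned to the free list
    std::vector<std::size_t> links;    // indices of cells this cell points to
};

// Recursively follows pointers and marks every reachable cell.
void mark(std::vector<Cell>& heap, std::size_t i) {
    assert(i < heap.size());
    if (heap[i].marked) return;        // already visited (also stops cycles)
    heap[i].marked = true;
    for (std::size_t next : heap[i].links) mark(heap, next);
}

// Runs one collection; returns the indices of the reclaimed cells.
std::vector<std::size_t> collect(std::vector<Cell>& heap,
                                 const std::vector<std::size_t>& roots) {
    for (Cell& c : heap) c.marked = false;         // assume all are garbage
    for (std::size_t r : roots) mark(heap, r);     // start from program pointers
    std::vector<std::size_t> freed;
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (heap[i].in_use && !heap[i].marked) {   // unmarked: reclaim
            heap[i].in_use = false;
            heap[i].links.clear();
            freed.push_back(i);
        }
    }
    return freed;
}
```

Unlike reference counting, this approach reclaims unreachable cycles, since two cells that only point at each other are never marked.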

Garbage collection
Disadvantage
When you need it most, it works the worst
▪ You need it most when there is very little actual garbage left
in the heap
▪ The garbage collection algorithm is very time consuming in
this situation

Type checking
Type checking is the activity of ensuring that types are
compatible when considering . . .
the operands of an operator
the parameters and return type of a method
the two sides of an assignment statement
A compatible type is one that is either a legal type or one
that may be coerced to a legal type for the given situation
A coercion is an automatic type conversion that is allowed
under language rules and is implicitly performed by
compiler-generated code
A type error is the use of an incompatible type in a given
situation
Type checking
If all type bindings to variables are static, nearly all
type checking can be static
If type bindings are dynamic, type checking must
be dynamic
A programming language is strongly typed if type
errors are always detected
This definition from the text is not the standard
definition
▪ Under this definition, Smalltalk would be strongly typed
The usual definition requires that the single type of
each variable name is known at compile time
Strong typing
Advantage
Allows the detection of type errors due to misuse of variables
Language examples:
FORTRAN 77 is not (parameters, EQUIVALENCE)
Pascal is not (only because of variant records)
C and C++ are not
▪ Parameter type checking can be avoided
▪ Unions are not type checked
Ada almost is (UNCHECKED_CONVERSION is a loophole)
Java and C# are similar to Ada
▪ They allow explicit casts

Strong typing
Coercion rules strongly (and negatively) affect
strong typing
Fortran, C, and C++ are significantly less reliable than
Ada, in which all type conversion is explicit
Java is between C++ and Ada with about half the
assignment coercions of C++

Type equivalence
When are variables declared using user-defined types
compatible?
Name type equivalence means that two variables have
equivalent types when they are declared in the same
declaration or in declarations that use the same type name
Easy to implement but highly restrictive
Ada example
type IndexType is range 1..100;
count : Integer;
index : IndexType;
Variables count and index are not compatible
They don't use the same type name
Assignments count := index; and index := count; are illegal
Type equivalence
Structure type equivalence means that two
variables have equivalent types if their types have
identical structures
More flexible, but harder to implement
The entire structures of both types must be compared
▪ Are two record (structure) types equivalent if they have the
same structure but different field names?
▪ Are two array types equivalent if the subscript ranges are
different?
It is not possible to distinguish between types with the
same structure which represent different kinds of data
▪ How can you avoid mixing counts of apples and oranges if
they are both integer types?
Ada examples
Ada usually requires name type equivalence but avoids
most restrictions by having derived types and subtypes
Derived types
A different type that has the same structure as a base type
Example of incompatible derived types
type Celsius is new Float;
type Fahrenheit is new Float;
Subtypes
A possibly range-constrained version of a base type
Example
subtype IndexType is Integer range 1..100;
count : Integer;
index : IndexType;
Variables count and index are now compatible
Ada examples
Ada uses structure type equivalence for “unconstrained
array” types
vec1 and vec2 are equivalent
type Vector is array( Integer range <> ) of Float;
vec1 : Vector( 1..10 );
vec2 : Vector( 11..20 );
Care must be taken with “constrained” anonymous types
A and B are incompatible
A : array( 1..10 ) of Integer;
B : array( 1..10 ) of Integer;
A and B are still incompatible
A, B : array( 1..10 ) of Integer;
Here, A and B are equivalent
type Array_Type is array( 1..10 ) of Integer;
A, B : Array_Type;

C and C++
C uses structure type equivalence for all types
except struct, enum, and union, which use name type equivalence
Exception: two structs, enums, or unions defined in
different files
▪ Then structure type equivalence is again used
C++ uses name type equivalence
typedef in C and C++ simply creates an alias for a
type

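Name equivalence for structs and the alias behavior of typedef can both be checked at compile time in C++. The sketch below assumes C++11 and uses hypothetical types (Celsius, Fahrenheit, Count, Index) chosen to echo the Ada derived-type example; it is an illustration, not part of the slides.

```cpp
#include <cassert>
#include <type_traits>

struct Celsius    { double degrees; };
struct Fahrenheit { double degrees; };   // same structure, different name

typedef int Count;   // typedef creates an alias, not a new type
typedef int Index;

// Name equivalence: identical structure does not make the structs compatible.
static_assert(!std::is_same<Celsius, Fahrenheit>::value, "distinct types");
static_assert(!std::is_convertible<Celsius, Fahrenheit>::value, "no coercion");

// Aliases of the same type are the same type.
static_assert(std::is_same<Count, Index>::value, "both are int");

// Moving between the two record types needs an explicit conversion.
Fahrenheit to_fahrenheit(Celsius c) {
    return Fahrenheit{ c.degrees * 9.0 / 5.0 + 32.0 };
}

// Usage: Count and Index mix freely because they name the same type.
inline Index alias_example() {
    Count c = 3;
    Index i = c;
    assert(i == 3);
    return i;
}
```

This also shows why typedef alone cannot keep "apples and oranges" apart: wrapping the value in a distinct struct is one way to get that separation.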
