Chapter 4
Type Systems
Introduction
• A type system is a logical system comprising a set of rules that assign a property
called a type to the various constructs of a computer program.
• A type tells the compiler or interpreter how the programmer intends to use the data.
• Computer hardware can interpret bits in memory in several different ways: as
o instructions,
o addresses,
o characters,
o integers, and
o floating-point numbers of various lengths.
• The bits themselves are untyped: the hardware attaches no type to a memory location.
• An untyped language does not make you define the type of a variable, and leaves the
interpretation of the bits to the operations applied to them. E.g., assembly language.
• In a dynamically typed language, a type is associated with run-time values rather than
with variables, so a variable can hold a value of any data type.
E.g., JavaScript (var point = 100).
Cont’d
A type system consists of:
1. A mechanism to define types and associate them with certain language
constructs:
• What has a type? Things that have values.
• The constructs that must have types are precisely those that have values, or that can
refer to objects that have values. These include
o named constants (e.g., const int x = 5),
o variables (e.g., double a = 1.0; Person p = new Student("John");),
o record fields,
o parameters, and sometimes subroutines;
o literal constants (e.g., 17, 3.14, "foo"); and
o more complicated expressions containing these.
2. A set of rules for type equivalence, type compatibility, and type inference.
• Type equivalence rules determine when the types of two values are the
same.
Structural equivalence: two types are the same if they consist of the same components.
Classification of Types
Characters :
• Traditional ASCII encoding: 1 byte.
• More recent languages (e.g., Java and C#) use a two-byte representation designed to
accommodate (the commonly used portion of) the Unicode character set.
• Fortran 2003 supports four-byte Unicode characters.
Classification of Types Cont’d
Numeric Types :
• A few languages (e.g., C and Fortran) distinguish between different lengths of
integers (e.g., -2, -1, 0, 1, 2) and real numbers (which can include a fractional component).
• Most do not, and leave the choice of precision to the implementation.
• Differences in precision across language implementations lead to a lack of
portability.
• Java and C# provide several lengths of numeric types, with a specified
precision for each.
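The portability point can be illustrated with a minimal C sketch. It contrasts plain int, whose width the implementation chooses, with the fixed-width int32_t from <stdint.h> (available only where the implementation provides it). The function names are hypothetical and exist only for this illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width of plain int is chosen by the implementation; the C standard
   only guarantees that it is at least 16 bits. */
size_t int_width_bits(void) {
    return sizeof(int) * CHAR_BIT;
}

/* int32_t, where the implementation provides it, is exactly 32 bits wide. */
size_t int32_width_bits(void) {
    return sizeof(int32_t) * CHAR_BIT;
}

/* Hypothetical helper: does the value fit in this implementation's int? */
int fits_in_int(long long v) {
    return v >= INT_MIN && v <= INT_MAX;
}
```

A program that relies on int_width_bits() being 32 is not portable; one that uses int32_t states the precision it needs.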
Classification of Types Cont’d
Enumeration Types :
• Enumerations were introduced by Wirth in the design of Pascal.
• An enumeration type consists of a set of named elements.
• The values of an enumeration type are ordered, so comparisons are generally valid
( mon < tue )
• There is usually a mechanism to determine the predecessor or successor of an
enumeration value. (in Pascal, tomorrow := succ(today)).
• In C, the declaration
enum weekday {sun, mon, tue, wed, thu, fri, sat};
is essentially equivalent to
typedef int weekday;
const weekday sun = 0, mon = 1, tue = 2, wed = 3, thu = 4, fri = 5, sat = 6;
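As a rough sketch, ordering and successor on an enumeration can be written in C as follows. The function names weekday_succ and is_before are hypothetical, and wrapping from sat back to sun is an assumption of this sketch, not the behaviour of Pascal's succ.

```c
enum weekday { sun, mon, tue, wed, thu, fri, sat };

/* Successor of a weekday. Wrapping from sat back to sun is an assumption
   made for this sketch; Pascal's succ is not defined to wrap. */
enum weekday weekday_succ(enum weekday today) {
    return (enum weekday)((today + 1) % 7);
}

/* Enumeration values are ordered, so they can be compared directly. */
int is_before(enum weekday a, enum weekday b) {
    return a < b;
}
```

For example, is_before(mon, tue) is true and weekday_succ(mon) yields tue.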
Classification of Types Cont’d
Subrange Types :
• First introduced in Pascal, and are found in many subsequent languages.
• A subrange is a type whose values compose a contiguous subset of the values of
some discrete base type (also called the parent type).
• Example: 12..14 is a subrange of the integer type.
[Examples: subrange declarations in Pascal and Ada]
Records (structs) :
• Introduced by Cobol.
• A record consists of a collection of fields, each of which belongs to a (potentially
different) simpler type.
• A record type corresponds to the Cartesian product of the types of the fields.
Each field has its own type:
struct MyStruct {
    bool ok;
    int bar;
};
MyStruct foo;
There is a way to access the field:
foo.bar      <- C, C++, Java style, F-logic path expressions
bar of foo   <- Cobol/Algol style
Classification of Types Cont’d
• A language in which aliased types are considered distinct is said to have strict
name equivalence.
• A language in which aliased types are considered equivalent is said to have loose
name equivalence.
• Most Pascal-family languages use loose name equivalence
• Under strict name equivalence, a declaration type A=B is considered a definition.
• Under loose name equivalence it is merely a declaration; A shares the definition
of B.
Cont’d
Variants of Name Equivalence
Under strict name equivalence:
• Line 3 is both a declaration and a definition, and blink is a
new type, distinct from alink.
• p and q have the same type, because they both use the type
definition on the right-hand side of line 4.
Under structural equivalence, all six of the variables shown have the same type,
namely pointer to whatever cell is.
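C itself can serve as a small illustration, under the assumption of a C11 compiler. A typedef only introduces an alias (comparable to loose name equivalence), while two struct types with different tags are distinct even when their fields match (name equivalence, not structural equivalence). The types celsius and fahrenheit and the function names are hypothetical.

```c
struct cell { int value; struct cell *next; };

typedef struct cell *alink;   /* pointer to cell */
typedef alink blink;          /* alias: in C, the same type as alink */

/* Two record types with identical structure but different names. */
struct celsius    { double degrees; };
struct fahrenheit { double degrees; };

/* _Generic (C11) selects a branch by the static type of its operand. */
#define IS_ALINK(x)   _Generic((x), alink: 1, default: 0)
#define IS_CELSIUS(x) _Generic((x), struct celsius: 1, default: 0)

int alias_is_same_type(void) {
    blink s = 0;
    return IS_ALINK(s);        /* 1: typedef aliases are equivalent */
}

int same_structure_is_same_type(void) {
    struct fahrenheit f = { 212.0 };
    return IS_CELSIUS(f);      /* 0: struct types are compared by name */
}
```

Under structural equivalence the second function would return 1; under strict name equivalence the first would return 0.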
Type Conversion and Casts
Converting one type to another (casting) is required when:
• Types are structurally equivalent, but the language uses name equivalence.
o The conversion is only conceptual, not physical: no code needs to be executed at
run time.
• The types have different but intersecting sets of values (e.g., one is a subrange of the
other).
o A run-time check tests the validity of the conversion.
• Types are physically different, but values of one type correspond to values of the other,
e.g., all integers can be represented as reals.
o Code is executed at run time to change the representation.
Non-converting cast
• Treat a variable of one type as another type, without changing the physical
representation.
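The cases above can be sketched in C. This is only an illustration: the 0..100 score range and all function names are hypothetical, and float_bits assumes that float is a 32-bit IEEE 754 value.

```c
#include <stdint.h>
#include <string.h>

/* Converting cast: int -> double changes the bit-level representation. */
double int_to_real(int i) {
    return (double)i;
}

/* Conversion into a subrange 0..100 (hypothetical test-score type) needs a
   run-time check. Returns 1 and stores the value on success, 0 otherwise. */
int to_score(int value, unsigned char *out) {
    if (value < 0 || value > 100) {
        return 0;
    }
    *out = (unsigned char)value;
    return 1;
}

/* Non-converting cast: reinterpret the bits of a float as a 32-bit integer
   without changing them. Assumes float is a 32-bit IEEE 754 value. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

memcpy is used for the non-converting cast because it copies the bytes unchanged and avoids the aliasing problems of casting a float pointer to an integer pointer.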
Type Compatibility
• Most languages do not require equivalence of types in every context.
• A value’s type must be compatible with that of the context in which it appears.
• In an assignment statement, the type of the right-hand side must be compatible
with that of the left-hand side.
• The types of the operands of + must both be compatible with some common type
that supports addition (integers, real numbers, or perhaps strings or sets).
• In a subroutine call, the types of any arguments passed into the subroutine must
be compatible with the types of the corresponding formal parameters
Type Compatibility Cont’d
• The definition of type compatibility varies greatly from language to language.
• Ada takes a relatively restrictive approach: a type S is compatible with a type T if
and only if
o S and T are equivalent,
o one is a subtype of the other (or both are subtypes of the same base type), or
o both are arrays, with the same numbers and types of elements in each
dimension.
• Pascal was only slightly more lenient:
➢ In addition to allowing the intermixing of base and subrange types, it
allowed an integer to be used in a context where a real was expected.
Coercion
• Coercion is automatic, implicit type conversion.
• When an expression of one type is used in a context where a different type is
expected, one normally gets a type error.
• Many languages nevertheless allow such uses, and coerce the expression to the
expected type.
• Fortran has lots of coercion, all based on operand type.
• C has lots of coercion, too, but with simpler rules:
o in K&R C, all floats in expressions become doubles (ISO C keeps this promotion only
for arguments not covered by a prototype, such as variadic arguments)
o short int and char become int in expressions
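A minimal sketch of coercion in C, assuming a C11 compiler for _Generic. The function names are hypothetical; the sketch only observes the types that the language's coercion rules produce.

```c
#include <stddef.h>

/* char operands are promoted to int before the addition, so the sum
   has the size of an int, not of a char. */
size_t size_of_char_sum(void) {
    char a = 'x', b = 'y';
    return sizeof(a + b);
}

/* _Generic (C11) reports whether an expression has type double. */
#define IS_DOUBLE(x) _Generic((x), double: 1, default: 0)

/* In int + double, the int operand is coerced to double. */
int int_plus_double_is_double(void) {
    int i = 1;
    double d = 2.5;
    return IS_DOUBLE(i + d);
}

/* Assigning a double to an int coerces it, discarding the fraction. */
int truncated(double d) {
    int i = d;
    return i;
}
```

The last function shows why coercion is a relaxation of type checking: the assignment compiles without a cast even though information is lost.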
Coercion Cont’d
• In effect, coercion rules are a relaxation of type checking.
• Languages such as Modula-2 and Ada do not permit coercions.
• C++, however, provides programmer-extensible coercion rules.
o They are one of the hardest parts of the language to understand.
Type Inference
• Given the types of the operands, what determines the type of the overall expression?
• Answer : The result of an arithmetic operator usually has the same type as
the operands (possibly after coercing one of them, if their types were not
the same).
• The result of a comparison is usually Boolean.
• The result of a function call has the type declared in the function’s
header.
• The result of an assignment has the same type as the left-hand side.
Arrays
• An array is an area of memory holding elements of the same type.
• Arrays are the most common and important composite data types.
• Unlike records, which group related fields of different types, arrays are usually
homogeneous.
• Semantically, arrays can be thought of as a mapping from an index type to a
component or element type.
• Usually the only operations permitted are selection of an element and
assignment; however,
o Fortran 90 offers many array operations supporting matrix algebra
o Ada and Fortran 90 allow arrays to be compared for equality
Arrays Cont’d
Dimensions, Bounds, and Allocation
• global lifetime, static shape: if the shape of an array is known at compile
time, and if the array can exist throughout the execution of the program, then
the compiler can allocate space for the array in static global memory.
• local lifetime, static shape: if the shape of the array is known at compile
time, but the array should not exist throughout the execution of the program,
then space can be allocated in the subroutine’s stack frame at run time.
• arbitrary lifetime, shape bound at elaboration time: in Java and C#, an
array variable is a reference to an object whose space is allocated on the heap.
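The three allocation strategies can be sketched in C, where the heap case is written with malloc rather than with Java or C# array objects. All names are hypothetical, and the stack placement of the local array is what implementations typically do rather than something the C standard requires.

```c
#include <stdlib.h>

/* Global lifetime, static shape: space is reserved in static memory. */
static int table[4] = { 1, 2, 3, 4 };

/* Sum the first n elements of an array. */
int sum(const int *a, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

int sum_static(void) {
    return sum(table, 4);
}

/* Local lifetime, static shape: typically placed in the stack frame. */
int sum_local(void) {
    int local[3] = { 10, 20, 30 };
    return sum(local, 3);
}

/* Arbitrary lifetime, shape fixed at run time: allocated on the heap.
   Fills the array with 1..n; returns -1 if the allocation fails. */
int sum_heap(size_t n) {
    int *a = malloc(n * sizeof *a);
    if (a == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)i + 1;
    }
    int total = sum(a, n);
    free(a);
    return total;
}
```

Only sum_heap can take its size from a run-time value; the other two arrays have shapes fixed in the source text.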
Arrays Cont’d
Possible layouts of memory for contiguous elements:
Row-major and column-major: two ways of storing multidimensional arrays in linear memory
Example: int A[2][3] = { {1, 2, 3}, {4, 5, 6} };
o Column-major: used in Fortran, MATLAB, GNU Octave, R, Rasdaman, X10 and Scilab.
A is laid out contiguously in linear memory as: 1 4 2 5 3 6
offset = row + column * NUMROWS
Example: A[1][1] (5)
offset = 1 + 1 * 2 = 3
Arrays Cont’d
o Row-major: A is laid out contiguously in linear memory as: 1 2 3 4 5 6
offset = row * NUMCOLS + column
Example: A[1][1] (5)
offset = 1 * 3 + 1 = 4
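The two offset formulas can be checked with a short C sketch that flattens the 2x3 example array both ways. The function and array names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* offset = row * NUMCOLS + column */
size_t row_major_offset(size_t row, size_t col, size_t num_cols) {
    return row * num_cols + col;
}

/* offset = row + column * NUMROWS */
size_t col_major_offset(size_t row, size_t col, size_t num_rows) {
    return row + col * num_rows;
}

/* The example array { {1, 2, 3}, {4, 5, 6} } flattened both ways. */
static const int row_major[6] = { 1, 2, 3, 4, 5, 6 };
static const int col_major[6] = { 1, 4, 2, 5, 3, 6 };

int element_row_major(size_t row, size_t col) {
    return row_major[row_major_offset(row, col, 3)];
}

int element_col_major(size_t row, size_t col) {
    return col_major[col_major_offset(row, col, 2)];
}
```

Both element_row_major(1, 1) and element_col_major(1, 1) yield 5: the logical element is the same, only its position in linear memory differs (offset 4 versus offset 3).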