Principles of Programming Languages
Data Types
Definitions
Data type
a collection of data objects and a set of predefined operations on those objects
Descriptor: the collection of attributes of a variable
Object: an instance of a user-defined (abstract data) type
Primitive Data Types
Primitive
not defined in terms of other data types of the language; often reflect the hardware
Structured
built out of other types
Integer Types
Usually based on the hardware
A language may have several sizes (ranges)
Java's signed integer sizes: byte, short, int, long (8, 16, 32 and 64 bits)
C and C++ also have unsigned versions of their integer types
Scripting languages often have just one integer type
Python 2 had an int type plus a long type that can grow as big as it needs to; Python 3 merges them into a single int type of unlimited size
Representing Integers
Positive integers can be converted directly to base 2
How do you handle negative numbers with only 0s and 1s?
Sign bit
Ones' complement
Two's complement (the representation used by virtually all modern hardware)
Representing negative integers
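Sign bit: the high-order bit is 1 for a negative number, and the remaining bits hold the magnitude
Ones' complement: a negative number is the bitwise complement (every bit flipped) of its positive value
Two's complement
To get the binary representation of a negative number, take the complement of its positive value and add 1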
Floating Point Types
Model real numbers
only an approximation due to round-off error
Languages for scientific use support at least two floating-point types (e.g., float and double), sometimes more
Usually based on the hardware; most follow the IEEE Floating-Point Standard 754
32-bit (single precision) and 64-bit (double precision) formats
Representing Real Numbers
We can convert a decimal fraction to base 2 just as we did for integers, but how do we represent the binary point?
Using a fixed number of bits for the whole and fractional parts severely limits the range of values we can represent
Use a representation similar to scientific notation
IEEE Floating Point Representation
Normalize the number
exactly one nonzero bit before the binary point; since that bit is always 1, it does not need to be stored
Use one bit to represent the sign (1 for negative)
Use a fixed number of bits for the exponent, which is offset (biased) to allow for negative exponents
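Exponent = exponent + offset, where Exponent is the stored field, exponent is the true exponent, and the offset (bias) is 127 for the 32-bit format and 1023 for the 64-bit format
Value = $(-1)^{\text{sign}} \times 1.\text{Fraction} \times 2^{\text{Exponent} - \text{offset}}$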
Floating Point Types
C, C++ and Java have two standard floating-point types (C and C++ add a third, long double)
float, double
Most scripting languages have one floating point type
Python's floating point type is equivalent to a C double
Some scripting languages have only one kind of number, which is a floating-point type
Fixed Point Types (Decimal)
For business applications (money), round-off errors are not acceptable
Essential to COBOL
.NET languages have a decimal data type
Store a fixed number of decimal digits
Operations generally have to be defined in software
Advantage: accuracy
Disadvantages: limited range, wastes memory
C# decimal Type
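128-bit representation
Range: $\pm 1.0 \times 10^{-28}$ to about $\pm 7.9 \times 10^{28}$
Precision: 28 to 29 significant decimal digits (depending on the size of the number)
no round-off error when representing decimal fractions such as 0.1 within that precision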
Other Primitive Data Types:
Boolean
Range of values: two elements, one for true and one for false
Could be implemented as bits, but often as bytes
Character
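Stored as numeric codings
Most commonly used coding: ASCII
An alternative: Unicode, originally a 16-bit coding (Java's char is a 16-bit UTF-16 code unit); Unicode now has more characters than fit in 16 bits and is commonly stored as UTF-8 or UTF-16
Complex (Fortran, Scheme, Python)
Rational (Scheme)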
Character Strings
Values are sequences of characters
Operations:
Assignment and copying
Comparison (=, >, etc.)
Catenation
Substring reference
Pattern matching
Design issues:
Is it a primitive type or just a special kind of array?
Should the length of strings be static or dynamic?
Character String Implementations
C and C++
Not primitive
Use char arrays and a library of functions that provide operations
SNOBOL4 (a string manipulation language)
Primitive
Many operations, including elaborate pattern matching
Java
String class
String Length Options
Static: COBOL, Java's String class
Limited dynamic length: C and C++
a special character (the null character, '\0') is used to indicate the end of a string's characters
Dynamic (no maximum): SNOBOL4, Perl, JavaScript
Ada supports all three string length options
String Implementation
Static length: compile-time descriptor
Limited dynamic length: may need a run-time descriptor for the length (but not in C and C++, where the terminating '\0' marks the end)
Dynamic length: needs a run-time descriptor; allocation/deallocation is the main implementation issue
User-Defined Ordinal Types
Ordinal type: one in which the range of possible values can be easily associated with the set of positive integers
Primitive ordinal types
integer, char, boolean
User-defined ordinal types
enumeration types, subrange types
Enumeration Types
All possible values, which are named constants, are provided in the definition
C example
enum days {mon, tue, wed, thu, fri, sat, sun};
Design issues
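duplication of names: can an enumeration constant appear in more than one type definition?
coercion rules: are enumeration values coerced to integer?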
enums in C (and C++)
To define an enumerated type in C
enum weekday {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday};
enum weekday today = Tuesday;
Use typedef to give the type a name
typedef enum weekday {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday} weekday;
weekday today = Tuesday;
By default, values are consecutive starting from 0.
You can explicitly assign values; later constants continue counting from the assigned value
enum months {January = 1, February, /* ... */};   (February is 2)
Enumerations in Java 1.5
An enum is a class that implicitly extends java.lang.Enum, which implements Comparable
Get type safety and compile-time checking
Enum constants are implicitly public, static and final
Can use either == or equals to compare
toString (which returns the constant's name by default) and the compiler-generated static valueOf(String) method make input and output easier
Java enum Example
Defining an enum type
enum Season {WINTER, SPRING, SUMMER, FALL};
Declaring an enum variable
Season season = Season.WINTER;
toString gives you the string representation of the name
System.out.println( season); // prints WINTER
valueOf lets you convert a String to an enum
season = Season.valueOf("SPRING");
Subrange Types
A contiguous subsequence of an ordinal type
Example: 12..18 is a subrange of integer type
Ada's design
type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
Day1 : Days;
Day2 : Weekdays;
Day2 := Day1;  -- legal, but checked at run time: raises Constraint_Error if Day1 is sat or sun
Implementation of User-Defined Ordinal Types
Enumeration types are implemented as integers
Subrange types are implemented like their parent types, with code inserted (by the compiler) to restrict assignments to subrange variables
Pointer and Reference Types
A pointer is a variable whose value is an address
A pointer type has a range of values that consists of memory addresses plus a special value, nil (NULL in C, nullptr in C++)
Provide the power of indirect addressing
Provide a way to manage dynamic memory
A pointer can be used to access a location in the area where storage is dynamically created (usually called a heap)
Generally represented as a single number
Pointer Operations
Two fundamental operations: assignment and dereferencing
Assignment is used to set a pointer variable's value to some useful address
Dereferencing yields the value stored at the location represented by the pointer's value
Dereferencing can be explicit or implicit
C++ uses an explicit operation via *: j = *ptr sets j to the value located at ptr
Pointer Operations Illustrated
assignment ptr = &j
Dereferencing a pointer j = *ptr
allocation ptr = (int*)malloc( sizeof( int))
Copyright 2007 Addison-Wesley. All rights reserved.
Pointer Problems
Dangling pointers (dangerous)
A pointer that points to a heap-dynamic variable that has been deallocated
Garbage
An allocated heap-dynamic variable that is no longer accessible to the user program
Pointers in C and C++
Extremely flexible but must be used with care
Pointers can point at any variable regardless of when it was allocated
Used for dynamic storage management and addressing
Pointer arithmetic is possible
Explicit dereferencing and address-of operators
Domain type need not be fixed (void *)
void * can point to any type and can be type checked (cannot be dereferenced)
Pointer Arithmetic in C and C++
float stuff[100];
float *p;
p = stuff;
*(p+5) is equivalent to stuff[5] and p[5]
*(p+i) is equivalent to stuff[i] and p[i]
Reference Types
C++ includes a special kind of pointer type called a reference type that is used primarily for formal parameters
Advantages of both pass-by-reference and pass-by-value
Java extends C++'s reference variables and allows them to replace pointers entirely
References refer to class instances (objects)
C# includes both the references of Java and the pointers of C++
Evaluation of Pointers
Dangling pointers and dangling objects are problems, as is heap management
Pointers are like gotos: they widen the range of cells that can be accessed by a variable
Pointers or references are necessary for dynamic data structures, so we can't design a language without them
Structured Data Types
Built out of other types
usually composed of multiple elements
homogeneous: all elements have the same type
heterogeneous: elements have different types
Structured Data Types
Arrays
aggregate of homogeneous data elements in which an element is identified by its position
Associative arrays
unordered collection of key-value pairs
Records
heterogeneous aggregate of data elements indexed by element name
Array Operations
Whole array operations:
assignment, catenation
Elemental operations: same as those of the base type
Indexing: mapping from indexes to elements
array_name(index_value_list) → an element
Array Design Issues
What types are legal for subscripts?
Are subscripting expressions in element references range checked?
When are subscript ranges bound?
When does allocation take place?
What is the maximum number of subscripts?
Can array objects be initialized?
Are any kind of slices allowed?
Binding Time Choices
Static: compile-time binding of subscript range and memory (e.g., C and C++ arrays declared static)
Fixed stack-dynamic: subscript ranges static, allocated at declaration time (C, C++)
Stack-dynamic: run-time binding of subscript range and memory (e.g., Ada; C99 variable-length arrays)
Fixed heap-dynamic: storage binding is dynamic but fixed after allocation (Java, C and C++)
Heap-dynamic: binding of subscript ranges and storage allocation is dynamic (Perl and JavaScript)
Array Initialization
Some languages allow initialization at the time of storage allocation
C, C++ and Java example: int list [] = {4, 5, 7, 83};
(C# puts the brackets on the type: int[] list = {4, 5, 7, 83};)
Character strings in C and C++: char name [] = "freddie";
Arrays of strings in C and C++: char *names [] = {"Bob", "Jake", "Joe"};   (const char * in modern C++)
Memory for arrays
For 1D arrays, a contiguous block of memory with an equal amount of space for each element
Two approaches for multi-dimensional arrays
Single block of contiguous memory for all elements
Arrays must be rectangular
Address of array is starting memory location
Implement as arrays of arrays (Java)
Jagged arrays are possible
Array variable is a pointer (reference)
Implementation of Arrays
Access function maps subscript expressions to an address in the array
Access function for single-dimensioned arrays:
address(list[k]) = address (list[lower_bound]) + ((k-lower_bound) * element_size)
Two common ways to organize 2D arrays
Row major order (by rows): used in most languages
Column major order (by columns): used in Fortran
Contiguous Array Memory
Row major (by rows) or column major order (by columns) for 2D arrays
Access function maps subscript expressions to an address in the array
Row-major access formula
Location(a[i,j]) = address of a[row_lb, col_lb] + (((i - row_lb) * n) + (j - col_lb)) * element_size
where n is the number of elements in each row