CHAPTER 6
DATA TYPES
Spring 2022
Introduction
A data type = a collection of data objects + a set of predefined operations
The design issues
How to represent data objects?
What operations are defined and how are they
specified?
A descriptor is the collection of the
attributes of an object
IUBAT CSC 461 May 1, 2025
Descriptor
A descriptor is the collection of the attributes of a variable. In an
implementation, a descriptor is an area of memory that stores the
attributes of a variable.
If the attributes are all static, descriptors are required only at
compile time. These descriptors are built by the compiler, usually as a
part of the symbol table, and are used during compilation.
For dynamic attributes, however, part or all of the descriptor must be
maintained during execution. In this case, the descriptor is used by the
run-time system.
Primitive Data Types
Primitive data types
Those not defined in terms of other data
types
Almost all programming languages provide
a set of primitive data types
Some primitive data types are merely
reflections of the hardware;
others require only a little non-hardware support
Primitive Data Types: Integer
Almost always an exact reflection of the hardware so the mapping is trivial
Example: Java's signed integer sizes
byte: 8-bit signed two's complement integer, from -128 to 127 (inclusive)
short: 16-bit signed two's complement integer, from -32,768 to 32,767 (inclusive)
int: 32-bit signed two's complement integer, from -2^31 to 2^31-1
long: 64-bit signed two's complement integer, from -2^63 to 2^63-1
A signed integer value is represented in a computer by a string of bits, with
one of the bits (typically the leftmost) representing the sign. Most integer
types are supported directly by the hardware.
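The ranges above can be checked from Java itself. The following is a minimal sketch, not part of the original slides; the class name IntegerRanges and the method wrapAround are made up for illustration. It prints the limits of the four signed types and shows that int arithmetic wraps around silently on overflow.

```java
// Sketch: limits of Java's signed integer types and two's complement wraparound.
// IntegerRanges and wrapAround are hypothetical names used only for this example.
class IntegerRanges {
    // Adding 1 to the largest int wraps around to the smallest int.
    static int wrapAround() {
        int max = Integer.MAX_VALUE;   // 2^31 - 1
        return max + 1;                // overflows silently to -2^31
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " .. " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " .. " + Long.MAX_VALUE);
        System.out.println("Integer.MAX_VALUE + 1 = " + wrapAround());
    }
}
```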
Primitive Data Types: Floating Point
Model real numbers, but only as
approximations
Languages for scientific use support at
least two floating-point types
float and double
IEEE Floating-Point Standard
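A small Java sketch of what "only as approximations" means in practice. The class FloatApprox and its methods are illustrative names, not from the slides. Because 0.1, 0.2 and 0.3 have no exact binary representation, an equality test on the sum fails, and comparing within a tolerance is the usual workaround.

```java
// Sketch: double values approximate real numbers, so exact equality can fail.
// FloatApprox, sumIsExact and closeEnough are hypothetical names for illustration.
class FloatApprox {
    // Compares the double sum 0.1 + 0.2 against the double literal 0.3.
    static boolean sumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // Tolerance-based comparison: true when a and b differ by less than eps.
    static boolean closeEnough(double a, double b, double eps) {
        return Math.abs(a - b) < eps;
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);                           // prints 0.30000000000000004
        System.out.println(sumIsExact());                        // prints false
        System.out.println(closeEnough(0.1 + 0.2, 0.3, 1e-9));   // prints true
    }
}
```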
Primitive Data Types: Decimal
For business applications (money)
Essential to COBOL
C# offers a decimal data type
The decimal keyword indicates a 128-bit
data type. Compared to floating-point types,
the decimal type is appropriate for financial
and monetary calculations.
Evaluation (compared with floating-point numbers):
Advantage: accurate computation within the type's range
Disadvantages: limited range, wastes memory
Decimal Example
C#
using System;

public class TestDecimal
{
    static void Main()
    {
        decimal d = 9.1m;   // the m suffix marks a decimal literal
        int y = 3;
        // y is implicitly converted to decimal before the addition
        Console.WriteLine(d + y);   // Output: 12.1
    }
}
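For comparison, a rough Java counterpart. Java has no primitive decimal type, so this sketch assumes the library class java.math.BigDecimal as a stand-in; the class name DecimalSketch and its methods are made up for illustration. It repeats the 9.1 + 3 sum and shows that 0.1 + 0.2 is exact in decimal arithmetic.

```java
import java.math.BigDecimal;

// Sketch: decimal arithmetic in Java via BigDecimal (a class, not a primitive type).
// DecimalSketch, decimalSum and exactSum are hypothetical names for illustration.
class DecimalSketch {
    // Same computation as the C# example: 9.1 + 3.
    static BigDecimal decimalSum() {
        BigDecimal d = new BigDecimal("9.1");   // built from a string to avoid binary rounding
        BigDecimal y = BigDecimal.valueOf(3);
        return d.add(y);
    }

    // 0.1 + 0.2 computed in decimal arithmetic.
    static BigDecimal exactSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    public static void main(String[] args) {
        System.out.println(decimalSum());   // prints 12.1
        System.out.println(exactSum());     // prints 0.3
    }
}
```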
Primitive Data Types: Boolean
Simplest of all
Range of values: two elements
one for “true” and one for “false”
Could be implemented as bits, but often
as bytes
Advantage: readability
Primitive Data Types: Character
Stored as numeric codings
Most commonly used coding: ASCII
American Standard Code for Information Interchange
An alternative, 16-bit coding: Unicode
Includes characters from most natural
languages
Java was the first widely used language to use it
C# and JavaScript also support Unicode
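A brief Java sketch of "stored as numeric codings". The class CharCodes and its methods are illustrative names, not from the slides. A char converts to its numeric code without a cast, and arithmetic on the code yields the next character.

```java
// Sketch: Java chars are 16-bit numeric codes.
// CharCodes, codeOf and next are hypothetical names for illustration.
class CharCodes {
    // A char widens to int implicitly, giving its numeric code.
    static int codeOf(char c) {
        return c;
    }

    // The character whose code is one greater than c's code.
    static char next(char c) {
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));        // prints 65, the same value as in ASCII
        System.out.println(next('A'));          // prints B
        System.out.println(codeOf('\u00e9'));   // prints 233, a code outside 7-bit ASCII
        System.out.println(Character.SIZE);     // prints 16 (bits per char)
    }
}
```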
Character String Types
A character string type is one in which the
values consist of sequences of characters.
Character string constants are used to label
output, and the input and output of all kinds of
data are often done in terms of strings. Of
course, character strings also are an essential
type for all programs that do character
manipulation.
Design issues:
Is it a primitive type or just a special kind of
array?
Should the length of strings be static or dynamic?
Character String Type Operations
Typical operations:
Assignment and copying
Comparison (=, >, etc.)
Catenation
Substring reference (a reference to a substring of a given string)
Pattern matching (another fundamental character string operation; some
languages support it directly in the language, others provide it
through a function or class library)
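The typical operations listed above might look as follows in Java, where pattern matching comes from the class library rather than the language itself. This is a sketch only; the class StringOps and its method names are made up for illustration.

```java
// Sketch: typical string operations in Java.
// StringOps and its methods are hypothetical names for illustration.
class StringOps {
    // Catenation with the + operator.
    static String catenate(String a, String b) {
        return a + b;
    }

    // Substring reference: characters from index begin up to, but not including, end.
    static String sub(String s, int begin, int end) {
        return s.substring(begin, end);
    }

    // Pattern matching through the library: true if s consists only of digits.
    static boolean isAllDigits(String s) {
        return s.matches("[0-9]+");
    }

    public static void main(String[] args) {
        String copy = "data";                                   // assignment
        System.out.println(catenate(copy, "type"));             // prints datatype
        System.out.println(sub("datatype", 4, 8));              // prints type
        System.out.println("apple".compareTo("banana") < 0);    // comparison: prints true
        System.out.println(isAllDigits("461"));                 // prints true
    }
}
```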
Character String Type in Certain Languages
C and C++
Not primitive
Use char arrays and a library of functions
strcpy(dest, src) may cause a buffer overflow if dest is too small for src
Java
String class: built into the language but not a primitive type; String objects are immutable
StringBuffer class: changeable/mutable
Which one is more efficient? A popular interview question
String str1 = new String("hello,"); str1 += "the world";
StringBuffer str2 = new StringBuffer("hello,");
str2.append("the world");
Using a StringBuffer for concatenation can in fact produce code
that is significantly faster than using a String.
What is the difference from StringBuilder? This class is designed for
use as a drop-in replacement for StringBuffer in places where the
string buffer was being used by a single thread (as is generally the
case). Where possible, it is recommended that this class be used in
preference to StringBuffer as it will be faster under most
implementations.
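A minimal sketch of the two concatenation styles discussed above, using StringBuilder as the single-threaded replacement for StringBuffer. The class ConcatSketch and its methods are illustrative names, not from the slides. Both methods build the same text; the String version creates a new String object on every +=, while the builder appends into one mutable buffer. No timings are claimed here.

```java
// Sketch: repeated concatenation with String versus StringBuilder.
// ConcatSketch, withString and withBuilder are hypothetical names for illustration.
class ConcatSketch {
    // Each += creates a new String, because String objects are immutable.
    static String withString(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // A single mutable buffer is extended in place.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withString(5));    // prints 01234
        System.out.println(withBuilder(5));   // prints 01234
    }
}
```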
Character String Length Options
Three options:
Static: the length is set when the string is created (a static
length string). Examples: COBOL, Java's String class
Limited dynamic length: strings may vary in length up to a fixed
maximum declared in the variable's definition. Examples: the
strings in C and the C-style strings of C++
In C-based languages, a special character marks the end of a
string's characters, rather than the length being maintained
Dynamic (no maximum): SNOBOL4, Perl, JavaScript
Ada supports all three string length options
Character String Implementation
Static length: compile-time descriptor
Limited dynamic length: may need a run-time
descriptor for length (but not in C and C++)
Dynamic length: need run-time descriptor;
allocation/de-allocation is the biggest
implementation problem
[Figures: compile-time descriptor for static strings; run-time descriptor for limited dynamic strings]
User-Defined Ordinal Types
An ordinal type is one in which the range of possible values can be
easily associated with the set of positive integers.
That is, its values can be counted: they can be put in a one-to-one
correspondence with the positive integers.
For example, characters are ordinal because we can call 'A' the first
character, 'B' the second, etc.
Examples: primitive ordinal types in Java
integer
char
boolean
Two user-defined ordinal types have been supported by programming
languages: enumeration and subrange.
Enumeration Types
All possible values, which are named constants,
are provided in the definition.
Enumeration types provide a way of defining
and grouping collections of named constants,
which are called enumeration constants.
C# example
enum days {mon, tue, wed, thu, fri, sat, sun};
No arithmetic operations are legal on enumeration types
days d1, d2; d1 + d2 is not allowed
Enumeration type variables are not coerced into integer types
days d1; d1 = 4; is not allowed
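The same restrictions can be seen in Java's enum types. The sketch below is not from the slides; the class EnumSketch and its methods are made up for illustration. Converting between a constant and its position has to be requested explicitly, and the two commented-out lines would be rejected by the compiler.

```java
// Sketch: a Java enumeration type with explicit conversion to and from positions.
// EnumSketch, position and fromPosition are hypothetical names for illustration.
class EnumSketch {
    enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }

    // Zero-based position of a constant in the declaration.
    static int position(Day d) {
        return d.ordinal();
    }

    // The constant at a given zero-based position.
    static Day fromPosition(int i) {
        return Day.values()[i];
    }

    public static void main(String[] args) {
        // Day d1 = 4;                 // does not compile: int is not coerced to Day
        // Day d2 = Day.MON + Day.TUE; // does not compile: no arithmetic on enum values
        System.out.println(position(Day.FRI));   // prints 4
        System.out.println(fromPosition(0));     // prints MON
    }
}
```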
Subrange Types
An ordered contiguous subsequence of an
ordinal type
Example: 12..18 is a subrange of integer type
Ada’s design
type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
Day1: Days;
Day2: Weekdays;
Day2 := Day1;  -- legal unless Day1 holds sat or sun (checked at run time)
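Java has no subrange types, so the range check that Ada performs on Day2 := Day1 has to be written by hand. The following is only a rough emulation under that assumption; the class SubrangeSketch, the set WEEKDAYS and the methods toWeekday and accepts are hypothetical names, not from the slides.

```java
import java.util.EnumSet;

// Sketch: emulating the Ada subtype Weekdays with an explicit run-time check.
// SubrangeSketch, WEEKDAYS, toWeekday and accepts are hypothetical names.
class SubrangeSketch {
    enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }

    // The contiguous subsequence MON..FRI of Day.
    static final EnumSet<Day> WEEKDAYS = EnumSet.range(Day.MON, Day.FRI);

    // Plays the role of the check on Day2 := Day1: rejects values outside the subrange.
    static Day toWeekday(Day d) {
        if (!WEEKDAYS.contains(d)) {
            throw new IllegalArgumentException(d + " is outside MON..FRI");
        }
        return d;
    }

    // True when d passes the subrange check.
    static boolean accepts(Day d) {
        try {
            toWeekday(d);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(Day.WED));   // prints true
        System.out.println(accepts(Day.SAT));   // prints false
    }
}
```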
Array Types
An array is an aggregate of homogeneous
data elements in which an individual
element is identified by its position in the
aggregate, relative to the first element.
Example: int a[100];
Design issues
What types are legal for subscripts?
When are subscript ranges bound?
When does allocation take place?
What is the maximum number of subscripts?
Array Indexing
Indexing or subscripting
Mapping from indices to elements
a(index_value) → an element
Syntax:
FORTRAN, PL/I, Ada use parentheses, others use brackets
Index Types
Integer type only: Fortran, C, Java
Any ordinal type: Pascal
Integer or enum: Ada
Index range check
No: C, C++, Perl, Fortran
Yes: Java, ML, C#
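A short Java sketch of index range checking. The class RangeCheck and the method inRange are illustrative names, not from the slides. Every subscript is checked at run time, and an out-of-range subscript raises ArrayIndexOutOfBoundsException instead of touching neighbouring memory.

```java
// Sketch: Java checks every array subscript at run time.
// RangeCheck and inRange are hypothetical names for illustration.
class RangeCheck {
    // Returns true if a[i] can be read, false if the run-time check rejects i.
    static boolean inRange(int[] a, int i) {
        try {
            int unused = a[i];   // triggers the range check
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[100];                // valid subscripts are 0..99
        System.out.println(inRange(a, 99));    // prints true
        System.out.println(inRange(a, 100));   // prints false
        System.out.println(inRange(a, -1));    // prints false
    }
}
```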
Array Categories
Determined by when the subscript ranges are bound, when storage
is bound, and where the storage is allocated
int a[?];
Static: subscript ranges are statically
bound and storage allocation is static
(before run-time)
Advantage: efficiency (no dynamic allocation)
void func1()
{
    static int a[100];  /* range and storage are fixed before run time */
    …
}
Array Categories (continued)
Fixed stack-dynamic:
Subscript ranges are statically bound, but
The allocation is done at declaration elaboration time during execution
Advantage: space efficiency
Stack-dynamic:
Subscript ranges are dynamically bound, and
The storage allocation is dynamic (done at run-time)
Advantage: flexibility (the size of an array need not be known until the array
is to be used)
Fixed heap-dynamic:
storage binding is dynamic but fixed after allocation (i.e., binding is done
when requested and storage is allocated from heap, not stack)
Heap-dynamic:
binding of subscript ranges and storage allocation is dynamic and can
change any number of times
Advantage: flexibility (arrays can grow or shrink during program execution)
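Two of these categories can be sketched in Java. A Java array is fixed heap-dynamic: its size is chosen at run time and storage comes from the heap, but the length cannot change afterwards. The library class ArrayList behaves like a heap-dynamic array. The class ArrayCategories and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: fixed heap-dynamic arrays versus a heap-dynamic list in Java.
// ArrayCategories, fixedHeapDynamic and growThenShrink are hypothetical names.
class ArrayCategories {
    // Size is bound at run time, storage is on the heap, and the length is then fixed.
    static int[] fixedHeapDynamic(int n) {
        return new int[n];
    }

    // A heap-dynamic structure: it grows to three elements, then shrinks to two.
    static List<Integer> growThenShrink() {
        List<Integer> list = new ArrayList<>();
        list.add(10);
        list.add(20);
        list.add(30);
        list.remove(list.size() - 1);   // removes the element at the last index
        return list;
    }

    public static void main(String[] args) {
        System.out.println(fixedHeapDynamic(7).length);   // prints 7
        System.out.println(growThenShrink());             // prints [10, 20]
    }
}
```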
Array Initialization
Some languages allow initialization at the
time of storage allocation
C, C++, Java example (C# writes the brackets after the type: int[] list = {4, 5, 7, 83};)
int list [] = {4, 5, 7, 83};
Character strings in C and C++
char name [] = "freddie";
Arrays of strings in C and C++
char *names [] = {"Bob", "Jake", "Joe"};
Java initialization of String objects
String[] names = {"Bob", "Jake", "Joe"};
Record Types
A record is a possibly heterogeneous
aggregate of data elements in which the
individual elements are identified by
names
Introduced by COBOL in the 1960s
Syntax (COBOL)
01 EMP-REC.
    02 EMP-NAME.
        05 FIRST PIC X(20).
        05 MID PIC X(10).
        05 LAST PIC X(20).
    02 HOURLY-RATE PIC 99V99.
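The same nested record could be modelled in Java with small classes whose fields are accessed by name. This is a sketch only: the class and field names mirror the COBOL declaration, BigDecimal is an assumed choice for the 99V99 field, and the sample values are invented for illustration.

```java
import java.math.BigDecimal;

// Sketch: the COBOL employee record as nested Java classes with named fields.
// All names and the sample data are hypothetical, chosen to mirror EMP-REC.
class RecordSketch {
    static final class EmpName {
        final String first;
        final String mid;
        final String last;

        EmpName(String first, String mid, String last) {
            this.first = first;
            this.mid = mid;
            this.last = last;
        }
    }

    static final class EmpRec {
        final EmpName name;            // nested record, like 02 EMP-NAME
        final BigDecimal hourlyRate;   // decimal value, like PIC 99V99

        EmpRec(EmpName name, BigDecimal hourlyRate) {
            this.name = name;
            this.hourlyRate = hourlyRate;
        }
    }

    // Builds one record with made-up values.
    static EmpRec sample() {
        return new EmpRec(new EmpName("Jane", "Q", "Doe"), new BigDecimal("12.50"));
    }

    public static void main(String[] args) {
        EmpRec e = sample();
        // Fields are selected by name, not by position.
        System.out.println(e.name.first + " " + e.name.last + " " + e.hourlyRate);
    }
}
```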
Evaluation
Straightforward and safe design
Comparison to Arrays
Access to array elements is much slower
than access to record fields, because
subscripts are dynamic (field names are
static)
a[i] and b.field1
Implementation
An offset address relative to the beginning of the
record is associated with each field
Union Types
A union is a type
whose variables are allowed to store
different type values at different times
during execution
Design issues
Should type checking be required?
Should unions be embedded in records?
Evaluation of Unions
Potentially unsafe construct
Free unions do not allow type checking
Java and C# do not support unions
Reflective of growing concerns for safety in
programming languages
Strongly Typed Languages
A programming language is strongly
typed if type errors are always detected
How is type checking performed?
If all type bindings are static, nearly all type
checking can be static
If type bindings are dynamic, type checking
must be dynamic
Type checking is the activity of ensuring that the
operands of an operator are of compatible types
Pointer and Reference Types
A pointer type variable has a range of values
that consists of memory addresses and a
special value, nil
One address value
An address tuple (segment, offset)
The use of pointers
Support indirect addressing
Manage dynamic memory
Access a location in heap
The variable that stores the address of
another variable is what in C++ is called
a pointer
Pointer Operations
Two fundamental operations
Assignment: sets a pointer variable's value to some useful address
Dereferencing: yields the value stored at the location
represented by the pointer's value
Dereferencing can be explicit or implicit
C/C++ use an explicit operator "*"
j = *ptr;
Pointers in C and C++
Extremely flexible but must be used with
care
Pointers can point at any variable
Pointer arithmetic is possible
Explicit dereferencing and address-of operators
Support dynamic storage management and addressing