Chapter 4 - Data Types
Data Types
• A data type defines a collection of data values and a set of
predefined operations on those values.
• Data types can be
• Built-in data types
• User-defined data types
• User-defined types provide improved readability through
the use of meaningful names for types.
• They allow type checking of the variables of a special
category of use, which would otherwise not be possible.
Primitive Data Types
• Data types which are not defined in terms of other types.
• Many early programming languages had only numeric primitive types.
• Numeric types still play a central role among the collections of types supported by current
languages.
1. Numeric types
Integers
• Byte, short, int and long
• Signed and unsigned
Integer representation
• Sign-magnitude
• A negative integer could be stored in sign-magnitude notation, in which the
sign bit is set to indicate a negative and the remainder of the bit string
represents the absolute value of the number.
• Sign-magnitude notation, however, does not lend itself to
computer arithmetic: zero has two representations (+0 and -0), and
addition must treat the sign bit separately from the magnitude.
• Ones-complement
• In ones-complement notation, the negative of an integer is stored
as the logical complement of its absolute value
• Twos-complement
• In twos-complement notation, the representation of a negative
integer is formed by taking the logical complement of the positive
version of the number and adding one.
• Example: Sign-magnitude
57 could be represented by 00111001
-57 could be represented by 10111001
• Example: Ones-complement
57 is represented by 00111001
-57 is represented by 11000110
• Example: Twos-complement
57 is represented by 00111001
-57 is represented by 11000110 + 1 = 11000111
• Exercise: What are the ones-complement and twos-complement representations of -7?
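The three notations can be checked mechanically. The following is a minimal C sketch that encodes a small integer into 8 bits under each scheme. The function names (sign_magnitude8, ones_complement8, twos_complement8, to_bits8) and the fixed 8-bit width are illustrative assumptions, not part of any standard library.

```c
#include <stdint.h>

/* Encode v (assumed to lie in -127..127) in 8-bit sign-magnitude form:
   the top bit is the sign, the low seven bits hold |v|. */
uint8_t sign_magnitude8(int v) {
    uint8_t magnitude = (uint8_t)(v < 0 ? -v : v);
    return v < 0 ? (uint8_t)(0x80u | magnitude) : magnitude;
}

/* Ones-complement: a negative value is the bitwise complement of |v|. */
uint8_t ones_complement8(int v) {
    uint8_t magnitude = (uint8_t)(v < 0 ? -v : v);
    return v < 0 ? (uint8_t)~magnitude : magnitude;
}

/* Twos-complement: complement |v|, then add one. */
uint8_t twos_complement8(int v) {
    uint8_t magnitude = (uint8_t)(v < 0 ? -v : v);
    return v < 0 ? (uint8_t)(~magnitude + 1u) : magnitude;
}

/* Write the 8 bits of b, most significant bit first, into out
   (8 characters plus the terminating null). */
void to_bits8(uint8_t b, char out[9]) {
    for (int i = 0; i < 8; i++) {
        out[i] = (b & (0x80u >> i)) ? '1' : '0';
    }
    out[8] = '\0';
}
```

Passing the result of twos_complement8(-57) to to_bits8 produces the string 11000111, which matches the worked example. The same functions can be used to check an answer to the -7 exercise.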
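Floating point
• Floating-point types are the computer implementation of real numbers,
which they can only approximate.
• Floating-point numbers are commonly represented as a sign, a significand
(or fractional part), and an exponent.
• The IEEE has defined formats for 32-bit and 64-bit
floating-point implementations.
• Most computers and computer languages support
these IEEE 754 formats.
• Single precision (32 bits): 1 sign bit, 8 exponent bits, 23 fraction bits
• Double precision (64 bits): 1 sign bit, 11 exponent bits, 52 fraction bits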
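To make the sign, exponent, and fraction fields concrete, here is a minimal C sketch that splits a single-precision value into its three fields. It assumes that float is the 32-bit IEEE 754 format on the target machine, which is common but not guaranteed by the C standard. The names FloatFields and decode_float are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 single-precision value. */
struct FloatFields {
    uint32_t sign;      /* 1 bit: 0 for positive, 1 for negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits of the significand */
};

/* Split x into its fields. Assumes float is a 32-bit IEEE 754 type. */
struct FloatFields decode_float(float x) {
    uint32_t bits;
    struct FloatFields f;
    memcpy(&bits, &x, sizeof bits);      /* copy the raw bit pattern */
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For 1.0f the stored exponent is 127 (a true exponent of 0 plus the bias) and the fraction is 0, because the leading 1 of the significand is implicit.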
Decimal
• Most larger computers that are designed to support business systems applications have
hardware support for decimal data types.
• Decimal types are stored using binary codes for the decimal digits.
• These representations are called binary coded decimal (BCD)
• In some cases, they are stored one digit per byte, but in others they are packed two digits
per byte
• Either way, they take more storage than binary representations.
• It takes at least four bits to code a decimal digit.
• Therefore, storing a six-digit coded decimal number requires 24 bits of memory,
whereas the same number fits in 20 bits in binary.
• Floats and doubles are binary floating-point types, whereas a decimal type
stores the value as decimal digits with a decimal scale.
• Decimal types can therefore represent decimal fractions such as 0.1 exactly, within
their range, so they are usually used in monetary (financial)
applications that require a high degree of accuracy.
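The storage comparison between BCD and binary can be sketched in C as follows. This is only an illustration under stated assumptions: bcd_pack and bits_in_binary are hypothetical helper names, and the packing order (most significant digit first, two digits per byte) is one possible convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the decimal digits of n, two per byte (packed BCD), most significant
   digit first. Writes ndigits / 2 bytes; ndigits is assumed to be even. */
void bcd_pack(unsigned long n, size_t ndigits, uint8_t *out) {
    for (size_t i = ndigits / 2; i-- > 0; ) {
        uint8_t low = (uint8_t)(n % 10);   /* right-hand digit of the pair */
        n /= 10;
        uint8_t high = (uint8_t)(n % 10);  /* left-hand digit of the pair */
        n /= 10;
        out[i] = (uint8_t)((high << 4) | low);
    }
}

/* Number of bits needed to hold n as an unsigned binary integer. */
unsigned bits_in_binary(unsigned long n) {
    unsigned bits = 0;
    while (n > 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}
```

Packing 123456 fills three bytes (0x12, 0x34, 0x56), that is, 24 bits, while bits_in_binary(999999) reports that the largest six-digit number needs only 20 bits in binary.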
Boolean
• Boolean types are perhaps the simplest of all types; their range
of values has only two elements: one for true and one for false.
• Boolean types are often used to represent switches or flags in
programs.
• A Boolean value could be represented by a single bit, but
because a single bit of memory cannot be accessed efficiently on
many machines, Boolean values are often stored in the smallest
efficiently addressable cell of memory, typically a byte.
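The cost of single-bit storage shows up as soon as flags are packed by hand. The C sketch below keeps eight Boolean flags in one byte; reading or writing one of them needs a shift and a mask rather than a plain load or store. FlagSet, flag_assign, and flag_test are illustrative names, and the sketch assumes a C99 compiler that provides stdbool.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Eight Boolean flags packed into one byte, one bit per flag. */
typedef uint8_t FlagSet;

/* Return a copy of flags with bit `index` (0..7) set to `value`. */
FlagSet flag_assign(FlagSet flags, unsigned index, bool value) {
    uint8_t mask = (uint8_t)(1u << index);
    return value ? (FlagSet)(flags | mask)
                 : (FlagSet)(flags & (uint8_t)~mask);
}

/* Read bit `index`: this takes a shift and a mask, not a plain load. */
bool flag_test(FlagSet flags, unsigned index) {
    return ((flags >> index) & 1u) != 0;
}
```

A plain bool variable avoids this extra work at the price of using a whole addressable cell for one bit of information.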
Character types
• Character data are stored in computers as numeric codings
• Traditionally, the most commonly used coding was ASCII
(American Standard Code for Information Interchange), a 7-bit code usually
stored one character per 8-bit byte, which uses the values 0 to 127 to code
128 different characters.
• Because of the globalization of business and the need for
computers to communicate with other computers around the
world, the ASCII character set has become inadequate.
• Unicode, originally designed as a 16-bit character set, was developed as an
alternative.
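Because characters are stored as numeric codes, ordinary arithmetic applies to them. The following C sketch illustrates this; it assumes an ASCII-compatible execution character set, which is typical but not required by the C standard, and the function names are illustrative.

```c
/* Return the numeric code stored for character c. */
int char_code(char c) {
    return (unsigned char)c;
}

/* True (1) if code lies in the ASCII range 0..127, otherwise 0. */
int is_ascii_code(int code) {
    return code >= 0 && code <= 127;
}

/* Convert an ASCII lowercase letter to uppercase by arithmetic on its code;
   any other character is returned unchanged. */
char ascii_upper(char c) {
    return (c >= 'a' && c <= 'z') ? (char)(c - ('a' - 'A')) : c;
}
```

Under ASCII, char_code('A') is 65, and the upper-case and lower-case letters differ by a constant offset, which is what ascii_upper relies on.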
Character String Types
• The two most important design issues that are specific to
character string types are the following:
• Should strings be simply a special kind of character array or a
primitive type (with no array-style subscripting operations)?
• Should strings have static or dynamic length?
• C and C++ use char arrays to store character strings and
provide a collection of string operations through a standard
library whose header file is string.h.
• String length options
• Static length string
• The length can be static and set when the string is created
• Limited dynamic length string
• Strings can be allowed to have a varying length up to a declared and fixed
maximum set by the variable’s definition, as exemplified by the strings in C
and the C-style strings of C++.
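• Example: char country[50];
strcpy(country, "Gini");      /* current length 4 */
strcpy(country, "Ethiopia");  /* current length 8; at most 49 characters plus the null */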
• Dynamic length string
• Strings can be designed to have varying length with no maximum limit as in
JavaScript and Perl
• Implementation
• In most cases, software is used to implement string storage,
retrieval, and manipulation.
• When character string types are represented as character arrays,
the language often supplies few operations.
• A descriptor for a static character string type, which is required
only during compilation, has three fields
• Name of the type
• Length in characters (for static character strings)
• Address of the first character
• Limited dynamic strings require a run-time descriptor to store
both the fixed maximum length and the current length
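A run-time descriptor of this kind can be sketched in C as a structure that carries both lengths alongside the characters. This is a minimal illustration, not how any particular language implements its strings: LimitedString, ls_new, ls_assign, and the maximum of 50 are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 50   /* fixed maximum, chosen here only for illustration */

/* Run-time descriptor for a limited dynamic length string. */
struct LimitedString {
    size_t max_length;      /* fixed when the variable is defined */
    size_t current_length;  /* changes each time the string is assigned */
    char chars[MAX_LEN];    /* storage for the characters */
};

/* Create an empty string with capacity MAX_LEN. */
struct LimitedString ls_new(void) {
    struct LimitedString s;
    s.max_length = MAX_LEN;
    s.current_length = 0;
    memset(s.chars, 0, sizeof s.chars);
    return s;
}

/* Assign text to s. Returns 1 on success, 0 if text exceeds the maximum,
   in which case s is left unchanged. */
int ls_assign(struct LimitedString *s, const char *text) {
    size_t n = strlen(text);
    if (n > s->max_length) {
        return 0;            /* reject rather than overflow the storage */
    }
    memcpy(s->chars, text, n);
    s->current_length = n;
    return 1;
}
```

Assigning "Gini" and then "Ethiopia" changes current_length from 4 to 8 while max_length stays at 50; an assignment longer than the maximum is refused.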
• In Ada, array subscripts are enclosed in parentheses, so both program readers and
compilers are forced to use other information to determine whether a reference
such as B(I) is a function call or a reference to an array element.
Subscript bindings and array categories
• The binding of the subscript type to an array variable is
usually static, but the subscript value ranges are
sometimes dynamically bound.
• In some languages, the lower bound of the subscript range is
implicit:
• C-based languages: 0
• Fortran 95: 1
• There are five categories of arrays, based on the binding to subscript
ranges, the binding to storage, and from where the storage is allocated.
• The category names indicate these three design choices, that is, where and when
storage is allocated.
• In the first four of these categories, once the subscript ranges are bound
and the storage is allocated, they remain fixed for the lifetime of the
variable.
• Keep in mind that when the subscript ranges are fixed, the array cannot
change size.
1. Static array
• Subscript ranges are statically bound
• Storage allocation is static (done before run
time).
• Advantage: efficiency
• No dynamic allocation or deallocation is
required.
• Disadvantage
• Storage for the array is fixed for the
entire execution time of the program.
2. Fixed stack-dynamic array
• Subscript ranges are statically bound
• However, the allocation is done at declaration elaboration time during
execution.
• Stack storage is allocated automatically when the subprogram is entered and
deallocated as soon as the subprogram completes its execution
• The advantage of fixed stack-dynamic arrays over static arrays is space
efficiency.
• A large array in one subprogram can use the same space as a large array in
a different subprogram, as long as both subprograms are not active at the
same time
• The same is true if the two arrays are in different blocks that are not
active at the same time
• The disadvantage is the required allocation and deallocation time
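In C, the difference between a static array and a fixed stack-dynamic array comes down to the static specifier on a local declaration. The sketch below contrasts the two; the function names and the size SLOTS are illustrative assumptions.

```c
#include <stddef.h>

#define SLOTS 4   /* subscript range 0..3, fixed at compile time in both cases */

/* Static array: storage is bound before run time and the contents
   persist from one call to the next. */
int bump_static(size_t i) {
    static int hits[SLOTS];   /* zero-initialized once, lives for the whole run */
    return ++hits[i % SLOTS];
}

/* Fixed stack-dynamic array: the size is still fixed at compile time, but
   storage is allocated each time the declaration is elaborated. */
int bump_local(size_t i) {
    int hits[SLOTS] = {0};    /* fresh storage on every call */
    return ++hits[i % SLOTS];
}
```

Calling bump_static(1) twice returns 1 and then 2, because the array survives between calls, whereas bump_local(1) returns 1 every time, because its array is created anew on each entry.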
3. Stack-dynamic array
• Subscript ranges are dynamically bound
• Storage allocation is dynamic (done during run time).
• Once the subscript ranges are bound and the storage is
allocated, however, they remain fixed during the lifetime of the
variable
• The advantage of stack-dynamic arrays over static and fixed
stack-dynamic arrays is flexibility:
• The size of the array need not be known until the array is
about to be used
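One way to see a stack-dynamic array in C is a variable-length array, whose length is taken from a run-time value. This sketch assumes a compiler that supports variable-length arrays (introduced in C99 and optional since C11); sum_of_squares is an illustrative name.

```c
#include <stddef.h>

/* Sum of i*i for i in 0..n-1, computed through a stack-dynamic array. */
long sum_of_squares(size_t n) {
    if (n == 0) {
        return 0;             /* a variable-length array needs a positive size */
    }
    long squares[n];          /* length bound at run time, storage on the stack */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        squares[i] = (long)(i * i);
    }
    for (size_t i = 0; i < n; i++) {
        total += squares[i];
    }
    return total;
}
```

The array's size is unknown until the function is called, but once squares has been allocated its subscript range stays fixed until the function returns.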
4. Fixed heap-dynamic array
• Subscript ranges are dynamically bound
• Storage binding is dynamic
• Both are fixed after storage is allocated
• Heap storage is allocated when the user program requests it during execution
• Unlike stack storage, heap storage is not deallocated automatically when a
subprogram returns
• It is released either explicitly by the program (as with free in C) or by a
garbage collector that removes unused objects (as in Java)
• The heap is typically large compared to the stack
• Advantage
Flexibility: the array’s size always fits the problem
• Disadvantage
Allocation time from the heap, which is longer than allocation time from the
stack
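In C, a fixed heap-dynamic array is typically obtained with the standard library functions malloc and free. The sketch below is a minimal illustration; make_range is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate an array of n ints on the heap, filled with 0, 1, ..., n-1.
   Returns NULL if the allocation fails. The caller must free() the result,
   because heap storage is not released when this function returns. */
int *make_range(size_t n) {
    int *a = malloc(n * sizeof *a);
    if (a == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)i;
    }
    return a;
}
```

The size n is chosen at run time, so the array fits the problem, but the array keeps that size until the caller passes it to free.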
5. Heap-dynamic array
• Subscript ranges are dynamically bound
• Storage binding is also dynamic
• Both can change any number of times during the array’s lifetime
• Advantage
Flexibility: arrays can grow and shrink any number of times during
program execution
• Disadvantage
Allocation and deallocation take longer and may happen many times
during execution of the program
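C has no built-in heap-dynamic array, but one can be approximated with realloc. The following is a minimal sketch under that assumption; IntVec and its functions are illustrative names, and the doubling growth policy is just one common choice.

```c
#include <stddef.h>
#include <stdlib.h>

/* A growable array of ints. Start from { NULL, 0, 0 }. */
struct IntVec {
    int *data;
    size_t length;    /* elements in use */
    size_t capacity;  /* elements allocated */
};

/* Append value, doubling the storage when it is full.
   Returns 1 on success, 0 if reallocation fails. */
int intvec_push(struct IntVec *v, int value) {
    if (v->length == v->capacity) {
        size_t new_capacity = v->capacity ? v->capacity * 2 : 4;
        int *p = realloc(v->data, new_capacity * sizeof *p);
        if (p == NULL) {
            return 0;         /* the old storage is still valid */
        }
        v->data = p;
        v->capacity = new_capacity;
    }
    v->data[v->length++] = value;
    return 1;
}

/* Remove the last element. Returns 1 on success, 0 if the array is empty. */
int intvec_pop(struct IntVec *v) {
    if (v->length == 0) {
        return 0;
    }
    v->length--;
    return 1;
}

/* Release the storage and reset the array to empty. */
void intvec_free(struct IntVec *v) {
    free(v->data);
    v->data = NULL;
    v->length = 0;
    v->capacity = 0;
}
```

Each call to realloc may copy the existing elements to a new location, which is the repeated allocation cost named as the disadvantage of this category.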
Associative arrays
• An associative array is an unordered collection of data
elements that are indexed by an equal number of values
called keys
• In the case of non-associative arrays, the indices never
need to be stored (because of their regularity).
• In an associative array, however, the user-defined keys
must be stored in the structure.
• So each element of an associative array is in fact a pair of
entities, a key and a value.
Example of associative array in Perl
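Perl builds associative arrays into the language as hashes. C has no such built-in type, so the following is only a minimal C sketch of the underlying idea that every key is stored in the structure next to its value. Map, Pair, map_find, and map_put are hypothetical names, and the linear search stands in for the hashing that practical implementations normally use.

```c
#include <stddef.h>
#include <string.h>

#define MAP_CAPACITY 8   /* small fixed size, for illustration only */

/* Each element is a key/value pair: the key is stored in the structure. */
struct Pair {
    const char *key;
    int value;
};

/* An unordered collection of pairs. Set count to 0 before first use. */
struct Map {
    struct Pair pairs[MAP_CAPACITY];
    size_t count;
};

/* Find the pair for key by linear search; returns NULL if it is absent. */
struct Pair *map_find(struct Map *m, const char *key) {
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->pairs[i].key, key) == 0) {
            return &m->pairs[i];
        }
    }
    return NULL;
}

/* Insert a new pair or update an existing one.
   Returns 1 on success, 0 if the map is full. */
int map_put(struct Map *m, const char *key, int value) {
    struct Pair *p = map_find(m, key);
    if (p != NULL) {
        p->value = value;
        return 1;
    }
    if (m->count == MAP_CAPACITY) {
        return 0;
    }
    m->pairs[m->count].key = key;   /* the caller keeps the key string alive */
    m->pairs[m->count].value = value;
    m->count++;
    return 1;
}
```

Unlike an ordinary array, where the position of an element can be computed from its index, every lookup here has to compare stored keys.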
Record Types
• A record is an aggregate of data elements in which the individual elements are
identified by names and accessed through offsets from the beginning of the
structure.
• In some languages that support object-oriented programming, data classes serve
as records.
• Used to model a collection of data in which the individual elements are not of
the same type or size
• Example: Information about a college student might include name, student
number, grade point average, and so forth
• A data type for such a collection might use a character string for the name, an
integer for the student number, a floating point for the grade point average,
and so forth
• In C, C++, and C#, records are supported with the struct data type
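The college student example can be written as a C struct. This is a minimal sketch; the field names and the 32-character name buffer are illustrative choices. The standard offsetof macro makes the "offset from the beginning of the structure" visible.

```c
#include <stddef.h>

/* A record describing a college student; field sizes are illustrative. */
struct Student {
    char name[32];        /* character string */
    int student_number;   /* integer */
    double gpa;           /* floating point */
};

/* Fields are reached by name; the compiler turns each name into an offset. */
double gpa_of(const struct Student *s) {
    return s->gpa;
}

/* Offset of the gpa field from the start of the record, in bytes. */
size_t gpa_offset(void) {
    return offsetof(struct Student, gpa);
}
```

The elements have different types and sizes, which is what distinguishes a record from an array, and the exact offsets depend on the compiler's alignment rules.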
• References to the individual fields of records are syntactically specified
by several different methods
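• Example: accessing the middle name of an employee in an employee
record:
• COBOL uses the form field-name OF record-name, as in
MIDDLE OF EMPLOYEE-NAME OF EMPLOYEE-RECORD
• Most other programming languages (Ada, C, C++) use dot notation, as in
Employee_Record.Employee_Name.Middle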
• A fully qualified reference to a record field is one in which all
intermediate record names, from the largest enclosing record to the
specific field, are named in the reference.
• Both the COBOL and the Ada example field references above are fully
qualified.
• As an alternative to fully qualified references, COBOL allows elliptical
references to record fields.
• In an elliptical reference, the field is named, but any or all of the enclosing
record names can be omitted, as long as the resulting reference is unambiguous
in the referencing environment.
• For example, FIRST, FIRST OF EMPLOYEE-NAME, and FIRST OF EMPLOYEE-RECORD
are elliptical references to the employee’s first name in a COBOL
employee record.
• Although elliptical references are a programmer convenience, they require a
compiler to have elaborate data structures and procedures in order to correctly
identify the referenced field.
• They are also somewhat detrimental to readability.
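C offers no elliptical references: every field reference must be fully qualified with the dot (or arrow) operator. The sketch below shows a nested record in C. The type and field names are modeled on the employee example but are assumptions for illustration, as is the hourly_rate field.

```c
#include <string.h>

/* Inner record: the parts of an employee's name. */
struct EmployeeName {
    char first[20];
    char middle[20];
    char last[20];
};

/* Outer record that encloses the name record. */
struct EmployeeRecord {
    struct EmployeeName employee_name;
    double hourly_rate;
};

/* Store text as the middle name, truncating it to fit the field. */
void set_middle(struct EmployeeRecord *r, const char *text) {
    strncpy(r->employee_name.middle, text, sizeof r->employee_name.middle - 1);
    r->employee_name.middle[sizeof r->employee_name.middle - 1] = '\0';
}

/* Fully qualified reference: record, then enclosing field, then field. */
const char *middle_name(const struct EmployeeRecord *r) {
    return r->employee_name.middle;
}
```

Given a variable e of type struct EmployeeRecord, the reference e.employee_name.middle names every enclosing record, so the compiler never has to search for the field.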
Pointer and Reference Types
• A pointer type variable has a range of values that consists of
memory addresses and a special value, nil
• Pointers are designed for two purposes:
• Indirect addressing, which is heavily used in assembly languages
• Managing dynamic memory: a pointer can be used to access a
location in the area where storage is dynamically allocated,
called the heap
• Variables that are dynamically allocated from the heap are
called heap-dynamic variables.
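Both uses of pointers can be shown in a few lines of C. This is a minimal sketch with illustrative function names; in C the special value that the slide calls nil is written NULL.

```c
#include <stdlib.h>

/* Allocate a heap-dynamic int variable and initialize it.
   Returns NULL (C's counterpart of nil) if the allocation fails. */
int *new_int(int initial) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = initial;
    }
    return p;
}

/* Indirect addressing: modify the variable that p points to,
   wherever that variable happens to live. */
void add_to(int *p, int amount) {
    if (p != NULL) {
        *p += amount;
    }
}
```

add_to works equally on a heap-dynamic variable returned by new_int and on the address of an ordinary local variable; only the former must later be released with free.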
Type checking
• Type checking is the activity of ensuring that the operands of an
operator are of compatible types
• A compatible type is one that either is legal for the operator or is
allowed under language rules to be implicitly converted by compiler-
generated code (or the interpreter) to a legal type
• This automatic conversion is called a coercion
• For example, if an int variable and a float variable are added in Java, the
value of the int variable is coerced to float and a floating-point add is
done.
• A type error is the application of an operator to an operand of an
inappropriate type.
• For example, in the original version of C, if an int value was passed to
a function that expected a float value, a type error would occur, because
compilers for that language did not check the types of parameters
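Coercion is easy to observe in C, which applies similar rules to the Java case described above when an arithmetic operator has operands of different types. The sketch below uses illustrative function names.

```c
/* The int operand is implicitly converted (coerced) to float before the
   addition, so a floating-point add is performed. */
float add_mixed(int i, float f) {
    return i + f;
}

/* Both operands are int: integer division, no coercion takes place. */
int int_quotient(int a, int b) {
    return a / b;
}

/* One operand is double, so the int operand is coerced to double first. */
double mixed_quotient(int a, double b) {
    return a / b;
}
```

The difference between int_quotient(7, 2), which truncates, and mixed_quotient(7, 2.0), which does not, comes entirely from the compiler-generated conversion of the int operand.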
Exercises
• Define a primitive data type.
• What is the difference between single-precision and double-precision
floating-point numbers?
• Discuss how floating-point numbers are represented in computers.
• State the reason that a Boolean value is represented using a byte
instead of a bit.
• Discuss the array categories.
• Show clearly that BCD takes more space than binary
representation.