
User Defined Ordinal Type

Character strings are sequences of characters used to label output and to carry out the input and output of all kinds of data. The main design issues for strings are whether they should be a primitive type or a special kind of character array, and whether their length should be static or dynamic. Common string operations include assignment, catenation, substring reference, comparison, and pattern matching. Languages handle strings differently: C and C++ use character arrays, while Java provides string classes. Pattern matching with regular expressions is built into languages such as Perl, JavaScript, Ruby, and PHP.

Uploaded by

shankar singam
Copyright © All Rights Reserved

Character String Types:

A character string type is one in which the values consist of sequences of characters. Character string constants are used to label output, and the input and output of all kinds of data are often done in terms of strings.

Design Issues:
The two most important design issues that are specific to character
string types are the following:
• Should strings be simply a special kind of character array or a
primitive type?
• Should strings have static or dynamic length?

Strings and Their Operations:

The most common string operations are assignment, catenation, substring reference, comparison, and pattern matching.
A substring reference is a reference to a substring of a given string.
Substring references are discussed in the more general context of
arrays, where the substring references are called slices.

Pattern matching is another fundamental character string operation. In some languages, pattern matching is supported directly in the language. In others, it is provided by a function or class library.

If strings are not defined as a primitive type, string data is usually stored in arrays of single characters and referenced as such in the language. This is the approach taken by C and C++.

C and C++ use char arrays to store character strings. These languages provide a collection of string operations through standard libraries. Many uses of strings and many of the library functions follow the convention that character strings are terminated with a special character, null, which is represented with zero.

The library functions simply process the characters of a string until the null character appears in the string being operated on.

The character string literals that are built by the compiler also have the null character. For example, consider the following declaration:
char str[] = "apples";
In this example, str is an array of seven char elements: the six characters of apples followed by the null character, written '\0' in C.
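The hidden seventh element can be observed from Python through the standard ctypes module, which allocates genuine C char arrays. This is a small illustrative sketch. c_strlen is a hypothetical helper, not part of ctypes. It imitates the way C's strlen scans the bytes until it reaches the null.

```python
import ctypes

# create_string_buffer copies the bytes and appends a terminating null,
# just as the compiler does for: char str[] = "apples";
buf = ctypes.create_string_buffer(b"apples")


def c_strlen(raw: bytes) -> int:
    """Hypothetical model of C's strlen: count bytes up to the first null."""
    count = 0
    while raw[count] != 0:
        count += 1
    return count


print(ctypes.sizeof(buf))   # 7: six letters plus the null character
print(buf.raw)              # b'apples\x00'
print(c_strlen(buf.raw))    # 6: the null is not counted
```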

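The most commonly used library functions for character strings in C and C++ are strcpy, which copies one string into another; strcat, which catenates one given string onto another; strcmp, which lexicographically compares (by the order of their character codes) two given strings; and strlen, which returns the number of characters, not counting the null, in the given string.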
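The lexicographic comparison that strcmp performs can be modeled in the same way. c_strcmp below is a hypothetical Python model and not a binding to the C library. It compares character codes pair by pair and stops at the first difference or at a shared terminating null. Like strcmp, it promises only the sign of its result.

```python
def c_strcmp(a: bytes, b: bytes) -> int:
    """Hypothetical model of C's strcmp on null-terminated byte strings.

    Returns a negative, zero, or positive value.
    """
    i = 0
    while True:
        ca, cb = a[i], b[i]
        if ca != cb or ca == 0:
            # First difference, or both strings ended together
            return (ca > cb) - (ca < cb)
        i += 1


print(c_strcmp(b"apple\0", b"apples\0"))  # -1: the null (0) sorts before 's'
print(c_strcmp(b"Zebra\0", b"apple\0"))   # -1: 'Z' (90) precedes 'a' (97)
print(c_strcmp(b"pear\0", b"pear\0"))     # 0
```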
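The parameters and return values for most of the string manipulation functions are char pointers that point to arrays of char. Parameters can also be string literals.

These functions find the end of a string only by its null character, and they are never told the size of the destination array. As a result, several of them are unsafe. Consider the following call:
strcpy(dest, src);

Suppose dest is an array of 20 chars and src holds a 50-character string. strcpy will then write past the end of dest, overwriting the 31 bytes that follow it: the 30 characters that do not fit, plus the terminating null. The point is that strcpy does not know the length of dest, so it cannot ensure that the memory following it will not be overwritten. The same problem can occur with several of the other functions in the C string library.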
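The overflow can be reproduced safely by modeling memory as a Python bytearray, with a 20-byte dest followed by other data. Everything here is hypothetical. naive_strcpy mimics strcpy's behavior: it copies through the null with no bounds check. bounded_copy is a safer variant that truncates and always terminates. A 100-byte memory size is assumed only for illustration.

```python
# Hypothetical model: memory is a flat bytearray, not real C memory.
DEST_SIZE = 20      # models: char dest[20];
MEM_SIZE = 100      # arbitrary size for the surrounding memory


def fresh_memory() -> bytearray:
    # dest occupies bytes 0..19; '#' bytes stand for whatever follows dest
    return bytearray(DEST_SIZE) + bytearray(b"#" * (MEM_SIZE - DEST_SIZE))


def naive_strcpy(mem: bytearray, dest: int, src: bytes) -> None:
    """Like strcpy: copy up to and including the null, with no bounds check."""
    i = 0
    while True:
        mem[dest + i] = src[i]
        if src[i] == 0:
            return
        i += 1


def bounded_copy(mem: bytearray, dest: int, size: int, src: bytes) -> None:
    """Hypothetical safe copy: at most size - 1 characters, then a null."""
    n = 0
    while n < size - 1 and src[n] != 0:
        mem[dest + n] = src[n]
        n += 1
    mem[dest + n] = 0


def clobbered(mem: bytearray) -> int:
    # Count bytes after dest that no longer hold their original '#'
    return sum(1 for b in mem[DEST_SIZE:] if b != ord("#"))


src = b"x" * 50 + b"\0"          # a 50-character string and its null

unsafe = fresh_memory()
naive_strcpy(unsafe, 0, src)
print(clobbered(unsafe))          # 31

safe = fresh_memory()
bounded_copy(safe, 0, DEST_SIZE, src)
print(clobbered(safe))            # 0
```

The bounded version trades silent truncation for memory safety. Here dest ends up holding 19 characters and a null.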

In Java, strings are supported by the String class, whose values are constant strings, and the StringBuffer class, whose values are changeable and are more like arrays of single characters. A StringBuffer value is changed in place through the methods of the StringBuffer class, such as append and insert. C# and Ruby include string classes that are similar to those of Java.

Python includes strings as a primitive type. It has operations for substring reference, catenation, and indexing to access individual characters, as well as methods for searching and replacement.

Python strings are immutable, similar to the String class objects of Java.
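The following short Python session exercises each of the operations just listed. It also shows that an attempt to change a string in place is rejected. The variable names are illustrative only.

```python
s = "apples"

print(s[1:4])                # ppl: substring reference via a slice
print(s + " and pears")      # apples and pears: catenation makes a new string
print(s[0])                  # a: indexing yields a one-character string
print(s.find("les"))         # 3: searching
print(s.replace("a", "A"))   # Apples: replacement returns a new string

mutation_failed = False
try:
    s[0] = "A"               # item assignment is not allowed on str
except TypeError:
    mutation_failed = True

print(mutation_failed)       # True
print(s)                     # apples: the original value is unchanged
```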

Perl, JavaScript, Ruby, and PHP include built-in pattern-matching operations. In these languages, the pattern-matching expressions are somewhat loosely based on mathematical regular expressions. In fact, they are often called regular expressions.
Consider the following pattern expression:
/[A-Za-z][A-Za-z\d]+/
This pattern matches (or describes) the typical name form in programming languages. The brackets enclose character classes. The first character class specifies all letters; the second specifies all letters and digits (a digit is specified with the abbreviation \d). If only the second character class were included, we could not prevent a name from beginning with a digit. The plus operator following the second character class specifies that there must be one or more of what is in that class. So, the whole pattern matches strings that begin with a letter, followed by one or more letters or digits.
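The same pattern can be tried with Python's standard re module, using a raw string so that \d reaches the regex engine unchanged. Like Perl's /.../, re.search is unanchored: it succeeds if the pattern matches anywhere in the text. re.fullmatch instead requires the whole string to fit. The candidate names below are arbitrary test inputs.

```python
import re

# The pattern from the text
NAME = re.compile(r"[A-Za-z][A-Za-z\d]+")

for candidate in ["sum", "x1", "2x", "a", "total_2"]:
    # fullmatch succeeds only if the entire string fits the pattern
    print(candidate, NAME.fullmatch(candidate) is not None)
# sum True, x1 True, 2x False, a False, total_2 False

# An unanchored search still finds the name-like prefix of "total_2"
print(NAME.search("total_2").group())   # total
```

Note that "a" is rejected because the plus requires at least one character after the first letter.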

Next, consider the following pattern expression:
/\d+\.?\d*|\.\d+/
This pattern matches numeric literals. The \. specifies a literal decimal point. The question mark makes the item it follows optional, so that item may appear zero times or once. The vertical bar (|) separates two alternatives in the whole pattern. The first alternative matches strings of one or more digits, possibly followed by a decimal point, followed by zero or more digits. The second alternative matches strings that begin with a decimal point, followed by one or more digits.

Pattern-matching capabilities using regular expressions are included in the class libraries of C++, Java, Python, C#, and F#.
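As one example of such a class library, Python's re module accepts the numeric-literal pattern unchanged. The sample strings are arbitrary. fullmatch is used so that each string must be a complete literal.

```python
import re

NUMBER = re.compile(r"\d+\.?\d*|\.\d+")

for text in ["42", "3.14", "7.", ".5", ".", "1.2.3"]:
    print(text, NUMBER.fullmatch(text) is not None)
# 42 True, 3.14 True, 7. True, .5 True, . False, 1.2.3 False
```

"7." is accepted by the first alternative, because the digits after the point are optional. A lone "." fails both alternatives, since each one requires at least one digit.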
String Length Options:
There are several design choices regarding the length of string values.
First, the length can be static and set when the string is created.
Such a string is called a static length string. This is the choice for
the strings of Python, the immutable objects of Java’s String class, as
well as similar classes in the C++ standard class library, Ruby’s
built-in String class, and the .NET class library available to C#
and F#.

The second option is to allow strings to have varying length up to a declared and fixed maximum set by the variable’s definition, as exemplified by the strings in C and the C-style strings of C++. These are called limited dynamic length strings. Such string variables can store any number of characters between zero and the maximum.
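A limited dynamic length string can be modeled as a value whose current length varies but whose capacity is fixed when it is created. LimitedString below is a hypothetical class written for this illustration. Unlike C, which performs no such check (as the strcpy discussion shows), the model rejects any value that exceeds the maximum.

```python
class LimitedString:
    """Hypothetical limited dynamic length string with a fixed maximum."""

    def __init__(self, max_length: int):
        self.max_length = max_length   # fixed by the variable's definition
        self._chars = ""               # current value, initially empty

    def assign(self, value: str) -> None:
        if len(value) > self.max_length:
            raise ValueError(f"{len(value)} characters exceeds {self.max_length}")
        self._chars = value

    def __len__(self) -> int:
        return len(self._chars)        # current length, not the maximum

    def __str__(self) -> str:
        return self._chars


name = LimitedString(max_length=9)     # compare: char name[10]; in C
name.assign("Ada")
print(len(name))                       # 3
name.assign("Pascal")
print(len(name))                       # 6

rejected = False
try:
    name.assign("Smalltalk-80")        # 12 characters: over the maximum
except ValueError:
    rejected = True
print(rejected, str(name))             # True Pascal
```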

The third option is to allow strings to have varying length with no maximum, as in JavaScript, Perl, and the standard C++ library. These are called dynamic length strings. This option requires the overhead of dynamic storage allocation and deallocation but provides maximum flexibility.
Ada 95+ supports all three string length options.

User-Defined Ordinal Types:

An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers. In Java, for example, the primitive ordinal types are integer, char, and boolean. There are two user-defined ordinal types that have been supported by programming languages: enumeration and subrange.

Enumeration Types:
An enumeration type is one in which all of the possible values, which
are named constants, are provided, or enumerated, in the definition.
Enumeration types provide a way of defining and grouping
collections of named constants, which are called enumeration
constants. The definition of a typical enumeration type is shown in the
following C# example:
enum days {Mon, Tue, Wed, Thu, Fri, Sat, Sun};
The enumeration constants are typically implicitly assigned the
integer values, 0, 1, . . . but can be explicitly assigned any integer
literal in the type’s definition.
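For comparison, Python's standard enum module supports both implicit and explicit values. In the sketch below, the days type is recreated with the functional API. Its start=0 argument reproduces the zero-based default described above, since Python's own default starts at 1. Priority is a hypothetical type showing explicitly assigned values.

```python
from enum import IntEnum

# Functional API; start=0 mimics the zero-based implicit numbering
Days = IntEnum("Days", "Mon Tue Wed Thu Fri Sat Sun", start=0)

print([(d.name, d.value) for d in Days])
# [('Mon', 0), ('Tue', 1), ('Wed', 2), ('Thu', 3), ('Fri', 4), ('Sat', 5), ('Sun', 6)]


class Priority(IntEnum):
    # Values may instead be assigned explicitly in the definition
    LOW = 10
    HIGH = 20


print(Priority.HIGH.value)   # 20
```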

The design issues for enumeration types are as follows:
• Is an enumeration constant allowed to appear in more than one type
definition, and if so, how is the type of an occurrence of that constant
in the program checked?
• Are enumeration values coerced to integer?
• Are any other types coerced to an enumeration type?

C and Pascal were the first widely used languages to include an enumeration data type. C++ includes C’s enumeration types. In C++, we could have the following:
enum colors {red, blue, green, yellow, black};
colors myColor = blue, yourColor = red;
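The colors type uses the default internal values for the enumeration constants, 0, 1, . . . , although the constants could have been assigned any integer literal (or any constant-valued expression). The enumeration values are coerced to int when they are put in integer context, which allows their use in any numeric expression. For example, if the current value of myColor is blue, then the expression
myColor + 1
has the int value 2, which is the internal value of green. In C, where enumeration variables are treated as ordinary integers, the expression
myColor++
would assign green to myColor. In C++, however, the built-in ++ operator does not apply to enumeration types. myColor++ therefore compiles only if operator++ has been overloaded for colors; otherwise the increment must be written with a cast, as in myColor = colors(myColor + 1).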
C++ also allows enumeration constants to be assigned to variables of
any numeric type, though that would likely be an error. However, no
other type value is coerced to an enumeration type in C++. For
example,
myColor = 4;
is illegal in C++. This assignment would be legal if the right side had
been cast to colors type. This prevents some potential errors.
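Python's enum module makes the coercion question an explicit choice, and so offers a loose analogy to the C++ rules. IntEnum members are coerced to int, as C++ enumeration constants are. Plain Enum members are not. Turning an int back into an enumeration value requires an explicit call. Unlike a C++ cast, that call checks that the value actually belongs to the type. Colors and StrictColors are hypothetical types mirroring the colors example.

```python
from enum import Enum, IntEnum


class Colors(IntEnum):
    # Same constants and default values as the C++ colors type
    red = 0
    blue = 1
    green = 2
    yellow = 3
    black = 4


class StrictColors(Enum):
    # A plain Enum: members are not integers
    red = 0
    blue = 1
    green = 2


my_color = Colors.blue
print(my_color + 1)                 # 2: an IntEnum value is coerced to int
my_color = Colors(my_color + 1)     # explicit conversion back, like a cast
print(my_color.name)                # green

strict_rejected = False
try:
    StrictColors.blue + 1           # no coercion to int for a plain Enum
except TypeError:
    strict_rejected = True

invalid_rejected = False
try:
    Colors(7)                       # 7 is not the value of any constant
except ValueError:
    invalid_rejected = True

print(strict_rejected, invalid_rejected)   # True True
```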

In ML, enumeration types are defined as new types with datatype declarations. For example, we could have the following:
datatype weekdays = Monday | Tuesday | Wednesday |
  Thursday | Friday

Subrange Types:
A subrange type is a contiguous subsequence of an ordinal type. For example, 12..14 is a subrange of the integer type. Subrange types were introduced by Pascal and are included in Ada. There are no design issues that are specific to subrange types.

In Ada, subranges are included in the category of types called subtypes, as in the following declarations:
type Days is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
subtype Weekdays is Days range Mon..Fri;
subtype Index is Integer range 1..100;
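Python has no subrange types, but the run-time check that Ada performs on such subtypes can be sketched. Ada raises Constraint_Error when a value falls outside the range. Subrange below is a hypothetical class, and it raises ValueError instead. Days is recreated with IntEnum so that Weekdays can be expressed as a range of an enumeration.

```python
from enum import IntEnum


class Subrange:
    """Hypothetical range-checked subtype of an ordinal type (low..high)."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def check(self, value):
        # Stand-in for the check behind Ada's Constraint_Error
        if not self.low <= value <= self.high:
            raise ValueError(f"value outside {self.low!r}..{self.high!r}")
        return value


Days = IntEnum("Days", "Mon Tue Wed Thu Fri Sat Sun", start=0)
Weekdays = Subrange(Days.Mon, Days.Fri)   # subtype Weekdays is Days range Mon..Fri;
Index = Subrange(1, 100)                  # subtype Index is Integer range 1..100;

print(Index.check(42))                    # 42
print(Weekdays.check(Days.Wed).name)      # Wed

rejected = []
for subtype, value in [(Index, 101), (Weekdays, Days.Sat)]:
    try:
        subtype.check(value)
    except ValueError:
        rejected.append(value)

print(len(rejected))                      # 2
```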

Array Types:
Design Issues:
The primary design issues specific to arrays are the following:
• What types are legal for subscripts?
• Are subscripting expressions in element references range checked?
• When are subscript ranges bound?
• When does array allocation take place?
• Are ragged or rectangular multidimensioned arrays allowed, or
both?
• Can arrays be initialized when they have their storage allocated?
• What kinds of slices are allowed, if any?
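For comparison only, Python's built-in lists, which it uses where other languages use arrays, give one concrete set of answers to several of these questions. Subscripts are integers and are range checked at run time. Negative subscripts count from the end. Nested lists may be ragged. A list can be initialized when it is created. Slices are supported. The grid below is an arbitrary example.

```python
grid = [[1, 2, 3], [4, 5], [6]]   # initialized at creation; rows differ in length

print([len(row) for row in grid])  # [3, 2, 1]: a ragged structure
print(grid[1][-1])                 # 5: negative subscripts count from the end
print(grid[0][1:])                 # [2, 3]: a slice

out_of_range = False
try:
    grid[1][5]                     # subscripts are range checked at run time
except IndexError:
    out_of_range = True

print(out_of_range)                # True
```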
