Microsoft PowerPoint - Lect - 10

This document discusses fundamentals of strings and characters in C including how strings are represented and stored in memory as arrays of characters ending with a null character. It also covers various functions in the ctype.h library that can perform operations on characters like checking if a character is a letter, digit, whitespace or other character type.

Uploaded by

mctrl06
BİL 214 – System Programming
TOBB ETU
Fall 2022
Lecture 10
C Programming
Characters and Strings

Introduction

• In this lecture, we introduce the C Standard Library functions that facilitate string and character processing.
• These functions enable programs (editors, word processors, page layout software, computerized typesetting systems) to process characters, strings, lines of text and blocks of memory.
• The text manipulations performed by formatted input/output functions like printf and scanf can be implemented using the functions discussed in this chapter.

Fundamentals of strings and characters

• Characters are the fundamental building blocks of source programs.
• Every program is composed of a sequence of characters that, when grouped together meaningfully, is interpreted by the computer as a series of instructions used to accomplish a task.
• A program may contain character constants.
• A character constant is an int value represented as a character in single quotes.
• The value of a character constant is the integer value of the character in the machine’s character set.

Fundamentals of strings and characters

• For example, 'z' represents the integer value of z, and '\n' the integer value of newline (122 and 10 in ASCII, respectively).
• A string is a series of characters treated as a single unit.
• A string may include letters, digits and various special characters such as +, -, *, / and $.
• String literals, or string constants, in C are written in double quotation marks (" ").

• A string in C is an array of characters ending in the null character ('\0').
• A string is accessed via a pointer to the first character in the string.
• The value of a string is the address of its first character.
• Thus, in C, it is appropriate to say that a string is a pointer: in fact, a pointer to the string’s first character.
• In this sense, strings are like arrays, because in most expressions an array is also converted to a pointer to its first element.

• A character array or a variable of type char * can be initialized with a string in a definition.
• The definitions
    char color[] = "blue";
    const char *colorPtr = "blue";
  each initialize a variable to the string "blue".
• The first definition creates a 5-element array color containing the characters 'b', 'l', 'u', 'e' and '\0'.
• The second definition creates pointer variable colorPtr that points to the string "blue" somewhere in memory.
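The points above about character constants and the two "blue" definitions can be checked with a few lines of C. This is a minimal sketch: the function names are made up for illustration, and the numeric values in the comments assume an ASCII-based character set.

```c
#include <stddef.h>
#include <string.h>

/* Returns the integer value of a character (hypothetical helper).
   On an ASCII system, char_value('z') is 122 and char_value('\n') is 10. */
int char_value(char c)
{
    return (int)c;
}

/* Size in bytes of an array initialized with "blue":
   four letters plus the terminating '\0'. */
size_t blue_array_size(void)
{
    char color[] = "blue";
    return sizeof color;
}

/* Number of characters before the '\0' in the literal that colorPtr
   points to. The literal itself must not be modified through the pointer. */
size_t blue_pointer_length(void)
{
    const char *colorPtr = "blue";
    return strlen(colorPtr);
}
```

The array holds its own modifiable copy of the characters, while the pointer only refers to the literal, which is why the pointer is declared const char *.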

Fundamentals of strings and characters

• The preceding array definition could also have been written
    char color[] = { 'b', 'l', 'u', 'e', '\0' };
• When defining a character array to contain a string, the array must be large enough to store the string and its terminating null character.
• The preceding definition automatically determines the size of the array based on the number of initializers in the initializer list.

ctype.h

• The character-handling library (ctype.h) includes several functions that perform useful tests and manipulations of character data.
• Each function receives a character, represented as an int, or EOF as an argument.
• EOF normally has the value -1, and some hardware architectures do not allow negative values to be stored in char variables, so the character-handling functions manipulate characters as integers.

The list of character-handling library (<ctype.h>) functions
ctype.h

• Another set of useful functions are isspace, iscntrl, ispunct, isprint and isgraph.
• Function isspace determines if a character is one of the following white-space characters:
  • space ( ' ' ),
  • form feed ( '\f' ),
  • newline ( '\n' ),
  • carriage return ( '\r' ),
  • horizontal tab ( '\t' ),
  • vertical tab ( '\v' ).

ctype.h

• Function iscntrl determines if a character is a control character, such as:
  • horizontal tab ( '\t' ),
  • vertical tab ( '\v' ),
  • form feed ( '\f' ),
  • alert ( '\a' ),
  • backspace ( '\b' ),
  • carriage return ( '\r' ),
  • newline ( '\n' ).

• Function ispunct determines if a character is a printing character other than a space, a digit or a letter, such as
    $ , # , ( , ) , [ , ] , { , } , ; , : , %

• Function isprint determines if a character can be displayed on the screen (including the space character).
• Function isgraph is the same as isprint, except that the space character is not included.
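The classification functions described above can be combined as in the following sketch. The enum and both function names are hypothetical, and the results described assume the default "C" locale. The cast to unsigned char keeps the argument within the range the <ctype.h> functions accept.

```c
#include <ctype.h>

enum char_class { CC_SPACE, CC_CONTROL, CC_PUNCT, CC_ALNUM, CC_OTHER };

/* Classifies one character with the <ctype.h> tests.
   isspace is checked first, so '\t' and '\n' (which are also control
   characters) are reported as white space. */
enum char_class classify(char c)
{
    unsigned char uc = (unsigned char)c;

    if (isspace(uc)) return CC_SPACE;
    if (iscntrl(uc)) return CC_CONTROL;
    if (ispunct(uc)) return CC_PUNCT;
    if (isalnum(uc)) return CC_ALNUM;
    return CC_OTHER;
}

/* Nonzero when c is printable but not graphic; in the "C" locale
   that is true only for the space character. */
int printable_not_graphic(int c)
{
    return isprint(c) && !isgraph(c);
}
```

For example, classify('\a') yields CC_CONTROL and classify('$') yields CC_PUNCT, while printable_not_graphic(' ') is 1, which shows the single difference between isprint and isgraph.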

String conversion functions

• Next, we look at the string-conversion functions from the general utilities library (stdlib.h).
• These functions convert strings of digits to integer and floating-point values.

• Note the use of const to declare variable nPtr in the function headers.
  • Read from right to left as “nPtr is a pointer to a character constant”.
• const specifies that the argument value will not be modified.

Obsolete functions list

Obsolete function   Replacement   Reason
asctime()           asctime_s()   Non-reentrant
atof()              strtod()      No error detection
atoi()              strtol()      No error detection
atol()              strtol()      No error detection
atoll()             strtoll()     No error detection
ctime()             ctime_s()     Non-reentrant
fopen()             fopen_s()     No exclusive access to file
freopen()           freopen_s()   No exclusive access to file

• atof(), atoi(), atol() and atoll() are obsolete. Do NOT use these.
strtod

• Function strtod converts a sequence of characters representing a floating-point value to double.
• The function returns 0 if it’s unable to convert any portion of its first argument to double.
• The function receives two arguments: a string (char *) and a pointer to a string (char **).
• The string argument contains the character sequence to be converted to double; any whitespace characters at the beginning of the string are ignored.
• The function uses the char ** argument to modify a char * in the calling function (stringPtr) so that it points to the location of the first character after the converted portion of the string or to the entire string if no portion can be converted.

    const char *string = "51.2% are admitted"; // initialize string
    char *stringPtr; // create char pointer
    double d = strtod(string, &stringPtr);

• d is assigned the double value converted from string, and stringPtr is assigned the location of the first character after the converted value (51.2) in string.

strtol

• Function strtol converts to long int a sequence of characters representing an integer.
• The function returns 0 if it’s unable to convert any portion of its first argument to long int.
• The function’s three arguments are a string (char *), a pointer to a string and an integer.
• The string contains the character sequence to be converted to long; any whitespace characters at the beginning of the string are ignored.
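Because strtod returns 0 both for the text "0" and for text it cannot convert, the end pointer is what tells the two cases apart. The wrapper below is a sketch of that check; parse_double and its parameters are hypothetical names, not part of the library.

```c
#include <stdlib.h>

/* Converts the leading part of text to double.
   Returns 1 and stores the result in *value if at least one character
   was converted, 0 otherwise. *rest is set to the first character that
   was not converted (the start of text if nothing was). */
int parse_double(const char *text, double *value, const char **rest)
{
    char *end;
    double d = strtod(text, &end);

    *rest = end;
    if (end == text) {
        return 0;               /* no conversion took place */
    }
    *value = d;
    return 1;
}
```

With the string from the slide, "51.2% are admitted", the call succeeds, the value is 51.2 and rest points at the '%' character. With "abc" it reports failure and rest points back at the start of the string.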

strtol

    const char *string = "-1234567abc"; // initialize string pointer
    char *remainderPtr; // create char pointer
    long x = strtol(string, &remainderPtr, 0);

• The function uses the char ** argument to modify a char * in the calling function (remainderPtr) so that it points to the location of the first character after the converted portion of the string or to the entire string if no portion can be converted.
• The integer specifies the base of the value being converted.

strtoul

• Function strtoul converts to unsigned long int a sequence of characters representing an unsigned long int value.
• strtoul works identically to function strtol.

    const char *string = "1234567abc"; // initialize string pointer
    char *remainderPtr; // create char pointer
    unsigned long int x = strtoul(string, &remainderPtr, 0);

• x is assigned the unsigned long int value converted from string.
• The second argument, &remainderPtr, is assigned the remainder of string after the conversion.
• The third argument, 0, indicates that the value to be converted can be in octal, decimal or hexadecimal format.
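The "no error detection" entries in the obsolete-functions table are the reason to prefer strtol over atoi. The sketch below shows one way to detect both "nothing converted" and "out of range"; parse_long is a hypothetical helper, while errno and ERANGE are the standard mechanism strtol uses to report overflow.

```c
#include <errno.h>
#include <stdlib.h>

/* Converts the leading part of text to long in the given base
   (0 means: decide between octal, decimal and hexadecimal by prefix).
   Returns 1 on success, 0 if nothing was converted or the value does
   not fit in a long. *rest is set to the first unconverted character. */
int parse_long(const char *text, int base, long *value, const char **rest)
{
    char *end;
    long v;

    errno = 0;                          /* so a stale ERANGE is not misread */
    v = strtol(text, &end, base);
    *rest = end;

    if (end == text || errno == ERANGE) {
        return 0;
    }
    *value = v;
    return 1;
}
```

With base 0, "-1234567abc" converts to -1234567 and leaves "abc", "0x1F" converts to 31, and "017" converts to 15. atoi would return a plain int in all these cases and give no way to tell a failed conversion from a genuine 0.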

stdio.h

• Next we look at several functions from the standard input/output library (stdio.h) specifically for manipulating character and string data.

fgets

• fgets reads a line of text from the standard input (keyboard).
• fgets reads characters from the standard input into its first argument, an array of chars, until a newline or the EOF indicator is encountered, or until the maximum number of characters is read.
• The maximum number of characters is one fewer than the value specified in fgets’s second argument.
• The third argument specifies the stream from which to read characters; in this case, we use the standard input stream (stdin).
• A null character ('\0') is appended to the array when reading terminates.
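The fgets rules above (at most size - 1 characters, newline kept, '\0' always appended) can be seen in a small helper. This is a sketch only: read_line is a hypothetical name, and removing the stored newline with strcspn is a common convention rather than something fgets does itself.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from stream into buf, storing at most size - 1
   characters, and removes the trailing newline if fgets stored one.
   Returns 1 if something was read, 0 on end-of-file or error. */
int read_line(FILE *stream, char *buf, int size)
{
    if (fgets(buf, size, stream) == NULL) {
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';     /* cut at the newline, if any */
    return 1;
}
```

A call such as read_line(stdin, sentence, (int)sizeof sentence) reads from the keyboard. If the input line is longer than the array, the rest of the line stays in the stream and is returned by the next call.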
putchar

• putchar writes a single character to the standard output; in the lecture’s example it is called recursively to output the characters of a line in reverse order.
• putchar returns the character written as an unsigned char cast to an int, or EOF on error.

getchar

• In the lecture’s example, getchar is called repeatedly to read characters from the standard input into the character array sentence.
• getchar reads a character from the standard input and returns the character as an integer; recall that an integer is returned to support the end-of-file indicator.

puts

• puts displays characters as a string.
• puts takes a string as an argument and displays the string followed by a newline character.

sprintf

• sprintf prints formatted data into array s, an array of characters.
• sprintf uses the same conversion specifiers as printf.

sscanf

• sscanf reads formatted data from a character array.
• sscanf uses the same conversion specifiers as scanf.

string.h

• Next, we look at the string-manipulation functions of the string-handling library.
• The string-handling library (string.h) provides many useful functions for
  • manipulating string data (copying strings and concatenating strings),
  • comparing strings,
  • searching strings for characters and other strings,
  • tokenizing strings (separating strings into logical pieces) and
  • determining the length of strings.
• Every function, except for strncpy, appends the null character to its result.
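Formatting into a character array and reading the values back can be sketched as follows. Note the substitution: the example uses snprintf, the bounded variant of sprintf from C99 that is not covered in the slides, so the output cannot overrun the array. format_pair and parse_pair are hypothetical names.

```c
#include <stdio.h>

/* Writes "n x" into out (x with two decimals), never storing more than
   size bytes. Returns the number of characters the full result needs,
   as snprintf does. */
int format_pair(char *out, size_t size, int n, double x)
{
    return snprintf(out, size, "%d %.2f", n, x);
}

/* Reads an int and a double back from a character array.
   Returns the number of items assigned (2 on full success). */
int parse_pair(const char *in, int *n, double *x)
{
    return sscanf(in, "%d %lf", n, x);
}
```

Formatting 42 and 3.14159 produces the 7-character string "42 3.14", and parse_pair on that string assigns both values and returns 2. Checking the sscanf return value is the only way to know whether every field was actually read.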

Copy and append functions

• We start with copy and append functions:
  • strcpy
  • strncpy
  • strcat
  • strncat

string.h

strcpy and strncpy

• strcpy copies its second argument (a string) into its first argument, a character array that must be large enough to store the string and its terminating null character.
• strncpy is equivalent to strcpy, except that strncpy specifies the number of characters to be copied from the string into the array.
• strncpy does not necessarily write a terminating null character at the end of its destination.
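Since strncpy may leave the destination without a terminating null character, a common pattern is to copy one character fewer than the array holds and write the '\0' explicitly. The helper below is a sketch of that pattern; copy_bounded is a hypothetical name.

```c
#include <string.h>

/* Copies at most size - 1 characters of src into dst and always
   terminates dst. size is the full size of the dst array and must be
   at least 1. */
void copy_bounded(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';       /* strncpy omits this when src is too long */
}
```

Copying "Happy Birthday" into a 5-element array this way stores "Happ" followed by '\0'. A plain strcpy in the same situation would write past the end of the array.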
strcat and strncat

• strcat appends its second argument (a string) to its first argument (a character array containing a string).
• The first character of the second argument replaces the null ('\0') that terminates the string in the first argument.
• You must ensure that the array used to store the first string is large enough to store the first string, the second string and the terminating null character copied from the second string.
• Function strncat appends a specified number of characters from the second string to the first string. A terminating null character is automatically appended to the result.

string.h

• Functions strncpy and strncat specify a parameter of type size_t, which is a type defined by the C standard as the integral type of the value returned by operator sizeof.

Comparison functions

• Next, we look at the string-handling library’s string-comparison functions:
  • strcmp
  • strncmp
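The size requirement for strcat can be enforced by computing the remaining room and passing it to strncat. This is a sketch under the assumption that dst already holds a valid string shorter than size; append_bounded is a hypothetical name.

```c
#include <string.h>

/* Appends src to the string in dst without writing beyond the size
   bytes of the dst array. Returns 1 if all of src fit, 0 if it was cut.
   Assumes dst already contains a null-terminated string shorter than size. */
int append_bounded(char *dst, size_t size, const char *src)
{
    size_t used = strlen(dst);
    size_t room = size - used - 1;      /* free space, excluding the '\0' */

    strncat(dst, src, room);            /* strncat always adds the '\0' */
    return strlen(src) <= room;
}
```

Appending "New Year" to a 12-element array that holds "Happy " leaves room for only five more characters, so the result is "Happy New Y" and the function returns 0. The third argument of strncat counts characters to append, not the total size of the destination.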

strcmp and strncmp

• strcmp compares its first string argument with its second string argument, character by character.
• strcmp returns 0 if the strings are equal, a negative value if the first string is less than the second string and a positive value if the first string is greater than the second string.
• strncmp is equivalent to strcmp, except that strncmp compares up to a specified number of characters.
• strncmp does not compare characters following a null character in a string.

• Assuming that strcmp and strncmp return 1 when their arguments are equal is a logic error.
• Both functions return 0 (strangely, the equivalent of C's false value) for equality.
• Therefore, when comparing two strings for equality, the result of function strcmp or strncmp should be compared with 0 to determine whether the strings are equal.
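The "compare with 0" rule is easy to get right once it is wrapped in small helpers with explicit names. Both functions below are hypothetical illustrations, not library functions.

```c
#include <string.h>

/* 1 if a and b hold the same characters, 0 otherwise. */
int strings_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* 1 if s begins with prefix, 0 otherwise. */
int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

Writing if (strcmp(a, b)) would take the branch when the strings differ, which is the logic error the slide warns about. has_prefix("Hap", "Happy") is 0 because strncmp stops at the null character of the shorter string and reports a difference there.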

Search functions

• Next, we look at the functions of the string-handling library used to search strings for characters and other strings:
  • strchr
  • strcspn
  • strpbrk
  • strrchr
  • strspn
  • strstr
  • strtok
• The functions strcspn and strspn return size_t.

strchr

• strchr searches for the first occurrence of a character in a string.
• If the character is found, strchr returns a pointer to the character in the string; otherwise, strchr returns NULL.
strcspn

• strcspn determines the length of the initial part of the string in its first argument that does not contain any characters from the string in its second argument.
• The function returns the length of the segment.

strpbrk

• strpbrk searches its first string argument for the first occurrence of any character in its second string argument.
• If a character from the second argument is found, strpbrk returns a pointer to the character in the first argument; otherwise, strpbrk returns NULL.

strrchr

• strrchr searches for the last occurrence of the specified character in a string.
• If the character is found, strrchr returns a pointer to the character in the string; otherwise, strrchr returns NULL.
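The pointer-returning search functions (strchr, strrchr, strpbrk) are often easier to follow when the result is turned into an index by subtracting the start of the string. The helpers below are a sketch with hypothetical names; strcspn needs no wrapper because it already returns a length.

```c
#include <stddef.h>
#include <string.h>

/* Index of the first occurrence of c in s, or -1 if c does not occur. */
long first_index(const char *s, int c)
{
    const char *p = strchr(s, c);
    return p != NULL ? (long)(p - s) : -1;
}

/* Index of the last occurrence of c in s, or -1 if c does not occur. */
long last_index(const char *s, int c)
{
    const char *p = strrchr(s, c);
    return p != NULL ? (long)(p - s) : -1;
}

/* Index of the first character of s that also appears in set, or -1. */
long first_of_set(const char *s, const char *set)
{
    const char *p = strpbrk(s, set);
    return p != NULL ? (long)(p - s) : -1;
}
```

For the string "This is a test", first_index with 't' gives 10 and last_index gives 13, and first_of_set with "beware" gives 8 (the 'a'). strcspn("The value is 3.14159", "1234567890") returns 13, the length of the part before the first digit.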

strspn

• strspn determines the length of the initial part of the string in its first argument that contains only characters from the string in its second argument.
• strspn returns the length of the segment.

strstr

• strstr searches for the first occurrence of its second string argument in its first string argument.
• If the second string is found in the first string, a pointer to the location of the string in the first argument is returned.

strtok

• strtok is used to break a string into a series of tokens.
• A token is a sequence of characters separated by delimiters (usually spaces or punctuation marks, but a delimiter can be any character).
• For example, in a line of text, each word can be considered a token, and the spaces and punctuation separating the words can be considered delimiters.
• Multiple calls to strtok are required to tokenize a string, i.e., break it into tokens (assuming that the string contains more than one token).
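strstr and strspn can be illustrated with two small helpers. Both names are hypothetical; the loop relies on the fact that strstr returns NULL when the second string is not found.

```c
#include <stddef.h>
#include <string.h>

/* Counts non-overlapping occurrences of needle in haystack.
   An empty needle is counted as zero occurrences. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0;
    size_t step = strlen(needle);
    const char *p = haystack;

    if (step == 0) {
        return 0;               /* avoid looping forever on "" */
    }
    while ((p = strstr(p, needle)) != NULL) {
        ++count;
        p += step;              /* continue after this match */
    }
    return count;
}

/* Length of the leading run of decimal digits in s. */
size_t leading_digits(const char *s)
{
    return strspn(s, "0123456789");
}
```

count_occurrences("abcdefabcdef", "def") is 2, and leading_digits("2022 Fall") is 4. strspn and strcspn are complements: one measures the run of characters that are in the set, the other the run of characters that are not.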

Example

• The first call to strtok (line 15) contains two arguments: a string to be tokenized, and a string containing characters that separate the tokens.
• In line 15, the statement
    char * tokenPtr = strtok(string, " ");
  assigns tokenPtr a pointer to the first token in string.
• The second argument, " ", indicates that tokens are separated by spaces.

• Function strtok searches for the first character in string that’s not a delimiting character (space).
• This begins the first token.
• The function then finds the next delimiting character in the string and replaces it with a null ('\0') character to terminate the current token.
• Function strtok saves a pointer to the next character following the token in string and returns a pointer to the current token.

• Subsequent strtok calls in line 20 continue tokenizing string.
• These calls contain NULL as their first argument.
• The NULL argument indicates that the call to strtok should continue tokenizing from the location in string saved by the last call to strtok.
• If no tokens remain when strtok is called, strtok returns NULL.
• You can change the delimiter string in each new call to strtok.
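The sequence of calls described above (one call with the string, then repeated calls with NULL until NULL comes back) can be sketched as a function. This is not the lecture's listing; split_words and its parameters are hypothetical. It tokenizes a copy held in a caller-supplied work buffer, because strtok writes '\0' characters into the string it processes.

```c
#include <stddef.h>
#include <string.h>

/* Splits text on spaces. The tokens are stored as pointers into work,
   a caller-supplied buffer of worksize bytes that receives a copy of
   text. At most max tokens are stored in out.
   Returns the number of tokens, or 0 if text does not fit in work. */
size_t split_words(const char *text, char *work, size_t worksize,
                   char *out[], size_t max)
{
    size_t n = 0;
    char *tokenPtr;

    if (strlen(text) >= worksize) {
        return 0;
    }
    strcpy(work, text);                     /* strtok modifies its input */

    tokenPtr = strtok(work, " ");           /* first call: pass the string */
    while (tokenPtr != NULL && n < max) {
        out[n++] = tokenPtr;
        tokenPtr = strtok(NULL, " ");       /* later calls: pass NULL */
    }
    return n;
}
```

For the text "This is a sentence with 7 tokens" the function stores seven pointers, the first to "This" and the last to "tokens", and the original text is left untouched. strtok keeps its position in hidden static state, so two tokenizations cannot be interleaved.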
strtok

• Function strtok modifies the input string by placing '\0' at the end of each token.
• Therefore, a copy of the string should be made if the string will be used after the calls to strtok.

Character encodings

• In an effort to standardize character representations, most computer manufacturers have designed their machines to utilize one of two popular coding schemes, ASCII or EBCDIC:
  • ASCII stands for “American Standard Code for Information Interchange”.
  • EBCDIC (developed by IBM) stands for “Extended Binary Coded Decimal Interchange Code”.
• There are other coding schemes, but these two (ASCII and EBCDIC) are the most popular.
• The Unicode standard outlines a specification to produce consistent encoding of the vast majority of the world’s characters and symbols.
• ASCII, EBCDIC and Unicode are called character sets.

Memory manipulation functions

• The string-handling library functions presented in this section manipulate, compare and search blocks of memory.
• The functions treat blocks of memory as character arrays and can manipulate any block of data.
• In the following function discussions, “object” refers to a block of data.
• Note: Several of these functions (memcpy, memmove and memset) have a more secure version described in optional Annex K of the C11 standard.

• The pointer parameters are declared void * so they can be used to manipulate memory for any data type.
• Recall from last lecture that any pointer can be assigned directly to a pointer of type void *, and a pointer of type void * can be assigned directly to a pointer of any other type.
• Because a void * pointer cannot be dereferenced, each function receives a size argument that specifies the number of bytes the function will process.
• The memory manipulation functions do not check for terminating null characters, because they manipulate blocks of memory that are not necessarily strings.

memcpy

• Function memcpy copies a specified number of bytes from the object pointed to by its second argument into the object pointed to by its first argument.

    char s1[17]; // create char array s1
    char s2[] = "Copy this string"; // initialize char array s2
    memcpy(s1, s2, 17);

• The function can receive a pointer to any type of object.
• The result of this function is undefined if the two objects overlap in memory (i.e., if they are parts of the same object); in such cases, use memmove.

memmove

• memmove, like memcpy, copies a specified number of bytes from the object pointed to by its second argument into the object pointed to by its first argument.
• Copying is performed as if the bytes were copied from the second argument into a temporary array, then copied from the temporary array into the first argument.
• This allows bytes from one part of a string to be copied into another part of the same string, even if the two portions overlap.
• String-manipulation functions other than memmove that copy characters have undefined results when copying takes place between parts of the same string.

Example
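The difference between memcpy and memmove comes down to whether source and destination may overlap. The sketch below uses hypothetical helper names: one function copies between distinct int arrays with memcpy, and the other shifts bytes within a single array, which requires memmove.

```c
#include <stddef.h>
#include <string.h>

/* Copies n ints from src to dst. The two arrays must not overlap,
   so memcpy is appropriate. The size is given in bytes. */
void copy_ints(int *dst, const int *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}

/* Moves the first count bytes of buf to the position offset bytes
   further right. Source and destination overlap within buf, so
   memmove is required. buf must have room for offset + count bytes. */
void shift_right(char *buf, size_t count, size_t offset)
{
    memmove(buf + offset, buf, count);
}
```

Starting from a 16-element array holding "abcdef", shift_right(buf, 7, 2) moves the six letters and the '\0' two places to the right, giving "ababcdef". The same call with memcpy would have undefined behavior because the regions overlap.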
memcmp

• memcmp compares the specified number of bytes of its first argument with the corresponding bytes of its second argument.
• The function returns
  • a value greater than 0 if the first argument is greater than the second,
  • 0 if the arguments are equal, and
  • a value less than 0 if the first argument is less than the second.

Example

memchr

• memchr searches for the first occurrence of a byte, represented as unsigned char, in the specified number of bytes of an object.
• If the byte is found, a pointer to the byte in the object is returned; otherwise, a NULL pointer is returned.
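memcmp and memchr work on a byte count rather than on a terminating null character, which the following sketch makes visible. same_bytes and find_byte are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* 1 if the first n bytes of a and b are identical, 0 otherwise. */
int same_bytes(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}

/* Index of the first byte equal to value within the first n bytes of
   obj, or -1 if no such byte exists. */
long find_byte(const void *obj, int value, size_t n)
{
    const unsigned char *base = obj;
    const unsigned char *p = memchr(obj, value, n);

    return p != NULL ? (long)(p - base) : -1;
}
```

same_bytes("ABCDEFG", "ABCDXYZ", 4) is 1 because only the first four bytes are compared, while comparing all seven bytes reports a difference. Unlike strchr, find_byte keeps searching past zero bytes, so it also works on binary data.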

Example

memset

• memset copies the value of the byte in its second argument into the first n bytes of the object pointed to by its first argument, where n is specified by the third argument.
• Doesn’t this look a bit too low-level, even unnecessary?
• Why do we need this function?

• Use memset to set an array’s elements to 0 rather than looping through them and assigning 0 to each element.
• Many hardware architectures have a block copy or clear instruction that the compiler can use to optimize memset for high-performance zeroing of memory.
• Typically, this is not the case for non-zero values…
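A short sketch of the two typical uses of memset follows; clear_ints and fill_chars are hypothetical names. Keep in mind that memset sets individual bytes, so for arrays of int it is only suitable for the value 0: filling an int array with the byte value 1 would not produce elements equal to 1.

```c
#include <stddef.h>
#include <string.h>

/* Sets all n elements of an int array to 0. The size is in bytes. */
void clear_ints(int *a, size_t n)
{
    memset(a, 0, n * sizeof *a);
}

/* Overwrites the first n characters of s with c.
   s must have at least n elements. */
void fill_chars(char *s, char c, size_t n)
{
    memset(s, c, n);
}
```

Calling fill_chars on an array holding "BBBBBBBB" with 'b' and a count of 4 leaves "bbbbBBBB". For char arrays every byte is one element, so any fill value works there.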

Example

Some other functions

• The two remaining functions of the string-handling library are:
  • strerror
  • strlen
strerror

• strerror takes an error number and creates an error message string.
• A pointer to the string is returned.

Example

strlen

• strlen takes a string as an argument and returns the number of characters in the string.
• The terminating null character is not included in the length.
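strerror and strlen can be combined in a small sketch that builds a message of the form "context: message". describe_error is a hypothetical helper, and snprintf (not covered in the slides) is used so the output stays within the array. The wording of the message returned by strerror depends on the implementation, so the code does not rely on it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Writes "context: <message for errnum>" into out, storing at most
   size bytes. Returns the length the full message needs, as snprintf
   does. */
int describe_error(char *out, size_t size, const char *context, int errnum)
{
    return snprintf(out, size, "%s: %s", context, strerror(errnum));
}
```

After describe_error(msg, sizeof msg, "sqrt", EDOM) the array starts with "sqrt: " followed by the system's text for EDOM, and strlen(msg) equals the returned length as long as the message fit. strlen("blue") is 4, one less than the 5 bytes the array needs.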

Secure string-processing functions


• Earlier we mentioned secure functions printf_s and scanf_s.

• In this chapter, we presented functions sprintf, strcpy, strncpy, strcat, strncat, strtok, strlen,
memcpy, memmove and memset.

• More secure versions of these and many other string-processing and input/output functions are
described by the C11 standard’s optional Annex K.

• If your C compiler supports Annex K, you should use the secure versions of these functions.

• Among other things, the more secure versions help prevent buffer overflows by requiring
additional parameters that specify the number of elements in a target array and by ensuring that
pointer arguments are non-NULL.
