04 Strings
Lecture 4: Chars and C-Strings
Friday, January 14, 2022
Computer Systems
Winter 2022
Stanford University
Computer Science Department
• Logistics
• Assign1: Due Wednesday at 11:59pm, grace period until Friday.
• Feedback: during the quarter you should receive a couple of feedback
emails about the course. Please be honest; I want to improve the
course where necessary! Constructive feedback and criticism are always
appreciated.
• Reading: Reader: C Primer, C Strings; K&R 1.9, 5.5, Appendix B3
• Chars
• ctype library
• C-Strings
• How strings are laid out in memory
• The string.h library
C's char type
Most likely, you are already familiar with the char type from other courses. In C,
a char is defined to be 1 byte. Whether a plain char is signed or unsigned is
implementation-defined (it is signed on x86-64 Linux, for example), but we usually
only use the values 0-127 for character data (see below).
A char does not have to hold alphabetic or numeric character data, but it often
does, and on the systems we use, the ASCII character set defines the mapping
between the numeric value of a char and the character it represents. We will limit
ourselves to character data in the range 0-127, which is what ASCII defines.
There is a standard called Unicode that you will investigate for Assignment 1, but
for CS 107, we will limit ourselves to the ASCII character set.
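Because a char is stored as a small integer, character data can take part in ordinary arithmetic. The sketch below assumes an ASCII system; digit_value and print_char_codes are hypothetical helper names, not library functions.

```c
#include <stdio.h>

/* Hypothetical helper: converts a digit character such as '7' to the
 * integer 7. This works because '0' through '9' have consecutive codes
 * (48 through 57 in ASCII). */
int digit_value(char c) {
    return c - '0';
}

/* Hypothetical helper: prints each character of s next to its numeric
 * code, showing that a char is stored as a small integer. */
void print_char_codes(const char *s) {
    for (int i = 0; s[i] != '\0'; i++) {
        printf("'%c' = %d\n", s[i], s[i]);  // same byte, two views
    }
}
```

For example, print_char_codes("Hi!") prints 'H' = 72, 'i' = 105, and '!' = 33, one per line.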
The ctype library
One of the standard libraries you should become familiar with is the "ctype" library,
which includes many functions that act on character data.
The functions take an int instead of a char because they must accept every
unsigned char value (0 - 255) plus the special value EOF ("end of file"), which is a
negative int, usually -1. Passing any other negative value is undefined behavior, so a
char that might hold a byte above 127 should be cast to unsigned char first.
We can see information about the ctype functions by typing man function, where
function is one of the following (there are more, but we mostly care about these):
isalpha, isdigit, isalnum, islower, isupper, isspace, ispunct,
isxdigit, tolower, and toupper. You can get a list of most of them with a
combination of "man isalpha" and "man tolower".
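A minimal sketch of calling toupper and isdigit on every character of a string; str_to_upper and count_digits are hypothetical helpers. Each char is cast to unsigned char before the call, as the ctype functions expect.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-cases a writable string in place.
 * The cast avoids passing a negative value (possible when char is
 * signed and the text contains bytes above 127) to toupper. */
void str_to_upper(char *s) {
    for (size_t i = 0; s[i] != '\0'; i++) {
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}

/* Hypothetical helper: counts the characters in s for which
 * isdigit reports true. */
int count_digits(const char *s) {
    int count = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (isdigit((unsigned char)s[i])) {
            count++;
        }
    }
    return count;
}
```

Non-letters such as digits, spaces, and punctuation pass through toupper unchanged.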
The ctype library
The following code demonstrates some of the functions in the ctype library:
// file: ctypedemo.c
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        printf("Usage: %s string\n",argv[0]);
        return 1;
    }
    char *string = argv[1];

    // count alpha characters, digits,
    // whitespace, and punctuation
    int alphacount = 0;
    int digitcount = 0;
    int spacecount = 0;
    int punctcount = 0;
    int total = 0;
    int i = 0;

    while (string[i] != 0) {
        // ctype functions expect an unsigned char value (or EOF)
        unsigned char ch = string[i];
        if (isalpha(ch)) alphacount++;
        if (isdigit(ch)) digitcount++;
        if (isspace(ch)) spacecount++;
        if (ispunct(ch)) punctcount++;
        total++;
        i++;
    }
    printf("Alphabetic characters: %d\n",alphacount);
    printf("Digits: %d\n",digitcount);
    printf("Spaces: %d\n",spacecount);
    printf("Punctuation: %d\n",punctcount);
    printf("Total characters: %d\n",total);
    return 0;
}
C Strings
C strings are simply a sequence of chars, followed by a terminating 0 (called a
"null" byte).
The string "apple" in memory (str holds the address 0x100):
0x105  \0
0x104  e
0x103  l
0x102  p
0x101  p
0x100  a    <- str (value: 0x100)
1. str is a variable that holds the address of the first character in "apple".
2. We have drawn the array vertically, with the lowest address at the bottom.
3. Each character is 1 byte away from the previous character.
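To see this layout on a real machine, a small sketch like the one below (show_layout is a hypothetical helper) prints the address of every byte of a string, including the null byte. The exact addresses will differ from the 0x100 used in the diagram, but consecutive characters should be exactly 1 byte apart.

```c
#include <stdio.h>

/* Hypothetical helper: prints the address and contents of every byte
 * of str, including the terminating null byte, and returns how many
 * bytes the string occupies. */
size_t show_layout(const char *str) {
    size_t i;
    for (i = 0; str[i] != '\0'; i++) {
        printf("%p: %c\n", (void *)&str[i], str[i]);
    }
    printf("%p: \\0\n", (void *)&str[i]);  // the terminating null byte
    return i + 1;
}
```

Called on a char array holding "apple", it prints six lines and returns 6: five letters plus the null byte.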
C Strings
It is meaningless in C to compare strings by their pointer values: that compares
where the strings are stored, not what they contain.
// file: pointer_compare.c
#include<stdio.h>
#include<stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 3) {
        printf("Usage: %s word1 word2\n",argv[0]);
        return 1;
    }
    char *s1 = argv[1];
    char *s2 = argv[2];

    // the following three lines do not compare
    // the two strings! They compare addresses.
    if (s1 < s2) printf("%s is less than %s\n",s1,s2);
    if (s1 == s2) printf("%s is equal to %s\n",s1,s2);
    if (s1 > s2) printf("%s is greater than %s\n",s1,s2);

    printf("%s address: %p\n",s1,(void *)s1);
    printf("%s address: %p\n",s2,(void *)s2);
    return 0;
}

$ gcc -g -O0 -std=gnu99 -Wall pointer_compare.c -o pointer_compare
$ ./pointer_compare cat dog
cat is less than dog
cat address: 0x7ffeef0e9962
dog address: 0x7ffeef0e9966
$ ./pointer_compare dog cat
dog is less than cat
dog address: 0x7ffeeb6b7962
cat address: 0x7ffeeb6b7966
Wrong! "dog" does not come before "cat"; the program only found that argv[1] is
stored at a lower address than argv[2].
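To compare what the strings contain, use a string library function instead of the pointer operators. A minimal sketch using strcmp from <string.h>; compare_words is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: describes how s1 compares to s2 by contents.
 * strcmp returns a negative value, zero, or a positive value; only the
 * sign is meaningful, so the result is never compared to -1 or 1. */
const char *compare_words(const char *s1, const char *s2) {
    int result = strcmp(s1, s2);
    if (result < 0) return "less than";
    if (result > 0) return "greater than";
    return "equal to";
}
```

With this helper, "dog" compared to "cat" yields "greater than" no matter where the two strings happen to live in memory.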
C Strings
Assigning a string pointer to another string pointer does not make a copy of the
original string! Instead, both pointers point to the same string.
Because of this, changing a character via either pointer changes the string.
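// file: string_pointers.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

int main(int argc, char **argv)
{
    // the program overwrites the first two characters
    if (argc < 2 || strlen(argv[1]) < 2) {
        printf("Usage: %s string (at least 2 characters)\n",argv[0]);
        return 1;
    }
    char *s1 = argv[1];
    char *s2 = s1; // not a copy!
    s1[0] = 'x';
    s2[1] = 'y';
    // s1 and s2 hold the same address, so both print the same string
    printf("address: %p, string:%s\n",(void *)s1,s1);
    printf("address: %p, string:%s\n",(void *)s2,s2);
    return 0;
}

$ gcc -g -O0 -std=gnu99 -Wall string_pointers.c -o string_pointers
$ ./string_pointers cs107
address: 0x7ffee837f962, string:xy107
address: 0x7ffee837f962, string:xy107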
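Passing a string to a function works the same way: the function receives a copy of the pointer, not of the characters, so writes through it change the caller's string. A sketch with a hypothetical helper, replace_char; it must be given a writable array (such as argv[1] or a char array), not a string literal, since modifying a literal is undefined behavior.

```c
#include <stddef.h>

/* Hypothetical helper: replaces every occurrence of `from` with `to`.
 * s is a copy of the caller's pointer, so these writes modify the
 * caller's characters directly. */
void replace_char(char *s, char from, char to) {
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == from) {
            s[i] = to;
        }
    }
}
```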
The String Library
One of the more important libraries for CS 107 is the string library, <string.h>.
You need to be very familiar with the library functions we will discuss, and you may
see any of them on the midterm and final exams.
Do not rewrite these functions unless explicitly asked to for an assignment! The
string library is finely tuned, and it works. It isn't worth the time or effort to try to
rewrite the string library functions (and we will take points off if you do!)
Most string library functions take time proportional to the length of the string(s)
they work on (O(n)). This is because strings are not objects and don't carry any
extra information (e.g., the string length) with them: to find the end of a string, a
function has to walk through it until it reaches the null byte.
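Because strlen has to walk the whole string, it is worth calling it once rather than in a loop condition. A minimal sketch, with count_upper as a hypothetical helper:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: counts uppercase letters in s. strlen is called
 * once, before the loop. Writing `i < strlen(s)` as the loop condition
 * can rescan the string on every iteration (unless the compiler proves
 * s never changes), turning an O(n) loop into O(n^2). */
size_t count_upper(const char *s) {
    size_t len = strlen(s);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (isupper((unsigned char)s[i])) {
            count++;
        }
    }
    return count;
}
```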
The String Library: strlen
strlen: Calculates and returns the length of the string, not counting the terminating
null byte. Prototype:
size_t strlen(const char *s);
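A small sketch of what strlen counts (message, visible_length, and storage_size are hypothetical names): strlen stops at the first null byte, while sizeof on a char array reports every byte the array occupies, including the terminating null byte.

```c
#include <string.h>

/* Hypothetical array: "hello", an embedded null byte, then "world".
 * The compiler appends one more '\0' at the end. */
static const char message[] = "hello\0world";

/* strlen stops at the first '\0', so it only sees "hello". */
size_t visible_length(void) {
    return strlen(message);
}

/* sizeof counts every byte of the array: 5 + 1 + 5 + 1 = 12. */
size_t storage_size(void) {
    return sizeof(message);
}
```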
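The String Library: strcmp and strncmp
strcmp: Compares s and t one character at a time and returns 0 if they are equal, a
negative value if s comes before t, and a positive value if s comes after t. The order
is by character code, so in ASCII every uppercase letter comes before every
lowercase letter. Prototype:
int strcmp(const char *s, const char *t);
strncmp: Performs the same comparison as strcmp except that it stops after n
characters (and does not traverse past null characters). Prototype:
int strncmp(const char *s, const char *t, size_t n);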
// file: strcmp_ex.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

int main(int argc, char **argv)
{
    if (argc < 4) {
        printf("Usage: %s string1 string2 n\n",argv[0]);
        return 1;
    }
    char *s1 = argv[1];
    char *s2 = argv[2];
    int cmplen = atoi(argv[3]);

    int cmp_result = strcmp(s1,s2);

    const char *result_text; // points to string literals, so const

    if (cmp_result == 0) {
        result_text = "is the same as";
    } else if (cmp_result < 0) {
        result_text = "comes before";
    } else {
        result_text = "comes after";
    }
    printf("String \"%s\" %s \"%s\" in the alphabet.\n",
           s1,result_text,s2);

    cmp_result = strncmp(s1,s2,cmplen);
    if (cmp_result == 0) {
        result_text = "is the same as";
    } else if (cmp_result < 0) {
        result_text = "comes before";
    } else {
        result_text = "comes after";
    }
    printf("Up to character %d, \"%s\" %s \"%s\" in the alphabet.\n",
           cmplen,s1,result_text,s2);
    return 0;
}
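A common use of strncmp is a prefix test. A minimal sketch; starts_with is a hypothetical helper, not part of <string.h>:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if s begins with prefix. strncmp compares
 * at most strlen(prefix) characters, so the rest of s is ignored. An
 * empty prefix compares 0 characters and always matches. */
bool starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

If s is shorter than prefix, strncmp reaches the null byte in s first and reports a mismatch, so no special length check is needed.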
3 minute break
The String Library: strcpy
strcpy: Copies src to dst, including the null byte. The caller is responsible for
ensuring that there is enough space in dst to hold the entire copy. The strings
may not overlap.
char *strcpy(char *dst, const char *src);
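// file: strcpy_ex.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

int main(int argc, char **argv)
{
    // body reconstructed from the sample output below
    char *word = argv[1];
    char wordcopy[strlen(word) + 1]; // + 1 for the null byte
    strcpy(wordcopy,word);
    word[0] = 'x';     // changes only the original
    wordcopy[0] = 'y'; // changes only the copy
    printf("word: %s\n",word);
    printf("wordcopy: %s\n",wordcopy);
    return 0;
}

$ ./strcpy_ex hello
word: xello
wordcopy: yello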
The String Library: strncpy
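strncpy: Copies at most n bytes of src into dst. If src is n or more characters long,
strncpy does not add a null byte, so dst is not a valid string unless the caller
terminates it. Prototype:
char *strncpy(char *dst, const char *src, size_t n);
The following is a buggy version, without the appropriate checks!
// file: strncpy_buggy.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

const int MAX_WORDLEN = 5;

int main(int argc, char **argv)
{
    // body reconstructed from the sample output below
    char *word = argv[1];
    char wordcopy[MAX_WORDLEN];
    // BUG: for "wonderful", strncpy copies "wonde" and no null byte,
    // so printing wordcopy reads past the end of the array
    strncpy(wordcopy,word,MAX_WORDLEN);
    printf("word: %s\n",word);
    printf("wordcopy: %s\n",wordcopy);
    return 0;
}

$ ./strncpy_buggy wonderful
word: wonderful
wordcopy: wonde⍰⍰J⍰⍰⍰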
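One way to fix the bug, sketched under the assumption that the destination buffer's size is known: give the buffer room for MAX_WORDLEN characters plus the null byte, copy at most size - 1 bytes, and write the null byte explicitly. copy_truncated is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst, a buffer of dst_size bytes
 * (dst_size must be at least 1), truncating if src is too long.
 * strncpy writes no '\0' when src has dst_size - 1 or more characters,
 * so the last byte is always set explicitly. */
void copy_truncated(char *dst, const char *src, size_t dst_size) {
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

For the example above, a 6-byte buffer (MAX_WORDLEN + 1) would hold "wonde" followed by a proper null byte.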
The String Library: strcat and strncat
strcat: Appends src to the end of dst, overwriting the null byte at the end of dst
and adding a new one. strncat appends at most n bytes of src and always adds a
terminating null byte. In both cases, the caller is responsible for ensuring that dst
has enough space. Prototypes:
char *strcat(char *dst, const char *src);
char *strncat(char *dst, const char *src, size_t n);
// file: strcat_ex.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

const int MAX_CPY = 3;

int main(int argc, char **argv)
{
    if (argc < 3) {
        printf("Usage: %s word1 word2\n",argv[0]);
        return 1;
    }
    char *word1 = argv[1];
    char *word2 = argv[2];

    // each buffer needs room for word1, the appended bytes,
    // and the null byte
    char word1cpy_a[strlen(word1) + strlen(word2) + 1];
    char word1cpy_b[strlen(word1) + MAX_CPY + 1];
    strcpy(word1cpy_a,word1);
    strcpy(word1cpy_b,word1);

    strcat(word1cpy_a,word2);
    strncat(word1cpy_b,word2,MAX_CPY);

    printf("%s + %s = %s\n",word1,word2,word1cpy_a);
    printf("%s + first %d bytes of %s = %s\n",
           word1,MAX_CPY,word2,word1cpy_b);
    return 0;
}

$ ./strcat_ex happy birthday
happy + birthday = happybirthday
happy + first 3 bytes of birthday = happybir
For word1cpy_b, we need 5 + 3 + 1 bytes for the total with null ("happy", "bir", and the
null byte).
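When the lengths are not known in advance, the same "length + length + 1" arithmetic can size a heap buffer. A minimal sketch; concat is a hypothetical helper, and the caller must free its result:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new heap string holding a followed by
 * b, or NULL if malloc fails. The + 1 leaves room for the null byte. */
char *concat(const char *a, const char *b) {
    char *result = malloc(strlen(a) + strlen(b) + 1);
    if (result == NULL) {
        return NULL;
    }
    strcpy(result, a);  // copies a, including its null byte
    strcat(result, b);  // overwrites that null byte, then re-terminates
    return result;
}
```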
The String Library: strspn
strspn: Calculates and returns the length in bytes of the initial part of s which
contains only characters in accept. Prototype:
size_t strspn(const char *s, const char *accept);
Learn this function well! It tends to make an appearance on CS 107 midterms and
finals!
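Two small sketches of strspn in use (leading_digits and skip_blanks are hypothetical helpers): counting the digits at the start of a string, and skipping leading blanks by adding the span length to the pointer.

```c
#include <string.h>

/* Hypothetical helper: number of decimal digits at the start of s.
 * strspn stops at the first character that is NOT in the accept set. */
size_t leading_digits(const char *s) {
    return strspn(s, "0123456789");
}

/* Hypothetical helper: returns a pointer just past any leading spaces
 * and tabs in s. */
const char *skip_blanks(const char *s) {
    return s + strspn(s, " \t");
}
```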
The String Library: strcspn
strcspn: Similar to strspn, except that strcspn returns the length in bytes of the
initial part of s which does not contain any characters in reject. Prototype:
size_t strcspn(const char *s, const char *reject);
Learn this function well, and make sure you understand how it works and the
difference between strspn and strcspn!
BTW, the "c" in strcspn stands for "complement": the complement of the reject
characters is what is being "spanned".
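strcspn is handy for finding the first occurrence of any of several characters. A sketch with two hypothetical helpers: chomp removes a trailing newline (such as the one fgets leaves in its buffer), and first_word_length measures a word that ends at a space, tab, or comma.

```c
#include <string.h>

/* Hypothetical helper: removes a newline from a writable string in
 * place. strcspn returns the index of the first '\n', or strlen(line)
 * if there is none, in which case the existing '\0' is rewritten. */
void chomp(char *line) {
    line[strcspn(line, "\n")] = '\0';
}

/* Hypothetical helper: length of the first word in s, where words are
 * separated by spaces, tabs, or commas. */
size_t first_word_length(const char *s) {
    return strcspn(s, " \t,");
}
```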
The String Library: strspn and strcspn example
// file: strspn_ex.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
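The String Library: strdup and strndup
strdup: Returns a pointer to a newly allocated copy of s. strndup does the same but
copies at most n bytes, and always adds a null byte. Both return NULL if the
allocation fails. Prototypes:
char *strdup(const char *s);
char *strndup(const char *s, size_t n);
These two functions take care of allocating space for the duplicate of the string, but
both require the calling function to free the copy when it is no longer needed. If the
copy isn't freed, this is considered a memory leak: the memory stays allocated, and
unusable, until the program exits.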
// file: strdup_ex.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
printf("word: %s\n",word);
printf("word_copy: %s\n",word_copy);
printf("First %d letters of word: %s\n",BYTES_TO_COPY,word_copy3);
return 0;
}
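For understanding only, here is a rough sketch of what strndup does internally; sketch_strndup is a hypothetical name, and as the lecture says, real code should call the library function rather than re-writing it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strndup: allocates room for at most n
 * characters plus a null byte, copies, and terminates. Returns NULL if
 * malloc fails. The caller must free the result. */
char *sketch_strndup(const char *s, size_t n) {
    size_t len = 0;
    while (len < n && s[len] != '\0') {  // never reads past n bytes
        len++;
    }
    char *copy = malloc(len + 1);        // + 1 for the null byte
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

Unlike strncpy, this always produces a null-terminated string, which is why strndup is often the simpler choice when a heap copy is acceptable.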
Why don't strings keep their own length?
C strings differ from C++ strings in that they are simple: a C string is just a null-
terminated character array.
Strings didn't have to be this way. When C was being developed, another popular
language, Pascal, had "length-prefixed" strings, which stored the length in the first
byte of the string. Although this made finding the length of a string O(1), it limited
strings to 255 characters, the largest value a single byte can hold! (Later versions
of Pascal added support for larger prefixes, up to 64 bits, but a larger prefix adds
more overhead to every string.)
The original justification in C was that having only one byte of overhead was nice
because memory was limited (remember, this was the 1970s!), and the terminating
null was better than a prefix byte because it didn't limit the size of the string.
References and Advanced Reading
•References:
•https://en.wikibooks.org/wiki/C_Programming/String_manipulation
•https://www.tutorialspoint.com/c_standard_library/ctype_h.htm
•https://www.tutorialspoint.com/c_standard_library/string_h.htm
•Advanced Reading:
•https://www.cs.bu.edu/teaching/cpp/string/array-vs-ptr/
•https://www.quora.com/Why-dont-we-need-null-character-in-arrays-as-in-strings-to-know-its-end-point
•What is the justification for a null-terminated string? https://stackoverflow.com/questions/4418708/whats-the-rationale-for-null-terminated-strings
•Interesting criticism of the Pascal language for its string type: http://www.lysator.liu.se/c/bwk-on-pascal.html