Strings: Steven Skiena

This document discusses string representations and operations in various programming languages. It summarizes common string functions in C/C++ and Java and discusses different ways of representing strings as arrays, linked lists, or other data structures. It also provides code examples for searching and manipulating strings.

Lecture 3: Strings
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena

Character Codes
Character codes are mappings between numbers and the symbols which make up a particular alphabet. The American Standard Code for Information Interchange (ASCII) is a single-byte character code where 2^7 = 128 characters are specified. Bytes are eight-bit entities, so the highest-order bit is left as zero.

  0 NUL    1 SOH    2 STX    3 ETX    4 EOT    5 ENQ    6 ACK    7 BEL
  8 BS     9 HT    10 NL    11 VT    12 NP    13 CR    14 SO    15 SI
 16 DLE   17 DC1   18 DC2   19 DC3   20 DC4   21 NAK   22 SYN   23 ETB
 24 CAN   25 EM    26 SUB   27 ESC   28 FS    29 GS    30 RS    31 US
 32 SP    33 !     34 "     35 #     36 $     37 %     38 &     39 '
 40 (     41 )     42 *     43 +     44 ,     45 -     46 .     47 /
 48 0     49 1     50 2     51 3     52 4     53 5     54 6     55 7
 56 8     57 9     58 :     59 ;     60 <     61 =     62 >     63 ?
 64 @     65 A     66 B     67 C     68 D     69 E     70 F     71 G
 72 H     73 I     74 J     75 K     76 L     77 M     78 N     79 O
 80 P     81 Q     82 R     83 S     84 T     85 U     86 V     87 W
 88 X     89 Y     90 Z     91 [     92 \     93 ]     94 ^     95 _
 96 `     97 a     98 b     99 c    100 d    101 e    102 f    103 g
104 h    105 i    106 j    107 k    108 l    109 m    110 n    111 o
112 p    113 q    114 r    115 s    116 t    117 u    118 v    119 w
120 x    121 y    122 z    123 {    124 |    125 }    126 ~    127 DEL

Properties of ASCII
Several properties of the design make programming tasks easier:

All non-printable characters have either the first three bits as zero or all seven lowest bits as one. This makes it very easy to eliminate them before displaying junk.

Both the upper- and lowercase letters and the numerical digits appear sequentially. Thus we can iterate through all the letters or digits simply by looping from the value of the first symbol (say, 'a') to the value of the last symbol (say, 'z').

We can convert a character (say, 'I') to its rank in the collating sequence (eighth, if 'A' is the zeroth character) simply by subtracting off the first symbol ('A').

We can convert a character (say, 'C') from upper- to lowercase by adding the difference between the lowercase and uppercase starting characters ('C' - 'A' + 'a'). Similarly, a character x is uppercase if and only if it lies between 'A' and 'Z'.

The character code tells us what will happen when naively sorting text files. Which of 'x', '3', or 'C' appears first in alphabetical order? Sorting alphabetically means sorting by character code. Using a different collating sequence requires more complicated comparison functions.

Non-printable character codes for new-line (10) and carriage return (13) are designed to delimit the end of text lines. Inconsistent use of these codes is one of the pains in moving text files between UNIX and Windows systems.
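The arithmetic behind these properties can be sketched in a few lines of C. This is only an illustration and assumes the execution character set is ASCII; the function names letter_rank, to_lower_ascii, and is_printable_ascii are hypothetical, not part of any library.

```c
/* Rank of an uppercase letter in the alphabet, with 'A' as rank 0.
   Assumes ASCII, where 'A'..'Z' are consecutive codes. */
int letter_rank(char c)
{
    return c - 'A';
}

/* Convert an uppercase letter to lowercase by shifting from the
   uppercase run to the lowercase run; other characters are unchanged. */
char to_lower_ascii(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (char) (c - 'A' + 'a');
    return c;
}

/* A code is printable ASCII if it is neither a control code (0-31)
   nor DEL (127). */
int is_printable_ascii(int c)
{
    return c >= 32 && c <= 126;
}
```

Under these assumptions letter_rank('I') is 8 and to_lower_ascii('C') is 'c'. Because '3' is 51, 'C' is 67, and 'x' is 120, a naive sort by character code places digits before uppercase letters and uppercase before lowercase.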

Unicode
More modern international character code designs such as Unicode use two or even three bytes per symbol, and can represent virtually any symbol in every language on earth. Older languages, like Pascal, C, and C++, view the char type as virtually synonymous with 8-bit entities. However, good old ASCII remains alive embedded in Unicode. Java, on the other hand, was designed to support Unicode, so characters are 16-bit entities. The upper byte is all zeros when working with ASCII/ISO Latin 1 text.

Representing Strings
Strings are sequences of characters, where order clearly matters. It is important to be aware of how your favorite programming language represents strings, because there are several different possibilities:

Null-terminated Arrays: C/C++ treats strings as arrays of characters. The string ends the instant it hits the null character \0, i.e., zero ASCII. Failing to end your string explicitly with a null typically extends it by a bunch of unprintable characters.

Array Plus Length: Another scheme uses the first array location to store the length of the string, thus avoiding the need for any terminating null character. Presumably this is what Java implementations do internally.

Linked Lists of Characters: Text strings can be represented using linked lists, but this is typically avoided because of the high space overhead associated with having a several-byte pointer for each single-byte character.
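To make the contrast between the first two representations concrete, here is a minimal sketch of an array-plus-length string built from a null-terminated one. The type lstring, the constant MAXLEN, and both function names are assumptions made for illustration; they do not describe how any particular language or system stores its strings.

```c
#include <stddef.h>
#include <string.h>

#define MAXLEN 255   /* largest length that fits in a one-byte count */

/* Hypothetical array-plus-length string: a one-byte length followed by
   the characters, with no terminating null. */
typedef struct {
    unsigned char len;      /* number of characters stored, 0..255 */
    char data[MAXLEN];      /* the characters themselves */
} lstring;

/* Build an lstring from a null-terminated string.
   Returns 0 on success, -1 if src is longer than MAXLEN. */
int lstring_from_cstr(lstring *dst, const char *src)
{
    size_t n = strlen(src);   /* linear scan for the terminating '\0' */
    if (n > MAXLEN) return -1;
    dst->len = (unsigned char) n;
    memcpy(dst->data, src, n);
    return 0;
}

/* Bounds-checked access to the ith character. The check is constant time
   because the length is stored. Returns -1 if i is outside the string. */
int lstring_at(const lstring *s, size_t i)
{
    if (i >= s->len) return -1;
    return (unsigned char) s->data[i];
}
```

With a stored length, checking that position i lies inside the string costs one comparison, whereas a null-terminated array must be scanned. The price is the cap on length imposed by the size of the count field.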

Which String Representation?

The underlying string representation can have a big impact on which operations are easily or efficiently supported. Compare each of these three data structures with respect to the following properties:

Which uses the least amount of space? On what sized strings?

Which constrains the content of the strings which can possibly be represented?

Which allow constant-time access to the ith character?

Which allow efficient checks that the ith character is in fact within the string, thus avoiding out-of-bounds errors?

Which allow efficient deletion or insertion of new characters at the ith position?

Which representation is used when users are limited to strings of length at most 255, e.g., file names in Windows?

Searching for Patterns

The simplest algorithm to search for the pattern string p in text t overlays the pattern string on the text, and checks whether every pattern character matches the corresponding text character:
    #include <string.h>

    /* Return position of the first occurrence of pattern p in the
       text t, and -1 if it does not occur. */
    int findmatch(char *p, char *t)
    {
        int i, j;           /* counters */
        int plen, tlen;     /* string lengths */

        plen = strlen(p);
        tlen = strlen(t);

        for (i = 0; i <= (tlen - plen); i = i + 1) {
            j = 0;
            while ((j < plen) && (t[i + j] == p[j]))
                j = j + 1;
            if (j == plen) return(i);
        }
        return(-1);
    }

Note that this routine only searches for exact pattern matches. If a letter is capitalized in the pattern but not in the text, there is no match. This algorithm runs in O(|p||t|) time, where |p| and |t| are the lengths of the pattern and the text. More complicated but efficient linear-time algorithms exist for substring pattern matching.
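If matches should ignore capitalization, one possible adaptation folds both characters to lowercase before comparing them. The sketch below is an assumed variant, not part of the lecture; the name findmatch_nocase is hypothetical, and it keeps the same quadratic worst-case running time.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical case-insensitive variant of the naive search: returns the
   position of the first occurrence of p in t ignoring letter case,
   or -1 if there is none. */
int findmatch_nocase(const char *p, const char *t)
{
    size_t plen = strlen(p);
    size_t tlen = strlen(t);
    size_t i, j;

    if (plen > tlen) return -1;   /* also keeps tlen - plen from wrapping */

    for (i = 0; i <= tlen - plen; i++) {
        j = 0;
        /* cast to unsigned char: tolower is undefined for negative values */
        while (j < plen &&
               tolower((unsigned char) t[i + j]) == tolower((unsigned char) p[j]))
            j++;
        if (j == plen) return (int) i;
    }
    return -1;
}
```

For example, findmatch_nocase("WORLD", "hello world") returns 6, while an exact-match search for the same arguments would fail.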

C String Library Functions

The C language character library ctype.h contains several simple tests and manipulations on character codes. As with all C predicates, true is defined as any non-zero quantity, and false as zero.

    #include <ctype.h>      /* include the character library */

    int isalpha(int c);     /* true if c is either upper or lower case */
    int isupper(int c);     /* true if c is upper case */
    int islower(int c);     /* true if c is lower case */
    int isdigit(int c);     /* true if c is a numerical digit (0-9) */
    int ispunct(int c);     /* true if c is a punctuation symbol */
    int isxdigit(int c);    /* true if c is a hexadecimal digit (0-9,A-F,a-f) */
    int isprint(int c);     /* true if c is any printable character */

    int toupper(int c);     /* convert c to upper case -- no error checking */
    int tolower(int c);     /* convert c to lower case -- no error checking */
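As a small illustration of how these predicates are typically combined, the sketch below classifies and converts the characters of a string. The helper names count_classes and upcase are hypothetical; only the ctype.h calls come from the standard library.

```c
#include <ctype.h>

/* Count the letters and digits in s; all other characters are ignored. */
void count_classes(const char *s, int *letters, int *digits)
{
    *letters = 0;
    *digits = 0;
    for (; *s != '\0'; s++) {
        /* ctype functions expect a value in the unsigned char range */
        unsigned char c = (unsigned char) *s;
        if (isalpha(c))
            (*letters)++;
        else if (isdigit(c))
            (*digits)++;
    }
}

/* Convert every character of s to upper case, in place. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}
```

Called on "Lecture 3: Strings", count_classes reports 14 letters and 1 digit. The predicates return some non-zero value for true, so test them directly rather than comparing against 1.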

The following functions appear in the C language string library string.h.

    #include <string.h>     /* include the string library */

    char *strcat(char *dst, const char *src);       /* concatenation */
    int strcmp(const char *s1, const char *s2);     /* is s1 == s2? returns 0 if so */
    char *strcpy(char *dst, const char *src);       /* copy src to dst */
    size_t strlen(const char *s);                   /* length of string */
    char *strstr(const char *s1, const char *s2);   /* search for s2 in s1 */
    char *strtok(char *s1, const char *s2);         /* iterate words in s1 */
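Two of these functions have return conventions that are easy to misuse: strstr returns a pointer rather than an index, and strtok modifies the string it scans. The sketch below wraps each in a small helper; count_words and find_offset are hypothetical names chosen for illustration.

```c
#include <string.h>

/* Count the space-separated words in line. strtok overwrites delimiters
   with '\0', so the scan runs on a copy; buf must have room for
   strlen(line) + 1 characters. */
int count_words(const char *line, char *buf)
{
    int n = 0;
    char *word;

    strcpy(buf, line);
    for (word = strtok(buf, " "); word != NULL; word = strtok(NULL, " "))
        n++;
    return n;
}

/* Offset of the first occurrence of pattern in text, or -1 if absent.
   strstr returns a pointer into text, so subtract to get an index. */
int find_offset(const char *text, const char *pattern)
{
    const char *hit = strstr(text, pattern);
    return (hit == NULL) ? -1 : (int) (hit - text);
}
```

With a sufficiently large buffer, count_words("find words in a grid", buf) returns 5, and find_offset("hello world", "world") returns 6. Note that strcpy and strcat do no bounds checking, so the caller is responsible for sizing the destination.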

C++ String Library Functions

In addition to supporting C-style strings, C++ has a string class which contains methods for these operations and more:
    string::size()                                /* string length */
    string::empty()                               /* is it empty */
    string::c_str()                               /* return a pointer to a C style string */
    string::operator [](size_type i)              /* access the ith character */

    string::append(s)                             /* append to string */
    string::erase(n,m)                            /* delete a run of characters */
    string::insert(size_type n, const string&s)   /* insert string s at n */

    string::find(s)
    string::rfind(s)                              /* search left or right for the given string */

    string::front()
    string::back()                                /* get first or last character, also there are iterators */

Overloaded operators exist for concatenation and string comparison.

Java String Objects

Java strings are first-class objects deriving either from the String class or the StringBuffer class. The String class is for static strings which do not change, while StringBuffer is designed for dynamic strings. Recall that Java was designed to support Unicode, so its characters are 16-bit entities. The java.text package contains more advanced operations on strings, including routines to parse dates and other structured text.

110302 (Where's Waldorf?)

Find words in a grid of letters. What is the easiest way to write a comparison function for all eight directions?
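One possible approach, sketched here under assumptions rather than as the intended solution, is to store the eight directions as a table of row and column steps so that a single loop handles every direction. The names DR, DC, match_at, and find_word are hypothetical, the grid is assumed to be an array of equal-length strings, and the comparison is exact, so any case folding would need to be added separately.

```c
#include <string.h>

/* Row and column steps for the eight directions, starting at north
   and proceeding clockwise. */
static const int DR[8] = { -1, -1, 0, 1, 1, 1, 0, -1 };
static const int DC[8] = { 0, 1, 1, 1, 0, -1, -1, -1 };

/* Does word appear in the grid starting at (r, c) and stepping in
   direction d? grid holds rows strings, each cols characters long. */
int match_at(const char *grid[], int rows, int cols,
             int r, int c, int d, const char *word)
{
    int k, len = (int) strlen(word);

    for (k = 0; k < len; k++) {
        int rr = r + k * DR[d];
        int cc = c + k * DC[d];
        if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) return 0;
        if (grid[rr][cc] != word[k]) return 0;
    }
    return 1;
}

/* Find the first cell, in row-major order, from which word can be read
   in some direction. Returns 1 and stores the cell, or returns 0. */
int find_word(const char *grid[], int rows, int cols,
              const char *word, int *r_out, int *c_out)
{
    int r, c, d;

    for (r = 0; r < rows; r++)
        for (c = 0; c < cols; c++)
            for (d = 0; d < 8; d++)
                if (match_at(grid, rows, cols, r, c, d, word)) {
                    *r_out = r;
                    *c_out = c;
                    return 1;
                }
    return 0;
}
```

Keeping the directions in data rather than in eight separate functions means the bounds check and the character comparison are written only once.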

110304 (Crypt Kicker II)

Solve a substitution cipher via a known-plaintext attack. How do we identify what the plaintext sentence is?

110306 (File Fragmentation)

Put together a collection of broken copies of a given text string. Which pair of fragments go together? How can we find the right order of the pair?

110307 (Doublets)
Build word ladders on a dictionary of strings. How do we represent and traverse the underlying graph? (if necessary, look ahead to Chapter 9)
