Lecture 09: Text Processing in Java

This document covers text processing in Java, focusing on the String and Character classes, as well as the StringBuilder class for mutable strings. It details methods for string manipulation, comparison, and regular expressions for pattern matching. The document also explains the immutability of strings and the use of type-wrapper classes for primitive types.

Text Processing in Java

Chapter 14: Strings, Characters and Regular Expressions


Java How to Program, 10/e

© Copyright 1992-2015 by Pearson Education, Inc. All Rights Reserved.
Objectives
• Review char, the Character class and the String class.
• What does it mean for the String class to be immutable?
• Use the StringBuilder class to deal with mutable strings.
• Learn about regular expressions.
• Use regular expressions for matching and splitting strings.

14.5 Type-wrapper Classes
Purpose: enable primitive-type values to be treated as objects.
▪ Boolean, Character, Double, Float, Byte, Short, Integer and Long
Autoboxing: automatic conversion of a char value to a Character object; auto-unboxing converts a Character back to a char.
▪ The same applies to the other primitive types and their respective wrapper classes.
Most Character methods are static methods designed for convenience in processing individual char values.
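
A minimal sketch of autoboxing and auto-unboxing, assuming a collection is the reason the values must be objects. The class and method names (AutoboxingSketch, boxAll, sum) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AutoboxingSketch {
    // Collections hold objects, so each char is boxed into a Character by add().
    static List<Character> boxAll(String text) {
        List<Character> boxed = new ArrayList<>();
        for (char c : text.toCharArray()) {
            boxed.add(c); // autoboxing: char -> Character
        }
        return boxed;
    }

    // Each Integer is auto-unboxed to an int before the addition.
    static int sum(List<Integer> values) {
        int total = 0;
        for (Integer value : values) {
            total += value; // auto-unboxing: Integer -> int
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(boxAll("abc"));              // [a, b, c]
        System.out.println(sum(Arrays.asList(1, 2, 3))); // 6
    }
}
```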
14.5 Class Character
A program may contain character literals, such as 'd'.
▪ Characters include letters, digits, punctuation, space, tab, newline, symbols and others.
▪ A char is stored as a 16-bit (two-byte) unsigned integer holding a Unicode (UTF-16) code unit.
Method charValue returns the char value stored in
the object.
Method toString returns the String representation
of the char value stored in the object.
Method equals determines if two Characters have
the same contents.
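
The three instance methods above can be sketched as follows. The class name CharacterObjectSketch and the helper describe are hypothetical, chosen only for this example.

```java
public class CharacterObjectSketch {
    // Exercises charValue, toString and equals on one Character object.
    static String describe(char value) {
        Character boxed = Character.valueOf(value);   // wrap the char in an object
        char primitive = boxed.charValue();           // back to the primitive char
        String text = boxed.toString();               // one-character String
        boolean same = boxed.equals(Character.valueOf(primitive)); // same contents?
        return primitive + "|" + text + "|" + same;
    }

    public static void main(String[] args) {
        System.out.println(describe('d')); // d|d|true
        // Characters with different contents are not equal.
        System.out.println(Character.valueOf('A').equals(Character.valueOf('a'))); // false
    }
}
```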

14.5 Class Character (cont.)
Method isDefined determines whether a character is
defined in the Unicode character set.
Method isDigit determines whether a character is a defined
Unicode digit.
Method isJavaIdentifierStart determines whether a character can be the first character of an identifier in Java, that is, a letter, an underscore (_) or a dollar sign ($).
Method isJavaIdentifierPart determines whether a character can be used in an identifier in Java, that is, a digit, a letter, an underscore (_) or a dollar sign ($).
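
As an illustration, the two identifier methods can be combined into a simple name check. This is a sketch under one stated assumption: it only looks at the characters and does not reject reserved words such as class. The names IdentifierSketch and isValidIdentifier are invented for the example.

```java
public class IdentifierSketch {
    // True if every character is allowed at its position in a Java identifier.
    // Assumption: reserved words are NOT rejected; only characters are checked.
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("_count1")); // true
        System.out.println(isValidIdentifier("1abc"));    // false: starts with a digit
        System.out.println(isValidIdentifier("my-var"));  // false: '-' is not allowed
        System.out.println(Character.isDigit('7'));       // true
        System.out.println(Character.isDefined('a'));     // true
    }
}
```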

14.5 Class Character (cont.)
Method isLetter determines whether a character is a letter.
Method isLetterOrDigit determines whether a character is
a letter or a digit.
Method isLowerCase determines whether a character is a
lowercase letter.
Method isUpperCase determines whether a character is an
uppercase letter.
Method toUpperCase converts a character to its
uppercase equivalent.
Method toLowerCase converts a character to its
lowercase equivalent.
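
A short sketch of the classification and conversion methods working together. The helpers swapCase and countLettersOrDigits are hypothetical names used only here.

```java
public class CaseSketch {
    // Swaps the case of every letter and leaves other characters unchanged.
    static String swapCase(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (Character.isUpperCase(chars[i])) {
                chars[i] = Character.toLowerCase(chars[i]);
            } else if (Character.isLowerCase(chars[i])) {
                chars[i] = Character.toUpperCase(chars[i]);
            }
        }
        return new String(chars);
    }

    // Counts the characters that are letters or digits.
    static int countLettersOrDigits(String text) {
        int count = 0;
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(swapCase("Java 8!"));             // jAVA 8!
        System.out.println(countLettersOrDigits("Java 8!")); // 5
    }
}
```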

Strings
The String class represents immutable strings.
▪ String literals (stored in memory as String objects) are written as a sequence of characters in double quotation marks.
The StringBuilder class represents mutable strings.
Both classes are in the java.lang package.

14.3.1 String Constructors
No-argument constructor creates a String that
contains no characters (i.e., the empty string, which can
also be represented as "") and has a length of 0.
Constructor that takes a String object copies the
argument into the new String.
Constructor that takes a char array creates a String
containing a copy of the characters in the array.
Constructor that takes a char array and two integers
creates a String containing the specified portion of
the array.
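
The four constructors can be sketched as below. In the last one the two integers are the starting offset in the array and the number of characters to copy. The class name and the sample data are chosen for illustration.

```java
public class StringConstructorSketch {
    // Builds one String with each of the four constructors described above.
    static String[] build() {
        char[] letters = {'b', 'i', 'r', 't', 'h', ' ', 'd', 'a', 'y'};
        String source = "hello";

        String s1 = new String();              // empty string, length 0
        String s2 = new String(source);        // copy of another String
        String s3 = new String(letters);       // copy of the whole char array
        String s4 = new String(letters, 6, 3); // offset 6, count 3 -> "day"
        return new String[] {s1, s2, s3, s4};
    }

    public static void main(String[] args) {
        for (String s : build()) {
            System.out.println("[" + s + "] length " + s.length());
        }
    }
}
```

Because String objects are immutable, copying one with new String(source) is rarely needed in practice; it is shown here only to match the constructor list.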

Some String Methods
length determines the number of characters in a string.
charAt returns the character at a specific position in the
String.
getChars copies the characters of a String into a
character array.
▪ The first argument is the starting index in the String from which
characters are to be copied.
▪ The second argument is the index that is one past the last character to
be copied from the String.
▪ The third argument is the character array into which the characters
are to be copied.
▪ The last argument is the starting index where the copied characters
are placed in the target character array.
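
A minimal sketch of length, charAt and getChars, following the argument order listed above. The helper names reversed and copyRange are hypothetical.

```java
public class StringAccessSketch {
    // Walks the string backwards with length() and charAt().
    static String reversed(String text) {
        char[] out = new char[text.length()];
        for (int i = 0; i < text.length(); i++) {
            out[i] = text.charAt(text.length() - 1 - i);
        }
        return new String(out);
    }

    // Copies characters begin..end-1 into a new array, starting at index 0.
    static char[] copyRange(String text, int begin, int end) {
        char[] target = new char[end - begin];
        text.getChars(begin, end, target, 0);
        return target;
    }

    public static void main(String[] args) {
        System.out.println(reversed("hello there"));                     // ereht olleh
        System.out.println(new String(copyRange("hello there", 0, 5)));  // hello
    }
}
```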

14.3.3 Comparing Strings
Strings are compared using the numeric codes of the
characters in the strings.
Figure 14.3 demonstrates String methods equals,
equalsIgnoreCase, compareTo and
regionMatches and using the equality operator ==
to compare String objects.
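
The comparison methods can be sketched as follows; this is not the book's figure, just a small stand-in. compareTo returns a negative number, zero or a positive number, so the helper order (a made-up name) reduces it to -1, 0 or 1.

```java
public class CompareSketch {
    // Reduces compareTo's result to -1, 0 or 1.
    static int order(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Compares the first `length` characters of both strings.
    static boolean samePrefix(String a, String b, int length, boolean ignoreCase) {
        return a.regionMatches(ignoreCase, 0, b, 0, length);
    }

    public static void main(String[] args) {
        System.out.println(order("apple", "banana"));  // -1
        System.out.println(order("hello", "Hello"));   // 1: 'h' has a larger code than 'H'
        System.out.println("hello".equalsIgnoreCase("HELLO")); // true
        System.out.println(samePrefix("Happy Birthday", "happy birthday", 5, false)); // false
        System.out.println(samePrefix("Happy Birthday", "happy birthday", 5, true));  // true
    }
}
```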

14.3.3 Comparing Strings (cont.)
Method equals tests two objects for equality.
▪ For Strings, the method returns true if both contain the same characters in the same order, and false otherwise.
▪ Uses a lexicographical comparison of the characters' numeric codes, so the comparison is case sensitive.
When primitive-type values are compared with ==, the
result is true if both values are identical.
When references are compared with ==, the result is true
if both references refer to the same object in memory.
Java treats all string literal objects with the same contents as
one String object to which there can be many references.
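
The difference between == and equals can be sketched as below. The helper names sameObject and sameContents are invented for the example.

```java
public class EqualitySketch {
    // == on references: true only if both refer to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // equals: true if both Strings hold the same characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal1 = "hello";
        String literal2 = "hello";           // same literal, same shared object
        String copy = new String("hello");   // a distinct object with equal contents

        System.out.println(sameObject(literal1, literal2));   // true
        System.out.println(sameObject(literal1, copy));       // false
        System.out.println(sameContents(literal1, copy));     // true
    }
}
```

For this reason, String contents are normally compared with equals rather than ==.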

14.3.3 Comparing Strings (cont.)
String methods startsWith and endsWith determine whether
strings start with or end with a particular set of characters.
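
A small sketch of startsWith and endsWith, including the overload of startsWith that takes a starting index. The counting helpers are hypothetical.

```java
public class PrefixSuffixSketch {
    // Counts the words that begin with the given prefix.
    static int countWithPrefix(String[] words, String prefix) {
        int count = 0;
        for (String word : words) {
            if (word.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    // Counts the words that end with the given suffix.
    static int countWithSuffix(String[] words, String suffix) {
        int count = 0;
        for (String word : words) {
            if (word.endsWith(suffix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] words = {"started", "starting", "ended", "ending"};
        System.out.println(countWithPrefix(words, "st")); // 2
        System.out.println(countWithSuffix(words, "ed")); // 2
        // Two-argument form: does "art" appear starting at index 2?
        System.out.println("started".startsWith("art", 2)); // true
    }
}
```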

14.3.6 Concatenating Strings
String method concat concatenates two String
objects (similar to using the + operator) and returns a
new String object containing the characters from
both original Strings.
The original Strings to which s1 and s2 refer are
not modified.
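
A minimal sketch of concat, with s1 and s2 as sample variables. It also checks that the originals are unchanged afterwards.

```java
public class ConcatSketch {
    // Returns a new String; neither argument is modified.
    static String join(String first, String second) {
        return first.concat(second);
    }

    public static void main(String[] args) {
        String s1 = "Happy ";
        String s2 = "Birthday";
        String joined = join(s1, s2);

        System.out.println(joined);  // Happy Birthday
        System.out.println(s1);      // Happy  (unchanged)
        System.out.println(joined.equals(s1 + s2)); // true: same result as +
    }
}
```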

14.3.7 Miscellaneous String Methods
Method replace returns a new String object in which every
occurrence of the first char argument is replaced with the
second.
▪ An overloaded version enables you to replace substrings rather than
individual characters.
Method toUpperCase generates a new String with uppercase
letters.
Method toLowerCase returns a new String object with
lowercase letters.
Method trim generates a new String object that removes all
whitespace characters that appear at the beginning or end of the
String on which trim operates.
Method toCharArray creates a new character array containing a
copy of the characters in the String.
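
These methods each return a new object, so calls can be chained. The sketch below assumes a made-up cleanup task (normalize) purely to show the chaining.

```java
public class MiscStringSketch {
    // Trims, lowercases and replaces underscores with spaces.
    // Each call returns a new String; `raw` itself never changes.
    static String normalize(String raw) {
        return raw.trim().toLowerCase().replace('_', ' ');
    }

    public static void main(String[] args) {
        String raw = "  Hello_World \t";
        System.out.println("[" + normalize(raw) + "]");   // [hello world]
        System.out.println("a-b-c".replace("-", "--"));   // a--b--c (substring overload)
        System.out.println("hello".toUpperCase());        // HELLO
        System.out.println("hello".toCharArray().length); // 5
    }
}
```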

14.4 Class StringBuilder
Class StringBuilder is used to create and manipulate dynamic string information, that is, modifiable strings.
Every StringBuilder is capable of storing a number of
characters specified by its capacity.
If the capacity of a StringBuilder is exceeded, the
capacity expands to accommodate the additional characters.

14.4.1 StringBuilder Constructors
No-argument constructor creates a StringBuilder with
no characters in it and an initial capacity of 16 characters.
Constructor that takes an integer argument creates a
StringBuilder with no characters in it and the initial
capacity specified by the integer argument.
Constructor that takes a String argument creates a
StringBuilder containing the characters in the
String argument. The initial capacity is the number of
characters in the String argument plus 16.
Method toString of class StringBuilder returns the
StringBuilder contents as a String.
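
The three constructors and their initial capacities can be sketched as follows. The class and method names are invented for the example.

```java
public class BuilderConstructorSketch {
    // Returns the initial capacity produced by each constructor.
    static int[] capacities() {
        StringBuilder b1 = new StringBuilder();        // capacity 16
        StringBuilder b2 = new StringBuilder(10);      // capacity 10
        StringBuilder b3 = new StringBuilder("hello"); // capacity 5 + 16 = 21
        return new int[] {b1.capacity(), b2.capacity(), b3.capacity()};
    }

    public static void main(String[] args) {
        for (int capacity : capacities()) {
            System.out.println(capacity);
        }
        // toString returns the current contents as an immutable String.
        System.out.println(new StringBuilder("hello").toString()); // hello
    }
}
```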

14.4.2 StringBuilder Methods length, capacity, setLength and ensureCapacity
Methods length and capacity return the number of characters currently in a StringBuilder and the number of characters that can be stored in it without allocating more memory, respectively.
Method ensureCapacity guarantees that a
StringBuilder has at least the specified capacity.
Method setLength increases or decreases the length of a
StringBuilder.
▪ If the specified length is less than the current number of characters,
the buffer is truncated to the specified length.
▪ If the specified length is greater than the number of characters, null
characters are appended until the total number of characters in the
StringBuilder is equal to the specified length.
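
A sketch of setLength and ensureCapacity. The exact capacity after ensureCapacity depends on the library's growth policy, so the example only relies on it being at least the requested value. The helper names resize and capacityAfterEnsure are hypothetical.

```java
public class BuilderLengthSketch {
    // Truncates or pads (with null characters) to exactly newLength characters.
    static String resize(String text, int newLength) {
        StringBuilder buffer = new StringBuilder(text);
        buffer.setLength(newLength);
        return buffer.toString();
    }

    // Requests a minimum capacity and reports what the builder now has.
    static int capacityAfterEnsure(String text, int minimum) {
        StringBuilder buffer = new StringBuilder(text);
        buffer.ensureCapacity(minimum);
        return buffer.capacity(); // at least `minimum`, possibly more
    }

    public static void main(String[] args) {
        System.out.println(resize("Hello, how are you?", 10)); // Hello, how
        String padded = resize("ab", 4);
        System.out.println(padded.length());                   // 4
        System.out.println(padded.charAt(2) == '\u0000');      // true
        System.out.println(capacityAfterEnsure("abc", 75) >= 75); // true
    }
}
```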

14.4.3 StringBuilder Methods charAt, setCharAt, getChars and reverse
Method charAt takes an integer argument and returns the
character in the StringBuilder at that index.
Method getChars copies characters from a
StringBuilder into the character array argument.
▪ Four arguments—the starting index from which characters should be
copied, the index one past the last character to be copied, the
character array into which the characters are to be copied and the
starting location in the character array where the first character
should be placed.
Method setCharAt takes an integer and a character
argument and sets the character at the specified position in
the StringBuilder to the character argument.
Method reverse reverses the contents of the
StringBuilder.
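
The in-place methods can be sketched as below. Unlike the String methods, setCharAt and reverse modify the StringBuilder itself. The helper names are made up for the example.

```java
public class BuilderEditSketch {
    // Uppercases the first character in place with charAt and setCharAt.
    static String capitalizeFirst(String text) {
        StringBuilder buffer = new StringBuilder(text);
        if (buffer.length() > 0) {
            buffer.setCharAt(0, Character.toUpperCase(buffer.charAt(0)));
        }
        return buffer.toString();
    }

    // Reverses the contents in place.
    static String reverse(String text) {
        return new StringBuilder(text).reverse().toString();
    }

    // Copies characters begin..end-1 into a new array with getChars.
    static char[] copyRange(StringBuilder buffer, int begin, int end) {
        char[] target = new char[end - begin];
        buffer.getChars(begin, end, target, 0);
        return target;
    }

    public static void main(String[] args) {
        System.out.println(capitalizeFirst("hello there")); // Hello there
        System.out.println(reverse("hello there"));         // ereht olleh
        System.out.println(new String(copyRange(new StringBuilder("hello there"), 6, 11))); // there
    }
}
```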

Escape Sequences in Strings
There are special characters that cannot be easily printed.
Tab, newline, etc.
We need a special approach to include special characters in
strings
▪ System.out.println("She said:\n\t\"Hello!\"\n to me.");
She said:
"Hello! “
to me.

14.6 Tokenizing Strings
When you read a sentence, your mind breaks it into
tokens—individual words and punctuation marks that
convey meaning.
Compilers also perform tokenization.
String method split breaks a String into its
component tokens and returns an array of Strings.
Tokens are separated by delimiters
▪ Typically white-space characters such as space, tab, newline
and carriage return.
▪ Other characters can also be used as delimiters to separate
tokens.
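
A minimal tokenizing sketch. The argument to split is a regular expression, so "\\s+" (one or more whitespace characters) and "[,;]" (a comma or a semicolon) are used as delimiters here; both patterns and the helper names are choices made for this example.

```java
public class TokenizeSketch {
    // Splits on runs of whitespace (spaces, tabs, newlines, carriage returns).
    // Note: an empty input yields one empty token rather than zero tokens.
    static String[] tokenize(String sentence) {
        return sentence.trim().split("\\s+");
    }

    // Splits on any delimiter described by the given regular expression.
    static String[] splitOn(String text, String delimiterRegex) {
        return text.split(delimiterRegex);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("This is a sentence\twith  seven tokens");
        System.out.println(tokens.length); // 7
        for (String token : tokens) {
            System.out.println(token);
        }
        System.out.println(splitOn("a,b;c", "[,;]").length); // 3
    }
}
```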

14.7 Regular Expressions, Class Pattern and Class Matcher
A regular expression is a specially formatted String
that describes a search pattern for matching characters
in other Strings.
Useful for validating input and ensuring that data is in a
particular format.
▪ Validate phone numbers, email or postal addresses,…
▪ Validate file formats
▪ Validate program syntax

A regular expression consisting only of ordinary characters matches itself.
There are also special characters with specific meanings.

Since '\' is the escape character in Java string literals, you must write "\\" to insert a single backslash into a string.

14.7 Regular Expressions, Class Pattern and Class Matcher (cont.)
To match a set of characters, use the special characters [ ], - and ^.
▪ "[aeiou]" matches any single vowel.
▪ "[A-Y]" matches any single uppercase letter except for Z.
▪ "[A-z]" matches any single character with an integer value between uppercase A and lowercase z, which includes all letters and also characters such as [ and \.
▪ "[A-Za-z]" matches any single uppercase or lowercase letter.
▪ If the first character in the brackets is "^", the expression accepts any character other than those indicated.
● "[^Z]" matches any single character other than capital Z, including lowercase letters and nonletters such as \n.
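
The character classes above can be checked with String method matches, as in this sketch. The helper matchesClass is a hypothetical name; it tests a one-character string against one bracket expression.

```java
public class CharClassSketch {
    // True if the whole (single-character) input matches the character class.
    static boolean matchesClass(String characterClass, String input) {
        return input.matches(characterClass);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[aeiou]", "e"));   // true
        System.out.println(matchesClass("[A-Y]", "Z"));     // false
        System.out.println(matchesClass("[A-z]", "["));     // true: '[' lies between 'Z' and 'a'
        System.out.println(matchesClass("[A-Za-z]", "["));  // false
        System.out.println(matchesClass("[^Z]", "\n"));     // true
        System.out.println(matchesClass("[^Z]", "Z"));      // false
    }
}
```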

14.7 Regular Expressions, Class Pattern and Class Matcher (cont.)
String method matches receives a String that
specifies the regular expression and matches the
contents of the String object on which it’s called to
the regular expression.
▪ The method returns a boolean indicating whether the match
succeeded.
A regular expression consists of literal characters and
special symbols.
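
A validation sketch using matches. The method succeeds only if the entire String fits the pattern. The patterns are illustrative assumptions: \d (a digit), {5} (exactly five times) and * (zero or more times) are standard regular-expression syntax that the surviving slide text does not describe, and the validation rules themselves are invented for the example.

```java
public class ValidateSketch {
    // Assumed rule: a zip code is exactly five digits.
    static boolean isValidZip(String zip) {
        return zip.matches("\\d{5}"); // "\\d" in Java source is the regex \d
    }

    // Assumed rule: a name is one uppercase letter followed by any letters.
    static boolean isValidName(String name) {
        return name.matches("[A-Z][a-zA-Z]*");
    }

    public static void main(String[] args) {
        System.out.println(isValidZip("12345"));  // true
        System.out.println(isValidZip("123456")); // false: whole string must match
        System.out.println(isValidName("Jane"));  // true
        System.out.println(isValidName("jane"));  // false
    }
}
```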
