This tutorial presents Java powerful regular-expression processing capabilities using class Pattern, class Matcher and class String's matches method. This tutorial is intended for students and developers who are familiar with basic Java string-processing techniques.
Download the code for this tutorial here.
[Note: This tutorial is an excerpt (Section 29.7) of Chapter 29, Strings, Characters and Regular Expressions, from our textbook Java How to Program, 6/e. This tutorial may refer to other chapters or sections of the book that are not included here. Permission Information: Deitel, Harvey M. and Paul J., JAVA HOW TO PROGRAM, ©2005, pp.1378-1387. Electronically reproduced by permission of Pearson Education, Inc., Upper Saddle River, New Jersey.]
29.7 Regular Expressions, Class Pattern and Class MatcherRegular expressions are sequences of characters and symbols that define a set of strings. They are useful for validating input and ensuring that data is in a particular format. For example, a ZIP code must consist of five digits, and a last name must contain only letters, spaces, apostrophes and hyphens. One application of regular expressions is to facilitate the construction of a compiler. Often, a large and complex regular expression is used to validate the syntax of a program. If the program code does not match the regular expression, the compiler knows that there is a syntax error within the code.
Class String provides several methods for performing regular-expression operations, the simplest of which is the matching operation. String method matches receives a string that specifies the regular expression and matches the contents of the String object on which it is called to the regular expression. The method returns a boolean indicating whether the match succeeded. A regular expression consists of literal characters and special symbols. Figure 29.19 specifies some predefined character classes that can be used with regular expressions. A character class is an escape sequence that represents a group of characters. A digit is any numeric character. A word character is any letter (uppercase or lowercase), any digit or the underscore character. A whitespace character is a space, a tab, a carriage return, a newline or a form feed. Each character class matches a single character in the string we are attempting to match with the regular expression.
| Character | Matches | Character | Matches |
| \d | any digit | \D | any non-digit |
| \w | any word character | \W | any non-word character |
| \s | any whitespace | \S | any non-whitespace |
Regular expressions are not limited to these predefined character classes. The expressions employ various operators and other forms of notation to match complex patterns. We examine several of these techniques in the application in Fig. 29.20 and Fig. 29.21 which validates user input via regular expressions. [Note: This application is not designed to match all possible valid user input.]
|
||
| Fig. 29.20 | Validating user information using regular expressions. | |

