Internet & World Wide Web How to Program, 3/e

ISBN: 0-13-145091-3
© 2004
pages: 1420


This tutorial continues our introduction to Python with basic string input and output capabilities, and an introduction to regular expression processing with the Python re module. This tutorial is intended for students and developers who are already familiar with basic Python programming or who have read our prior Python tutorials (see the list at the bottom of this page).
[Note: This tutorial is an excerpt (Section 35.4) of Chapter 35, Python, from our textbook Internet & World Wide Web How to Program, 3/e. This tutorial may refer to other chapters or sections of the book that are not included here. Permission Information: Deitel, Harvey M. and Paul J., INTERNET & WORLD WIDE WEB HOW TO PROGRAM, 3/E, 2004, pp.1254-1259. Electronically reproduced by permission of Pearson Education, Inc., Upper Saddle River, New Jersey.]
35.4   String Processing and Regular Expressions (Continued)
Figure 35.13 lists the most popular regular expression symbols recognized by the re module. Unless otherwise specified, the regular expression characters * and + are greedy: they match as many occurrences of the preceding pattern as possible. For example, the regular expression hel*o matches strings that consist of the letters he, followed by any number of l's (including none), followed by an o (e.g., "heo", "helo", "hello", "helllo").
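The behavior of hel*o can be checked with a short sketch. It is written for Python 3; the candidate list and the variable names are assumptions made for illustration and are not part of the book's figures.

```python
import re

# hel*o: "he", then zero or more "l" characters, then "o"
pattern = re.compile(r"hel*o")

candidates = ["heo", "helo", "hello", "helllo", "hexo"]
matched = [s for s in candidates if pattern.fullmatch(s)]
print(matched)   # ['heo', 'helo', 'hello', 'helllo']

# Greedy matching: l* consumes every available "l" before stopping
greedy = re.search(r"hel*", "helllo").group()
print(greedy)    # helll
```

The string "hexo" is rejected because the x is neither an l nor the required o.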
Fig. 35.13 re module's regular expression characters.

Character   Matches
^           Beginning of string.
$           End of string.
.           Any character, except a newline.
*           Zero or more occurrences of the preceding pattern.
+           One or more occurrences of the preceding pattern.
?           Zero or one occurrence of the preceding pattern.
{m,n}       Between m and n occurrences of the preceding pattern.
\b          Word boundary (i.e., the beginning or end of a word).
\B          Nonword boundary.
\d          Digit ([0-9]).
\D          Nondigit.
\w          Any alphanumeric character or underscore.
[...]       Any character defined by the set.
[^...]      Any character not defined by the set.
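A few of these symbols can be tried out in a brief sketch. The sample text and variable names below are assumptions chosen for illustration, and the code targets Python 3.

```python
import re

# Sample text is an assumption made for this sketch.
text = "Order 66 shipped 2004-05-17 to dock_7"

# {m,n}: runs of two to four consecutive digits
digit_runs = re.findall(r"\d{2,4}", text)
print(digit_runs)   # ['66', '2004', '05', '17']

# \b and \w: whole words (the underscore counts as a word character)
words = re.findall(r"\b\w+\b", text)
print(words)

# ^ and $: anchor a pattern to the start or end of the string
starts_with_order = re.search(r"^Order", text) is not None
ends_with_dock = re.search(r"dock_\d$", text) is not None

# [^...]: runs of characters that are neither digits nor whitespace
non_digits = re.findall(r"[^0-9\s]+", "a1 b22")
print(non_digits)   # ['a', 'b']
```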
Lines 9-12 of Fig. 35.12 use a few of these symbols to compile four regular expression patterns. The expression in line 9 (expression2) matches the string "Test" at the beginning of a string. The expression in line 10 (expression3) matches the string "Test" at the end of a string. The expression in line 11 (expression4) matches a word that ends with "es". The expression in line 12 (expression5) matches the letter t, followed by a vowel. Line 12 also illustrates the optional second argument that function compile may take. This argument is a flag that controls how the regular expression behaves when it is matched against a string. The re.I flag means that case is ignored when using the regular expression to process a string.
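Since Fig. 35.12 is not reproduced on this page, the following is only a sketch of what such compile calls might look like. The pattern strings are reconstructed from the descriptions above and should be read as assumptions; only the "es" pattern is quoted elsewhere in this tutorial.

```python
import re

# Hypothetical reconstructions of the four patterns described in the text.
expression2 = re.compile(r"^Test")            # "Test" at the beginning
expression3 = re.compile(r"Test$")            # "Test" at the end
expression4 = re.compile(r"\b\w*es\b")        # a word that ends with "es"
expression5 = re.compile(r"t[aeiou]", re.I)   # t followed by a vowel, any case

# A compiled pattern remembers its source text and its flags.
print(expression5.pattern)               # t[aeiou]
print(bool(expression5.flags & re.I))    # True
print(bool(expression4.flags & re.I))    # False
```

The flag re.I is a short alias for re.IGNORECASE; either spelling can be passed as the second argument.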
The r character before each string in lines 8-12 indicates that the string is a raw string. Python handles backslash characters in raw strings differently than in "normal" strings. Specifically, in a raw string Python does not interpret backslashes as escape characters, so they reach the re module unchanged. Writing all regular expressions as raw strings can help programmers avoid writing regular expressions that may be interpreted in a way that was not intended. For example, without the raw-string character, the regular expression string in line 11 would have to be written as "\\b\\w*es\\b", because \b is a backspace to Python, but a word boundary in regular expressions.
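The difference between raw and normal strings is easy to see by comparing lengths. This is a minimal Python 3 sketch; the pattern \bTest\b and the sample sentence are assumptions chosen for illustration.

```python
import re

normal = "\bTest\b"       # each \b becomes one backspace character
escaped = "\\bTest\\b"    # doubled backslashes survive as \b
raw = r"\bTest\b"         # raw string: backslashes are kept as written

print(len(normal), len(raw))   # 6 8
print(escaped == raw)          # True

sample = "A Test string"
print(re.search(raw, sample) is not None)      # True: word boundaries
print(re.search(normal, sample) is not None)   # False: looks for backspaces
```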
Line 14 uses the SRE_Pattern's search method to test searchString against the regular expression expression1. If search finds a matching substring, it returns an SRE_Match object; otherwise, it returns None. None is a special Python object whose value indicates that no value exists. In a Python if statement, None evaluates to false; therefore, we only need to test the return value to determine whether any matches were found. If a match is found, we print an appropriate message.
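The test-the-return-value idiom might look like the sketch below. The search string and the pattern are assumptions, since the figure is not shown here. Note also that Python 3 names these classes re.Pattern and re.Match rather than SRE_Pattern and SRE_Match.

```python
import re

search_string = "Testing pattern matches"   # sample text (assumed)
expression1 = re.compile(r"Test")           # hypothetical pattern

result = expression1.search(search_string)
if result:   # a match object is truthy, None is falsy
    print('"Test" was found at', result.span())

# When nothing matches, search returns None.
missing = re.compile(r"xyz").search(search_string)
print(missing is None)   # True
```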
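Line 17 uses SRE_Pattern's match method to test searchString against regular expression expression2. Unlike search, the match method returns an SRE_Match object only if the pattern matches at the beginning of the string; otherwise, it returns None. The rest of the string does not have to match the pattern.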
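To make the behavior of match concrete, the sketch below compares it with search and with fullmatch (a Python 3.4+ method that is not used in the book's figure). The pattern and test strings are assumptions for illustration.

```python
import re

expression = re.compile(r"Test")

at_start = expression.match("Testing 1 2 3")    # succeeds at position 0
not_at_start = expression.match("A Test")       # None: "Test" is not first
found_anywhere = expression.search("A Test")    # search scans the whole string
whole = expression.fullmatch("Testing 1 2 3")   # None: entire string must match

print(at_start.group())         # Test
print(not_at_start)             # None
print(found_anywhere.start())   # 2
print(whole)                    # None
```

In other words, match anchors the pattern at the start of the string but does not require the pattern to consume the entire string.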
Line 23 uses SRE_Pattern's findall method to store in variable result a list of all the substrings in searchString that match the regular expression expression4. If findall returns any matches, we print a message that indicates how many words were found (lines 25-27) by using Python function len. When called on a list, function len returns the number of elements in the list. Lines 29-30 print each item in the list, followed by a space.
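A Python 3 sketch of the same findall-then-len processing follows. The pattern is the one quoted earlier in this section; the search string is an assumption, so the words found will differ from the book's output.

```python
import re

search_string = "Testing pattern matches and wishes"   # sample text (assumed)
expression4 = re.compile(r"\b\w*es\b")

result = expression4.findall(search_string)

if result:   # an empty list is falsy, so this runs only when words were found
    print('There are %d word(s) ending in "es":' % len(result))
    for item in result:
        print(item, end=" ")
    print()
```

Here findall returns ['matches', 'wishes'], so len reports 2.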
Lines 33-40 perform similar processing with expression5 to print all the substrings in searchString that match the pattern of the letter t followed by a vowel. Remember that expression5 was compiled using the re.I flag. Thus, the letter t and the vowels in searchString can be either lowercase or uppercase. We end the program by printing a newline.
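The effect of re.I on findall can be seen by running the same pattern with and without the flag. As before, the pattern text and the search string are assumptions reconstructed from the description, not the book's listing.

```python
import re

search_string = "Testing pattern matches"   # sample text (assumed)

expression5 = re.compile(r"t[aeiou]", re.I)   # case is ignored
case_sensitive = re.compile(r"t[aeiou]")      # same pattern, no flag

result = expression5.findall(search_string)
sensitive_result = case_sensitive.findall(search_string)

print(result)             # ['Te', 'ti', 'te']
print(sensitive_result)   # ['ti', 'te']
```

Without the flag, the uppercase T at the start of "Testing" is skipped.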
 
 
Additional Python Tutorials:
Introduction to Python

Python Basic Data Types, Control Statements and Functions

Tuples, Lists and Dictionaries

Python CGI Programming
