Regular Expressions (Regex) in Python
### Basic Syntax
- **`.`**: Matches any single character except a newline.
- Example: `a.b` matches `aab`, `acb`, but not `a\nb`.
- **`^`**: Matches the start of a string.
- Example: `^abc` matches `abc` only if it is at the beginning of the string.
- **`$`**: Matches the end of a string, or just before a newline that ends the string.
- Example: `abc$` matches `abc` only if it is at the end of the string.
- **`*`**: Matches 0 or more repetitions of the preceding element.
- Example: `ab*c` matches `ac`, `abc`, `abbc`, etc.
- **`+`**: Matches 1 or more repetitions of the preceding element.
- Example: `ab+c` matches `abc`, `abbc`, but not `ac`.
- **`?`**: Matches 0 or 1 repetition of the preceding element.
- Example: `ab?c` matches `ac` and `abc`, but not `abbc`.
- **`{m,n}`**: Matches between `m` and `n` repetitions of the preceding element.
- Example: `a{2,4}` matches `aa`, `aaa`, and `aaaa`.
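
The quantifier rules above can be checked directly with a minimal sketch. The variable names are illustrative only, and `re.fullmatch` is used so that a pattern counts as matching only when it covers the whole test string.

```python
import re

tests = ["ac", "abc", "abbc"]

# Keep only the strings that each pattern matches in full.
star = [s for s in tests if re.fullmatch(r"ab*c", s)]      # 0 or more 'b'
plus = [s for s in tests if re.fullmatch(r"ab+c", s)]      # 1 or more 'b'
optional = [s for s in tests if re.fullmatch(r"ab?c", s)]  # 0 or 1 'b'

print(star)      # ['ac', 'abc', 'abbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['ac', 'abc']

# '.' does not match a newline unless re.DOTALL is set.
dot_newline = re.fullmatch(r"a.b", "a\nb")
print(dot_newline)  # None

# {m,n} is greedy: it takes as many repetitions as it can, up to n.
bounded = re.search(r"a{2,4}", "aaaaa").group()
print(bounded)  # aaaa
```

Quantifiers are greedy by default, which is why `a{2,4}` consumes four of the five `a` characters instead of stopping at two.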
### Character Classes
- **`[...]`**: Matches any one of the characters inside the brackets.
- Example: `[abc]` matches `a`, `b`, or `c`.
- **`[^...]`**: Matches any character not inside the brackets.
- Example: `[^abc]` matches any character except `a`, `b`, or `c`.
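- **`\d`**: Matches any decimal digit. For `str` patterns this includes Unicode digits; it is equivalent to `[0-9]` only when `re.ASCII` is set or the pattern is `bytes`.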
- Example: `\d` matches `1`, `2`, `3`, etc.
- **`\D`**: Matches any non-digit.
- Example: `\D` matches `a`, `b`, `!`, etc.
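- **`\w`**: Matches any word character (alphanumeric + underscore). For `str` patterns this includes Unicode letters and digits; it is equivalent to `[A-Za-z0-9_]` only when `re.ASCII` is set or the pattern is `bytes`.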
- Example: `\w` matches `a`, `1`, `_`, etc.
- **`\W`**: Matches any non-word character.
- Example: `\W` matches `!`, `@`, `#`, etc.
- **`\s`**: Matches any whitespace character (spaces, tabs, newlines).
- Example: `\s` matches ` `, `\t`, `\n`, etc.
- **`\S`**: Matches any non-whitespace character.
- Example: `\S` matches `a`, `1`, `!`, etc.
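
A short sketch of the character classes in action. The sample string is hypothetical and chosen only to exercise each class. The last two lines illustrate that `\d` on a `str` pattern is not limited to ASCII digits unless `re.ASCII` is passed.

```python
import re

# Hypothetical sample string, chosen only to exercise each class.
sample = "Room 42_b: open 9-5!"

digits = re.findall(r"\d+", sample)      # runs of digits
words = re.findall(r"\w+", sample)       # runs of word characters
non_words = re.findall(r"\W", sample)    # single non-word characters
vowels = re.findall(r"[aeiou]", sample)  # any one of the listed characters

print(digits)     # ['42', '9', '5']
print(words)      # ['Room', '42_b', 'open', '9', '5']
print(non_words)  # [' ', ':', ' ', ' ', '-', '!']
print(vowels)     # ['o', 'o', 'o', 'e']

# For str patterns, \d also matches non-ASCII decimal digits by default.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_digit = bool(re.fullmatch(r"\d", arabic_three))
ascii_digit = bool(re.fullmatch(r"\d", arabic_three, re.ASCII))
print(unicode_digit)  # True
print(ascii_digit)    # False
```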
### Anchors
- **`\b`**: Matches a word boundary.
- Example: `\bword\b` matches `word` but not `sword` or `words`.
- **`\B`**: Matches a non-word boundary.
- Example: `\Bword\B` matches `swordfish` but not `word` or `sword`.
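
The boundary examples above can be reproduced with a small sketch. The candidate list is illustrative. Note the raw-string prefix: in an ordinary Python string literal, `"\b"` is a backspace character, not a word boundary.

```python
import re

candidates = ["word", "sword", "words", "swordfish", "a word here"]

# \b requires a transition between a word and a non-word character
# (or the start/end of the string) on each side of 'word'.
whole = [s for s in candidates if re.search(r"\bword\b", s)]

# \B requires that there is NO such transition on each side.
inner = [s for s in candidates if re.search(r"\Bword\B", s)]

print(whole)  # ['word', 'a word here']
print(inner)  # ['swordfish']

# Without the r prefix, "\b" is a backspace, so this pattern never
# matches ordinary text.
no_raw = re.search("\bword\b", "word")
print(no_raw)  # None
```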
### Groups and Alternations
- **`(...)`**: Groups a sub-pattern so that quantifiers apply to it as a unit, and captures the text it matched.
- Example: `(abc)+` matches `abc`, `abcabc`, `abcabcabc`, etc.
- **`|`**: Matches either the pattern before or the pattern after the `|`.
- Example: `a|b` matches `a` or `b`.
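
A minimal sketch of grouping, capturing, and alternation. The `gr(a|e)y` pattern and the date string are illustrative assumptions, not taken from the list above.

```python
import re

# A quantifier after a group repeats the whole group.
repeated = re.fullmatch(r"(abc)+", "abcabc")
print(repeated.group(0))  # abcabc  (the entire match)
print(repeated.group(1))  # abc     (the last repetition captured)

# Each pair of parentheses captures its own piece of the match.
parts = re.search(r"(\d{4})-(\d{2})-(\d{2})", "2023-08-10").groups()
print(parts)  # ('2023', '08', '10')

# '|' has the lowest precedence, so use a group to limit its scope.
scoped = re.fullmatch(r"gr(a|e)y", "grey")  # 'gray' or 'grey'
unscoped = re.fullmatch(r"gra|ey", "grey")  # 'gra' or 'ey'
print(scoped is not None)  # True
print(unscoped)            # None
```

Without the parentheses, `gra|ey` reads as "either `gra` or `ey`", which is rarely what is intended.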
### Escaped Characters
- **`\`**: Escapes a special character, making it literal.
- Example: `\.` matches `.` instead of any character.
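
The effect of escaping can be seen in a brief sketch. `re.escape` is the standard-library helper for escaping a whole literal string. The sample strings are illustrative.

```python
import re

text = "a.c abc"

unescaped = re.findall(r"a.c", text)  # '.' matches any character
escaped = re.findall(r"a\.c", text)   # '\.' matches only a literal dot

print(unescaped)  # ['a.c', 'abc']
print(escaped)    # ['a.c']

# When the literal text comes from elsewhere, re.escape adds the
# backslashes for you.
literal = "1+1=2"
as_pattern = re.fullmatch(literal, literal)             # '+' acts as a quantifier
as_literal = re.fullmatch(re.escape(literal), literal)  # '+' is literal
print(as_pattern)              # None
print(as_literal is not None)  # True
```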
### Lookahead and Lookbehind
- **`(?=...)`**: Positive lookahead assertion. Succeeds only if the text that follows matches `...`, without consuming it.
- Example: `a(?=b)` matches `a` in `ab`, but not in `ac`.
- **`(?!...)`**: Negative lookahead assertion. Succeeds only if the text that follows does not match `...`.
- Example: `a(?!b)` matches `a` in `ac`, but not in `ab`.
- **`(?<=...)`**: Positive lookbehind assertion. Succeeds only if the text just before the current position matches `...`. In Python's `re`, the lookbehind pattern must have a fixed width.
- Example: `(?<=b)a` matches `a` in `ba`, but not in `ca`.
- **`(?<!...)`**: Negative lookbehind assertion. Succeeds only if the text just before the current position does not match `...`.
- Example: `(?<!b)a` matches `a` in `ca`, but not in `ba`.
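
A small sketch showing that lookarounds test the surrounding text without including it in the match. The price string at the end is a hypothetical example.

```python
import re

# Lookahead: 'b' must follow, but only 'a' is part of the match.
ahead = re.search(r"a(?=b)", "ab")
print(ahead.group())  # a

ahead_miss = re.search(r"a(?=b)", "ac")
print(ahead_miss)  # None

neg_ahead = re.search(r"a(?!b)", "ac")
print(neg_ahead.group())  # a

# Lookbehind: 'b' must precede, but the match starts at 'a'.
behind = re.search(r"(?<=b)a", "ba")
print(behind.span())  # (1, 2)

neg_behind = re.search(r"(?<!b)a", "ba")
print(neg_behind)  # None

# Practical use: numbers preceded by a dollar sign, without the sign.
prices = re.findall(r"(?<=\$)\d+", "cost: $40, qty: 12, tax: $3")
print(prices)  # ['40', '3']
```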
### Flags
- **`re.IGNORECASE` or `re.I`**: Ignore case.
- Example: `re.search('abc', 'ABC', re.I)` matches.
- **`re.MULTILINE` or `re.M`**: Make `^` and `$` match the start and end of each line.
- Example: `re.search('^def', 'abc\ndef', re.M)` matches `def` at the start of the second line; without the flag there is no match.
- **`re.DOTALL` or `re.S`**: Make `.` match any character including newlines.
- Example: `re.search('a.b', 'a\nb', re.S)` matches.
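
The flags can be combined with `|`, as in this brief sketch. The two-line sample text is illustrative. The last example uses the inline form `(?i)`, which sets the flag from inside the pattern.

```python
import re

text = "abc\nDEF"

# re.I alone is not enough: '^' still matches only at the start of the string.
only_i = re.search(r"^def", text, re.I)
print(only_i)  # None

# Combining re.I and re.M lets '^' match at the start of the second line.
both = re.search(r"^def", text, re.I | re.M)
print(both.group())  # DEF

# '.' skips newlines unless re.S (DOTALL) is set.
no_dotall = re.search(r"c.D", text)
dotall = re.search(r"c.D", text, re.S)
print(no_dotall)             # None
print(repr(dotall.group()))  # 'c\nD'

# Inline flag: equivalent to passing re.I.
inline = re.search(r"(?i)abc", "ABC")
print(inline.group())  # ABC
```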
### Practical Examples
1. **Extracting Email Addresses**:
```python
import re
# support@example.com is a placeholder; the original address was lost to email obfuscation.
text = "Please contact us at support@example.com for assistance."
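# Inside a character class '|' is a literal pipe, so the TLD class is [A-Za-z], not [A-Z|a-z].
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)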
```
2. **Finding Dates in a Text**:
```python
import re

text = "The event is scheduled for 2023-08-10."
dates = re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text)
print(dates) # Output: ['2023-08-10']
```
3. **Replacing Multiple Whitespace Characters with a Single Space**:
```python
import re

text = "This   is  an    example     text."
clean_text = re.sub(r'\s+', ' ', text)
print(clean_text) # Output: "This is an example text."
```
4. **Validating a Phone Number**:
```python
import re

phone = "123-456-7890"
# fullmatch requires the whole string to match, so ^ and $ are not needed.
if re.fullmatch(r'\d{3}-\d{3}-\d{4}', phone):
    print("Valid phone number")
else:
    print("Invalid phone number")
```
### Summary
Regular expressions provide a powerful way to search, match, and manipulate strings based on
specific patterns. By understanding these patterns and syntax, you can perform complex text
processing tasks efficiently.