
regex_patterns_and_syntax

The document provides an overview of regular expressions (regex) in Python, covering basic syntax, character classes, anchors, groups, and practical examples. It explains regex components such as `.` for matching any character, `^` for the start of a string, and `\b` for word boundaries. It also includes practical examples for extracting emails, finding dates, and validating phone numbers.

Uploaded by Samit Mishra

Copyright © All Rights Reserved

## Regular Expressions (Regex) in Python

### Basic Syntax

- **`.`**: Matches any single character except a newline.
  - Example: `a.b` matches `aab` and `acb`, but not `a\nb`.
- **`^`**: Matches the start of the string.
  - Example: `^abc` matches `abc` only if it is at the beginning of the string.
- **`$`**: Matches the end of the string (or the position just before a newline at the very end).
  - Example: `abc$` matches `abc` only if it is at the end of the string.
- **`*`**: Matches 0 or more repetitions of the preceding element.
  - Example: `ab*c` matches `ac`, `abc`, `abbc`, etc.
- **`+`**: Matches 1 or more repetitions of the preceding element.
  - Example: `ab+c` matches `abc` and `abbc`, but not `ac`.
- **`?`**: Matches 0 or 1 repetition of the preceding element.
  - Example: `ab?c` matches `ac` and `abc`, but not `abbc`.
- **`{m,n}`**: Matches between `m` and `n` repetitions (inclusive) of the preceding element.
  - Example: `a{2,4}` matches `aa`, `aaa`, and `aaaa`.
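The examples in this list can be checked directly. The sketch below assumes Python 3.4+ and uses `re.fullmatch`, which succeeds only when the entire text matches the pattern, so each quantifier's limits are visible; `^` and `$` are tested with `re.search`, which scans the whole string.

```python
import re

# (pattern, text) pairs based on the examples above.
cases = [
    (r"a.b", "acb"),       # `.` matches any character...
    (r"a.b", "a\nb"),      # ...except a newline
    (r"ab*c", "ac"),       # `*`: zero b's is allowed
    (r"ab+c", "ac"),       # `+`: at least one b is required
    (r"ab?c", "abbc"),     # `?`: at most one b
    (r"a{2,4}", "aaa"),    # `{2,4}`: two to four a's
    (r"a{2,4}", "aaaaa"),  # five a's is too many for a full match
]

# Map each pair to True/False depending on whether the whole text matched.
results = {(p, t): re.fullmatch(p, t) is not None for p, t in cases}
for (pattern, text), matched in results.items():
    print(f"{pattern!r:10} {text!r:9} {matched}")

# `^` and `$` tie a search to the start or end of the string.
at_start = re.search(r"^abc", "abcdef") is not None    # True
not_at_start = re.search(r"^abc", "xabc") is not None  # False
at_end = re.search(r"abc$", "xyzabc") is not None      # True
print(at_start, not_at_start, at_end)
```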


### Character Classes

- **`[...]`**: Matches any one of the characters inside the brackets.
  - Example: `[abc]` matches `a`, `b`, or `c`.
- **`[^...]`**: Matches any single character that is not inside the brackets.
  - Example: `[^abc]` matches any character except `a`, `b`, or `c`.
- **`\d`**: Matches any decimal digit. For `str` patterns this includes Unicode digits; with the `re.ASCII` flag it is equivalent to `[0-9]`.
  - Example: `\d` matches `1`, `2`, `3`, etc.
- **`\D`**: Matches any character that is not a digit.
  - Example: `\D` matches `a`, `b`, `!`, etc.
- **`\w`**: Matches any word character (letters, digits, and underscore). For `str` patterns this includes Unicode letters such as `é`; with `re.ASCII` it is equivalent to `[A-Za-z0-9_]`.
  - Example: `\w` matches `a`, `1`, `_`, etc.
- **`\W`**: Matches any non-word character.
  - Example: `\W` matches `!`, `@`, `#`, etc.
- **`\s`**: Matches any whitespace character (space, tab, newline, and others such as `\r` and `\f`).
  - Example: `\s` matches a space, `\t`, `\n`, etc.
- **`\S`**: Matches any non-whitespace character.
  - Example: `\S` matches `a`, `1`, `!`, etc.
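A short sketch of these classes with `re.findall`, which returns every non-overlapping match as a list. The sample strings are illustrative. The last four lines show the Unicode behaviour of `\d` and `\w` for `str` patterns and how the `re.ASCII` flag narrows them; `"\u0663"` is ARABIC-INDIC DIGIT THREE.

```python
import re

digits = re.findall(r"\d", "a1b22c3")          # ['1', '2', '2', '3']
not_abc = re.findall(r"[^abc]", "abcxyz")      # ['x', 'y', 'z']
words = re.findall(r"\w+", "hello, world_1!")  # ['hello', 'world_1']
spaces = re.findall(r"\s", "a b\tc\nd")        # [' ', '\t', '\n']
symbols = re.findall(r"\W", "hi! @you")        # ['!', ' ', '@']

# For str patterns \d and \w are Unicode-aware; re.ASCII limits them
# to [0-9] and [A-Za-z0-9_] respectively.
unicode_digits = re.findall(r"\d", "7\u0663")          # ['7', '\u0663']
ascii_digits = re.findall(r"\d", "7\u0663", re.ASCII)  # ['7']
unicode_word = re.findall(r"\w+", "café")              # ['café']
ascii_word = re.findall(r"\w+", "café", re.ASCII)      # ['caf']

for label, found in [("digits", digits), ("not_abc", not_abc),
                     ("words", words), ("spaces", spaces),
                     ("symbols", symbols)]:
    print(label, found)
```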


### Anchors

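- **`\b`**: Matches a word boundary (the position between a word character and a non-word character, or the start/end of the string).
  - Example: `\bword\b` matches `word` as a whole word, but not the `word` inside `sword` or `words`.
- **`\B`**: Matches any position that is not a word boundary.
  - Example: `\Bword\B` matches the `word` inside `swordfish`, but not `word` or `sword`.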
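Because both patterns match the same four letters, the sketch below records the start offset of each match to show which occurrence was found. The sample sentences are illustrative; the last line shows a common use of `\b`, counting whole-word occurrences.

```python
import re

text = "word sword words swordfish"

# Start offsets make it clear *which* "word" each pattern found.
whole_word = [m.start() for m in re.finditer(r"\bword\b", text)]   # [0]
inside_word = [m.start() for m in re.finditer(r"\Bword\B", text)]  # [18] (in "swordfish")

# Counting whole-word occurrences: "concat" and "cats" are skipped.
cat_count = len(re.findall(r"\bcat\b", "cat concat cats cat."))    # 2

print(whole_word, inside_word, cat_count)
```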

### Groups and Alternations

- **`(...)`**: Groups a pattern so that quantifiers apply to it as a unit, and captures the matched text for later use.
  - Example: `(abc)+` matches `abc`, `abcabc`, `abcabcabc`, etc.
- **`|`**: Matches either the pattern before or the pattern after the `|`.
  - Example: `a|b` matches `a` or `b`.
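A minimal sketch of grouping, capturing, and alternation. It shows that a repeated group keeps only its last repetition, how `groups()` returns captured parts, and that `re.findall` returns the group's text (not the whole match) when the pattern contains a group, which is why `re.finditer` is used to collect full matches.

```python
import re

# The group repeats as a unit; group(1) keeps only the last repetition.
rep = re.fullmatch(r"(abc)+", "abcabc")
whole, last = rep.group(0), rep.group(1)  # 'abcabc', 'abc'

# Groups capture parts of a match for later use.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Due on 2023-08-10.")
year, month, day = date.groups()  # '2023', '08', '10'

# Grouping the alternation limits `|` to the part that varies.
text = "gray grey groy"
full_matches = [m.group(0) for m in re.finditer(r"gr(a|e)y", text)]  # ['gray', 'grey']
group_only = re.findall(r"gr(a|e)y", text)  # ['a', 'e']: findall returns the group

print(whole, last, year, month, day, full_matches, group_only)
```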

### Escaped Characters

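- **`\`**: Escapes a special character, making it literal. Write patterns as raw strings (`r'\.'`) so Python passes the backslash through to the regex engine unchanged.
  - Example: `\.` matches a literal `.` instead of any character.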
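The difference between `.` and `\.` can be sketched as below. The example also uses `re.escape()`, a standard `re` function that backslash-escapes every special character in a string; it is useful when literal text (for example, user input) must be embedded in a pattern. The price string is illustrative.

```python
import re

# Unescaped `.` matches any character, so "1x5" is accepted.
loose = re.fullmatch(r"1.5", "1x5") is not None    # True
# `\.` matches only a literal dot.
strict = re.fullmatch(r"1\.5", "1x5") is not None  # False
exact = re.fullmatch(r"1\.5", "1.5") is not None   # True

# re.escape() neutralises `$`, `.`, `(` and `)` in the literal text.
needle = "$9.99 (USD)"
found = re.search(re.escape(needle), "Total: $9.99 (USD) due") is not None  # True

print(loose, strict, exact, found)
```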

### Lookahead and Lookbehind

- **`(?=...)`**: Positive lookahead. Matches at the current position only if the text that follows matches `...`, without consuming that text.
  - Example: `a(?=b)` matches the `a` in `ab`, but not the `a` in `ac`.
- **`(?!...)`**: Negative lookahead. Matches only if the text that follows does not match `...`.
  - Example: `a(?!b)` matches the `a` in `ac`, but not the `a` in `ab`.
- **`(?<=...)`**: Positive lookbehind. Matches only if the text that precedes the current position matches `...`. In Python's `re` module, the lookbehind pattern must have a fixed width.
  - Example: `(?<=b)a` matches the `a` in `ba`, but not the `a` in `ca`.
- **`(?<!...)`**: Negative lookbehind. Matches only if the preceding text does not match `...` (also fixed width).
  - Example: `(?<!b)a` matches the `a` in `ca`, but not the `a` in `ba`.
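The four assertions can be compared by recording where each match starts. The helper `starts` is a hypothetical name introduced for this sketch. The `amounts` line shows a typical use (taking the digits after a `$` without including the `$`), and the final block shows that a variable-width lookbehind such as `(?<=a+)` is rejected with `re.error`.

```python
import re

def starts(pattern, text):
    """Hypothetical helper: start offsets of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

pos_ahead = starts(r"a(?=b)", "ab ac")    # [0]: the `a` before `b`
neg_ahead = starts(r"a(?!b)", "ab ac")    # [3]: the `a` before `c`
pos_behind = starts(r"(?<=b)a", "ba ca")  # [1]: the `a` after `b`
neg_behind = starts(r"(?<!b)a", "ba ca")  # [4]: the `a` after `c`

# Lookbehind checks for the `$` but leaves it out of the match.
amounts = re.findall(r"(?<=\$)\d+", "Tickets cost $30 and $45.")  # ['30', '45']

# Python's re requires fixed-width lookbehind patterns.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(pos_ahead, neg_ahead, pos_behind, neg_behind, amounts, variable_width_ok)
```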

### Flags

- **`re.IGNORECASE` or `re.I`**: Ignore case.
  - Example: `re.search('abc', 'ABC', re.I)` matches.
- **`re.MULTILINE` or `re.M`**: Make `^` and `$` match at the start and end of each line, not only of the whole string.
  - Example: `re.search('^def', 'abc\ndef', re.M)` matches, while `re.search('^def', 'abc\ndef')` returns `None`.
- **`re.DOTALL` or `re.S`**: Make `.` match any character, including a newline.
  - Example: `re.search('a.b', 'a\nb', re.S)` matches.
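A minimal sketch of the flags side by side on an illustrative multi-line string. Flags can be combined with the bitwise OR operator `|`, as in `re.M | re.I`.

```python
import re

text = "Title\nabc line\nABC line"

# Without re.M, `^` only matches at the very start of the string ("Title").
default = re.findall(r"^abc", text)                    # []
multiline = re.findall(r"^abc", text, re.M)            # ['abc']
multiline_ci = re.findall(r"^abc", text, re.M | re.I)  # ['abc', 'ABC']

# Without re.S, `.` refuses to match the newline.
no_dotall = re.search(r"a.b", "a\nb")            # None
with_dotall = re.search(r"a.b", "a\nb", re.S)    # match object for 'a\nb'

print(default, multiline, multiline_ci, no_dotall, with_dotall)
```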

### Practical Examples

1. **Extracting Email Addresses**:

```python
import re

text = "Please contact us at support@example.com for assistance."

# Local part, "@", domain, then a dot and a top-level domain of 2+ letters.
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)

print(emails)  # Output: ['support@example.com']
```

2. **Finding Dates in a Text**:

```python
import re

text = "The event is scheduled for 2023-08-10."

# YYYY-MM-DD format only; this does not check that the date actually exists.
dates = re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text)

print(dates)  # Output: ['2023-08-10']
```

3. **Replacing Multiple Whitespace Characters with a Single Space**:

```python
import re

text = "This   is  an    example\ttext."

# Collapse every run of spaces, tabs, or newlines into one space.
clean_text = re.sub(r'\s+', ' ', text)

print(clean_text)  # Output: This is an example text.
```

4. **Validating a Phone Number**:

```python
import re

phone = "123-456-7890"

# Exactly three digits, a hyphen, three digits, a hyphen, four digits.
if re.match(r'^\d{3}-\d{3}-\d{4}$', phone):
    print("Valid phone number")
else:
    print("Invalid phone number")
```
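One subtlety when validating: `$` also matches just before a trailing newline, so the pattern above accepts `"123-456-7890\n"`. A stricter sketch, assuming the same `ddd-ddd-dddd` format, uses a precompiled pattern with `fullmatch()`, which must consume the entire string. The function name `is_valid_phone` is hypothetical.

```python
import re

# Precompiled so the pattern is parsed once and reused.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_valid_phone(phone: str) -> bool:
    """Hypothetical helper: True only if the whole string is ddd-ddd-dddd."""
    return PHONE.fullmatch(phone) is not None

# `$` lets a trailing newline through; fullmatch does not.
loose = re.match(r'^\d{3}-\d{3}-\d{4}$', "123-456-7890\n") is not None  # True
checks = {
    "123-456-7890": is_valid_phone("123-456-7890"),      # True
    "123-456-7890\n": is_valid_phone("123-456-7890\n"),  # False
    "123-4567-890": is_valid_phone("123-4567-890"),      # False
}
print(loose, checks)
```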

### Summary

Regular expressions provide a powerful way to search, match, and manipulate strings based on specific patterns. By understanding these patterns and their syntax, you can perform complex text-processing tasks efficiently.
