
regex_patterns_and_syntax

The document provides an overview of regular expressions (regex) in Python, detailing basic syntax, character classes, anchors, groups, and practical examples. It explains regex components such as `.` for matching any character except a newline, `^` for the start of a string, and `\b` for word boundaries. It also includes practical examples for extracting emails, finding dates, and validating phone numbers.

Uploaded by

Samit Mishra
Copyright
© All Rights Reserved

Regular Expressions (Regex) in Python

### Basic Syntax

- **`.`**: Matches any single character except a newline.

- Example: `a.b` matches `aab`, `acb`, but not `a\nb`.

- **`^`**: Matches the start of a string.

- Example: `^abc` matches `abc` only if it is at the beginning of the string.

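- **`$`**: Matches the end of the string, or the position just before a newline at the end of the string.

- Example: `abc$` matches `abc` only if it is at the end of the string (so it matches in both `xabc` and `xabc\n`).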

- **`*`**: Matches 0 or more repetitions of the preceding element.

- Example: `ab*c` matches `ac`, `abc`, `abbc`, etc.

- **`+`**: Matches 1 or more repetitions of the preceding element.

- Example: `ab+c` matches `abc`, `abbc`, but not `ac`.

- **`?`**: Matches 0 or 1 repetition of the preceding element.

- Example: `ab?c` matches `ac` and `abc`, but not `abbc`.

- **`{m,n}`**: Matches between `m` and `n` repetitions of the preceding element.

- Example: `a{2,4}` matches `aa`, `aaa`, and `aaaa`.
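The rules above can be checked directly with Python's standard `re` module. This is a small sketch: it uses `re.fullmatch` (the whole string must match) for the quantifier cases and `re.search` for the anchors, and the sample strings are made up for illustration.

```python
import re

samples = ["ac", "abc", "abbc"]

# '.' matches any single character except a newline
dot_ok = re.fullmatch(r"a.b", "acb") is not None        # True
dot_newline = re.fullmatch(r"a.b", "a\nb") is not None  # False

# '^' and '$' pin the match to the start and end of the string
starts = re.search(r"^abc", "abcdef") is not None   # True
not_start = re.search(r"^abc", "xabc") is not None  # False
ends = re.search(r"abc$", "xyzabc") is not None     # True

# Quantifiers: keep only the samples that the whole pattern matches
star = [s for s in samples if re.fullmatch(r"ab*c", s)]      # ['ac', 'abc', 'abbc']
plus = [s for s in samples if re.fullmatch(r"ab+c", s)]      # ['abc', 'abbc']
optional = [s for s in samples if re.fullmatch(r"ab?c", s)]  # ['ac', 'abc']

# {m,n} is greedy: it takes as many repetitions as it can, up to n
counted = re.findall(r"a{2,4}", "a aa aaaaa")  # ['aa', 'aaaa']

print(star, plus, optional, counted)
```

Note that `a{2,4}` applied to `aaaaa` yields only `aaaa`: the leftover single `a` is too short for a second match.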


### Character Classes

- **`[...]`**: Matches any one of the characters inside the brackets.

- Example: `[abc]` matches `a`, `b`, or `c`.

- **`[^...]`**: Matches any character not inside the brackets.

- Example: `[^abc]` matches any character except `a`, `b`, or `c`.

- **`\d`**: Matches any digit. For `str` patterns this means any Unicode decimal digit; it is equivalent to `[0-9]` only with the `re.ASCII` flag.

- Example: `\d` matches `1`, `2`, `3`, etc.

- **`\D`**: Matches any non-digit.

- Example: `\D` matches `a`, `b`, `!`, etc.

- **`\w`**: Matches any word character: letters, digits, and the underscore. For `str` patterns this includes Unicode letters such as `é`; it is equivalent to `[A-Za-z0-9_]` only with `re.ASCII`.

- Example: `\w` matches `a`, `1`, `_`, etc.

- **`\W`**: Matches any non-word character.

- Example: `\W` matches `!`, `@`, `#`, etc.

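- **`\s`**: Matches any whitespace character, such as a space, tab, or newline. With `re.ASCII` it is equivalent to `[ \t\n\r\f\v]`; for `str` patterns it also matches other Unicode whitespace.

- Example: `\s` matches ` `, `\t`, `\n`, etc.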

- **`\S`**: Matches any non-whitespace character.

- Example: `\S` matches `a`, `1`, `!`, etc.
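A quick sketch of the character classes with `re.findall` and `re.split`. The input strings are invented for illustration; the last two lines show the Unicode behaviour of `\d` mentioned above, using U+0663 (ARABIC-INDIC DIGIT THREE) as the sample non-ASCII digit.

```python
import re

# [...] and [^...] match one character from (or not from) the set
vowels = re.findall(r"[aeiou]", "regex")       # ['e', 'e']
non_vowels = re.findall(r"[^aeiou]", "regex")  # ['r', 'g', 'x']

# \d, \w and \s each match a single character of their class
digits = re.findall(r"\d", "a1b22")                  # ['1', '2', '2']
words = re.findall(r"\w+", "snake_case, 2x!")        # ['snake_case', '2x']
parts = re.split(r"\s+", "tab\tspace newline\nend")  # ['tab', 'space', 'newline', 'end']

# For str patterns, \d is Unicode-aware; re.ASCII restricts it to [0-9]
unicode_digits = re.findall(r"\d", "7\u0663")                # ['7', '\u0663']
ascii_digits = re.findall(r"\d", "7\u0663", flags=re.ASCII)  # ['7']

print(vowels, non_vowels, digits, words, parts)
```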


### Anchors

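- **`\b`**: Matches a word boundary: the empty position between a word character and a non-word character, or between a word character and the start or end of the string.

- Example: `\bword\b` matches `word` as a standalone word, but not the `word` inside `sword` or `words`.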

- **`\B`**: Matches any position that is not a word boundary.

- Example: `\Bword\B` matches the `word` inside `swordfish`, but not a standalone `word` or the `word` at the end of `sword`.
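The sketch below reports the start index of each match so you can see which occurrence of `word` each pattern picks up in a made-up sample string. It also shows why Python patterns are usually written as raw strings: in an ordinary string literal, `"\b"` is the backspace character (`\x08`), not a word boundary.

```python
import re

text = "word sword words swordfish"

# \b...\b: only the standalone "word" at index 0
standalone = [m.start() for m in re.finditer(r"\bword\b", text)]  # [0]

# \B...\B: "word" with word characters on both sides, inside "swordfish"
embedded = [m.start() for m in re.finditer(r"\Bword\B", text)]  # [18]

# Without the r prefix, "\b" is a backspace, so this pattern finds nothing
no_raw = re.search("\bword\b", "word")  # None

print(standalone, embedded, no_raw)
```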

### Groups and Alternations

- **`(...)`**: Groups a pattern together so that a quantifier applies to the whole group, and captures the text the group matched.

- Example: `(abc)+` matches `abc`, `abcabc`, `abcabcabc`, etc.

- **`|`**: Matches either the pattern before or the pattern after the `|`.

- Example: `a|b` matches `a` or `b`.
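Capturing matters in practice because `re` hands the captured text back to you. This sketch uses made-up inputs; it also uses the non-capturing form `(?:...)`, which is standard Python syntax but is not listed above, to show how capturing changes what `re.findall` returns.

```python
import re

# Each (...) captures the text it matched; groups are numbered from 1
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Due on 2023-08-10.")
year, month, day = m.groups()  # ('2023', '08', '10')

# A repeated group keeps only the text of its last repetition
rep = re.fullmatch(r"(abc)+", "abcabcabc")
whole, last = rep.group(0), rep.group(1)  # 'abcabcabc', 'abc'

# When a pattern has a group, findall returns the group's text, not the whole match
captured = re.findall(r"(abc)+", "abcabc abc")      # ['abc', 'abc']
whole_runs = re.findall(r"(?:abc)+", "abcabc abc")  # ['abcabc', 'abc']

# | tries the alternatives from left to right
pets = re.findall(r"cat|dog", "cat, dog, bird")  # ['cat', 'dog']

print(year, month, day, whole, last, captured, whole_runs, pets)
```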

### Escaped Characters

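- **`\`**: Escapes a special character, making it literal. In Python source, write patterns as raw strings (e.g. `r'\.'`) so that backslashes reach the regex engine unchanged.

- Example: `\.` matches a literal `.` instead of any character.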
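A short sketch of escaping, with invented sample text. The first pair contrasts an escaped and an unescaped dot; the second uses the standard `re.escape()` function, which escapes every special character in a literal string so that it can be searched for safely.

```python
import re

# \. matches only a literal dot; an unescaped . matches any character
versions = re.findall(r"\d+\.\d+", "v1.5 and 2x3")  # ['1.5']
loose = re.findall(r"\d+.\d+", "v1.5 and 2x3")      # ['1.5', '2x3']

# re.escape() backslash-escapes special characters in literal text
needle = re.escape("a+b")                          # 'a\\+b'
found = re.search(needle, "x = a+b;") is not None  # True

print(versions, loose, needle, found)
```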

### Lookahead and Lookbehind

- **`(?=...)`**: Positive lookahead. Succeeds if `...` matches immediately after the current position, without consuming any characters.

- Example: `a(?=b)` matches the `a` in `ab`, but not the one in `ac`.

- **`(?!...)`**: Negative lookahead. Succeeds if `...` does not match immediately after the current position.

- Example: `a(?!b)` matches the `a` in `ac`, but not the one in `ab`.

- **`(?<=...)`**: Positive lookbehind. Succeeds if `...` matches immediately before the current position. In Python's `re` module, the lookbehind pattern must have a fixed width.

- Example: `(?<=b)a` matches the `a` in `ba`, but not the one in `ca`.

- **`(?<!...)`**: Negative lookbehind. Succeeds if `...` does not match immediately before the current position (also fixed-width only).

- Example: `(?<!b)a` matches the `a` in `ca`, but not the one in `ba`.
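The sketch below runs the four lookaround examples and prints where each match starts. The `positions` helper and the price string are hypothetical, written only for this illustration. The last part shows that lookarounds are not included in the match, and that a variable-width lookbehind such as `(?<=a+)` is rejected with `re.error`.

```python
import re

def positions(pattern, text):
    """Hypothetical helper: start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

ahead = positions(r"a(?=b)", "ab ac")        # [0]
not_ahead = positions(r"a(?!b)", "ab ac")    # [3]
behind = positions(r"(?<=b)a", "ba ca")      # [1]
not_behind = positions(r"(?<!b)a", "ba ca")  # [4]

# The asserted "$" is not part of the match: only the digits are returned
prices = re.findall(r"(?<=\$)\d+", "cost $30, tax $5")  # ['30', '5']

# Python's re requires lookbehind patterns of fixed width
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(ahead, not_ahead, behind, not_behind, prices, variable_width_ok)
```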

### Flags

- **`re.IGNORECASE` or `re.I`**: Ignore case.

- Example: `re.search('abc', 'ABC', re.I)` matches.

- **`re.MULTILINE` or `re.M`**: Make `^` and `$` match at the start and end of each line, not only of the whole string.

- Example: `re.search('^def', 'abc\ndef', re.M)` matches, whereas without `re.M` it returns `None`.

- **`re.DOTALL` or `re.S`**: Make `.` match any character including newlines.

- Example: `re.search('a.b', 'a\nb', re.S)` matches.
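This sketch compares each flag with its absence on the same two-line sample string. It also combines flags with `|`, which is how the `re` module accepts several flags at once.

```python
import re

text = "abc\ndef"

# Without re.M, ^ only matches at the start of the whole string
plain = re.search(r"^def", text)        # None
multi = re.search(r"^def", text, re.M)  # match starting at index 4

# Without re.S, . does not match the newline
no_dotall = re.search(r"c.d", text)      # None
dotall = re.search(r"c.d", text, re.S)  # matches 'c\nd'

# Flags combine with |: case-insensitive, with ^ and $ applied per line
both = re.findall(r"^[a-z]+$", "ABC\ndef", re.I | re.M)  # ['ABC', 'def']

print(plain, multi, no_dotall, dotall, both)
```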

### Practical Examples

1. **Extracting Email Addresses**:

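```python
import re

text = "Please contact us at support@example.com for assistance."  # placeholder address

# The top-level domain class is [A-Za-z]; a "|" inside [...] would match a literal pipe
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)  # Output: ['support@example.com']
```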

2. **Finding Dates in a Text**:

```python
import re

text = "The event is scheduled for 2023-08-10."
# Four digits, two digits, two digits, separated by hyphens (YYYY-MM-DD)
dates = re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text)
print(dates)  # Output: ['2023-08-10']
```

3. **Replacing Multiple Whitespace Characters with a Single Space**:

```python
import re

text = "This   is \t an  example\n\ntext."
# \s+ matches each run of spaces, tabs, and newlines; each run becomes one space
clean_text = re.sub(r'\s+', ' ', text)
print(clean_text)  # Output: This is an example text.
```

4. **Validating a Phone Number**:

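```python
import re

phone = "123-456-7890"

# fullmatch requires the whole string to match; unlike ^...$ with re.match,
# it does not accept a trailing newline such as "123-456-7890\n"
if re.fullmatch(r'\d{3}-\d{3}-\d{4}', phone):
    print("Valid phone number")
else:
    print("Invalid phone number")
```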

### Summary

Regular expressions provide a powerful way to search, match, and manipulate strings based on specific patterns. By understanding these patterns and their syntax, you can perform complex text-processing tasks efficiently.
