GitHub - Cyrilbois - You-Should-Learn-Regex - Regular Expresion Tutorial (Blog - Patricktriest.com) Source Code

This document is a README file for a GitHub repository that provides a tutorial on learning regular expressions (regex). The tutorial contains examples of basic regex patterns for matching numbers in a text file, and code samples demonstrating how to perform regex searches on text files in 16 different programming languages, including JavaScript, Python, R, Ruby, Haskell, Perl, PHP, Go, Java, Kotlin and Scala.

Uploaded by

mene1234

GitHub - cyrilbois/You-Should-Learn-Regex: Regular Expresi... https://github.com/cyrilbois/You-Should-Learn-Regex

forked from triestpa/You-Should-Learn-Regex

Regular Expresion Tutorial (blog.patricktriest.com) Source Code

blog.patricktriest.com/you-should-learn-regex/


cyrilbois Added regex visualizer on 10 Jan 44


Regular Expressions (Regex): One of the most powerful, widely applicable, and
sometimes intimidating techniques in software engineering. From validating email
addresses to performing complex code refactors, regular expressions have a wide
range of uses and are an essential entry in any software engineer's toolbox.

What is a regular expression?

A regular expression (or regex, or regexp) is a way to describe complex search
patterns using sequences of characters.

The complexity of the specialized regex syntax, however, can make these expressions
somewhat inaccessible. For instance, here is a basic regex that describes any time in
the 24-hour HH:MM format.

\b([01]?[0-9]|2[0-3]):([0-5]\d)\b

If this looks complex to you now, don't worry; by the time we finish the tutorial,
understanding this expression will be trivial.

1 de 23 25/02/2021 19:54

Learn once, write anywhere

Regular expressions can be used in virtually any programming language. A
knowledge of regex is very useful for validating user input, interacting with the Unix
shell, searching/refactoring code in your favorite text editor, performing database
text searches, and lots more.

In this tutorial, I'll attempt to provide an approachable introduction to regex
syntax and usage in a variety of scenarios, languages, and environments.

This web application is my favorite tool for building, testing, and debugging regular
expressions. I highly recommend that you use it to test out the expressions that we'll
cover in this tutorial.

The source code for the examples in this tutorial can be found at the Github
repository here - https://github.com/triestpa/You-Should-Learn-Regex

We'll start with a very simple example - Match any line that only contains numbers.

^[0-9]+$

Let's walk through this piece-by-piece.

^ - Signifies the start of a line.
[0-9] - Matches any digit between 0 and 9.
+ - Matches one or more instances of the preceding expression.
$ - Signifies the end of the line.

We could re-write this regex in pseudo-English as [start of line][one or more digits][end of line].

Pretty simple, right?

We could replace [0-9] with \d, which will do the same thing (match any
digit).

The great thing about this expression (and regular expressions in general) is that it
can be used, without much modification, in any programming language.

To demonstrate we'll now quickly go through how to perform this simple regex
search on a text file using 16 of the most popular programming languages.


We can use the following input file (test.txt) as an example.

1234
abcde
12db2
5362

Each script will read the test.txt file, search it using our regular expression, and
print the result (the matching lines, 1234 and 5362) to the console.

0.0 - Javascript / Node.js / Typescript

const fs = require('fs')
const testFile = fs.readFileSync('test.txt', 'utf8')
const regex = /^([0-9]+)$/gm
let results = testFile.match(regex)
console.log(results)

0.1 - Python

import re

with open('test.txt', 'r') as f:
    test_string = f.read()
    regex = re.compile(r'^([0-9]+)$', re.MULTILINE)
    result = regex.findall(test_string)
    print(result)

0.2 - R

fileLines <- readLines("test.txt")

results <- grep("^[0-9]+$", fileLines, value = TRUE)
print (results)


0.3 - Ruby

File.open("test.txt", "rb") do |f|

test_str = f.read
re = /^[0-9]+$/m
test_str.scan(re) do |match|
puts match.to_s
end
end

0.4 - Haskell

import Text.Regex.PCRE

main = do
  fileContents <- readFile "test.txt"
  let stringResult = fileContents =~ "^[0-9]+$" :: AllTextMatches [] String
  print (getAllTextMatches stringResult)

0.5 - Perl

open my $fh, '<', 'test.txt' or die "Unable to open file $!";

read $fh, my $file_content, -s $fh;
close $fh;
my $regex = qr/^([0-9]+)$/mp;
my @matches = $file_content =~ /$regex/g;
print join(',', @matches);

0.6 - PHP

<?php
$myfile = fopen("test.txt", "r") or die("Unable to open file.");
$test_str = fread($myfile,filesize("test.txt"));
fclose($myfile);
$re = '/^[0-9]+$/m';
preg_match_all($re, $test_str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
?>


0.7 - Go

package main

import (
"fmt"
"io/ioutil"
"regexp"
)

func main() {
testFile, err := ioutil.ReadFile("test.txt")
if err != nil { fmt.Print(err) }
testString := string(testFile)
var re = regexp.MustCompile(`(?m)^([0-9]+)$`)
var results = re.FindAllString(testString, -1)
fmt.Println(results)
}

0.8 - Java

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;

class FileRegexExample {
public static void main(String[] args) {
try {
String content = new String(Files.readAllBytes(Paths.get("test.txt")));
Pattern pattern = Pattern.compile("^[0-9]+$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(content);
ArrayList<String> matchList = new ArrayList<String>();

while (matcher.find()) {
matchList.add(matcher.group());
}

System.out.println(matchList);
} catch (IOException e) {
e.printStackTrace();
}
}

}

0.9 - Kotlin

import java.io.File
import kotlin.text.Regex
import kotlin.text.RegexOption

val file = File("test.txt")

val content:String = file.readText()
val regex = Regex("^[0-9]+$", RegexOption.MULTILINE)
val results = regex.findAll(content).map{ result -> result.value }.toList()
println(results)

0.10 - Scala

import scala.io.Source
import scala.util.matching.Regex

object FileRegexExample {
def main(args: Array[String]) {
val fileContents = Source.fromFile("test.txt").getLines.mkString("\n")
val pattern = "(?m)^[0-9]+$".r
val results = (pattern findAllIn fileContents).mkString(",")
println(results)
}
}

0.11 - Swift

import Cocoa
do {
let fileText = try String(contentsOfFile: "test.txt", encoding: String.Encoding
let regex = try! NSRegularExpression(pattern: "^[0-9]+$", options: [ .anchorsMatchLines
let results = regex.matches(in: fileText, options: [], range: NSRange(location
let matches = results.map { String(fileText[Range($0.range, in: fileText)!]) }
print(matches)
} catch {
print(error)
}


0.12 - Rust

extern crate regex;

use std::fs::File;
use std::io::prelude::*;
use regex::Regex;

fn main() {
let mut f = File::open("test.txt").expect("file not found");
let mut test_str = String::new();
f.read_to_string(&mut test_str).expect("something went wrong reading the file");

let regex = match Regex::new(r"(?m)^([0-9]+)$") {

Ok(r) => r,
Err(e) => {
println!("Could not compile regex: {}", e);
return;
}
};

let result = regex.find_iter(&test_str);

for mat in result {
println!("{}", &test_str[mat.start()..mat.end()]);
}
}

0.13 - C#

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Linq;

namespace RegexExample
{
class FileRegexExample
{
static void Main()
{
string text = File.ReadAllText(@"./test.txt", Encoding.UTF8);
Regex regex = new Regex("^[0-9]+$", RegexOptions.Multiline);
MatchCollection mc = regex.Matches(text);
var matches = mc.OfType<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" ", matches));
}
}
}

0.14 - C++

#include <string>
#include <fstream>
#include <iostream>
#include <sstream>
#include <regex>
using namespace std;

int main () {
ifstream t("test.txt");
stringstream buffer;
buffer << t.rdbuf();
string testString = buffer.str();

regex numberLineRegex("(^|\n)([0-9]+)($|\n)");
sregex_iterator it(testString.begin(), testString.end(), numberLineRegex);
sregex_iterator it_end;

while(it != it_end) {
cout << it -> str();
++it;
}
}

0.15 - Bash

#!/bin/bash
grep -E '^[0-9]+$' test.txt

Writing out the same operation in sixteen languages is a fun exercise, but we'll be
mostly sticking with Javascript and Python (along with a bit of Bash at the end) for
the rest of the tutorial since these languages (in my opinion) tend to yield the
clearest, most readable implementations.


Let's go through another simple example - matching any valid year in the 20th or
21st centuries.

\b(19|20)\d{2}\b

We're starting and ending this regex with \b instead of ^ and $. \b represents a
word boundary: the position between a word character and a non-word character
(such as a space or a punctuation mark). This will allow us to match years within
blocks of text (instead of on their own lines), which is very useful for searching
through, say, paragraph text.

\b - Word boundary
(19|20) - Matches either '19' or '20' using the OR (|) operand.
\d{2} - Two digits, same as [0-9][0-9]
\b - Word boundary

Note that \b differs from \s, the code for a whitespace character.

\b searches for a place where a word character is not followed or preceded by
another word character, so it is searching for the absence of a word character,
whereas \s is searching explicitly for a whitespace character. \b is especially
appropriate for cases where we want to match a specific sequence/word, but
not the whitespace before or after it.
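
To make the difference between \b and \s concrete, here is a minimal Python sketch. The sample sentence is made up for this illustration; the first pattern is the year regex from above, and the second is a hypothetical variant that uses \s in place of \b.

```python
import re

# Made-up sample text: years next to punctuation, inside a longer token,
# and at the very end of the string.
text = "In 1999, sales doubled (2001 was flat); see also id19990 and 2015."

# \b is zero-width: it only asserts a position, so a year touching
# punctuation still matches, while the digits inside "id19990" do not.
with_boundaries = re.findall(r"\b(?:19|20)\d{2}\b", text)

# \s must consume a real whitespace character on both sides, so every
# year in this sentence is missed.
with_whitespace = re.findall(r"\s((?:19|20)\d{2})\s", text)

print(with_boundaries)  # ['1999', '2001', '2015']
print(with_whitespace)  # []
```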

1.0 - Real-World Example - Count Year Occurrences

We can use this expression in a Python script to find how many times each year in
the 20th or 21st century is mentioned in a historical Wikipedia article.

import re
import urllib.request
import operator

# Download wiki page

url = "https://en.wikipedia.org/wiki/Diplomatic_history_of_World_War_II"
html = urllib.request.urlopen(url).read()

# Find all mentioned years in the 20th or 21st century

regex = r"\b(?:19|20)\d{2}\b"
matches = re.findall(regex, str(html))

# Form a dict of the number of occurrences of each year

year_counts = dict((year, matches.count(year)) for year in set(matches))

# Print the dict sorted in descending order

for year in sorted(year_counts, key=year_counts.get, reverse=True):
    print(year, year_counts[year])

The above script will print each year, along with the number of times it is mentioned.

1941 137
1943 80
1940 76
1945 73
1939 71
...

Now we'll define a regex expression to match any time in the 24-hour format
(HH:MM, such as 16:59).

\b([01]?[0-9]|2[0-3]):([0-5]\d)\b

\b - Word boundary
[01] - 0 or 1
? - Signifies that the preceding pattern is optional.
[0-9] - Any number between 0 and 9
| - OR operand
2[0-3] - 2, followed by any number between 0 and 3 (i.e. 20-23)
: - Matches the : character
[0-5] - Any number between 0 and 5
\d - Any number between 0 and 9 (same as [0-9])
\b - Word boundary
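
A quick way to check a breakdown like this is to run the expression against a few inputs. The following is a small Python sketch; the sample strings are invented for this illustration, and fullmatch is used so that each string must be a time in its entirety.

```python
import re

TIME_24H = re.compile(r"\b([01]?[0-9]|2[0-3]):([0-5]\d)\b")

# Made-up sample inputs: three valid times followed by three invalid ones.
samples = ["16:59", "9:05", "23:00", "24:00", "12:60", "7:5"]

# fullmatch() requires the whole string to match the pattern.
results = {sample: TIME_24H.fullmatch(sample) is not None for sample in samples}

for sample, is_valid in results.items():
    print(sample, is_valid)
```

The hour alternation is what rejects 24:00, the [0-5] class rejects 12:60, and the two-character minute segment rejects 7:5.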

2.0 - Capture Groups

You might have noticed something new in the above pattern - we're wrapping the
hour and minute segments in parentheses ( ... ). This allows us to define
each part of the pattern as a capture group.

Capture groups allow us to individually extract, transform, and rearrange pieces of
each matched pattern.


2.1 - Real-World Example - Time Parsing

For example, in the above 24-hour pattern, we've defined two capture groups - one
for the hour and one for the minute.

We can extract these capture groups easily.

Here's how we could use Javascript to parse a 24-hour formatted time into hours and
minutes.

const regex = /\b([01]?[0-9]|2[0-3]):([0-5]\d)/

const str = `The current time is 16:24`
const result = regex.exec(str)
console.log(`The current hour is ${result[1]}`)
console.log(`The current minute is ${result[2]}`)

The zeroth capture group is always the entire matched expression.

The above script will produce the following output.

The current hour is 16

The current minute is 24

As an extra exercise, you could try modifying this script to convert 24-hour times to
12-hour (am/pm) times.
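
One possible approach to that exercise, sketched here in Python rather than Javascript. The function names to_12_hour and convert_times are made up for this illustration; the idea is to pass a function as the replacement so that each match's capture groups can be transformed before being written back.

```python
import re

TIME_24H = re.compile(r"\b([01]?[0-9]|2[0-3]):([0-5]\d)\b")

def to_12_hour(match):
    """Build the 12-hour replacement text for one matched 24-hour time."""
    hour = int(match.group(1))   # first capture group: the hour
    minute = match.group(2)      # second capture group: the minute
    suffix = "am" if hour < 12 else "pm"
    hour_12 = hour % 12 or 12    # 0 -> 12, 13 -> 1, 12 stays 12
    return f"{hour_12}:{minute} {suffix}"

def convert_times(text):
    """Replace every 24-hour time in text with its 12-hour equivalent."""
    return TIME_24H.sub(to_12_hour, text)

print(convert_times("The current time is 16:24"))
# The current time is 4:24 pm
```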

Now let's match a DD/MM/YYYY style date pattern.

\b(0?[1-9]|[12]\d|3[01])([\/\-])(0?[1-9]|1[012])\2(\d{4})

This one is a bit longer, but it should look pretty similar to what we've covered
already.

(0?[1-9]|[12]\d|3[01]) - Match any number between 1 and 31 (with an
optional preceding zero)
([\/\-]) - Match the separator / or -
(0?[1-9]|1[012]) - Match any number between 1 and 12
\2 - Matches the second capture group (the separator)
\d{4} - Match any 4 digit number (0000 - 9999)


The only new concept here is that we're using \2 to match the second capture
group, which is the divider (/ or -). This enables us to avoid repeating our pattern
matching specification, and will also require that the dividers are consistent (if the
first divider is /, then the second must be as well).
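
The effect of the backreference can be seen with a short Python sketch. The three sample dates are invented for this illustration: two use a consistent divider and one mixes / and -.

```python
import re

DATE = re.compile(r"\b(0?[1-9]|[12]\d|3[01])([\/\-])(0?[1-9]|1[012])\2(\d{4})")

# Made-up inputs: the last one mixes dividers, so \2 cannot match.
candidates = ["18/09/2017", "18-09-2017", "18/09-2017"]

results = {}
for candidate in candidates:
    match = DATE.search(candidate)
    results[candidate] = match is not None
    if match:
        day, divider, month, year = match.groups()
        print(candidate, "->", day, month, year, "divider:", divider)
    else:
        print(candidate, "-> no match")
```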

3.0 - Capture Group Substitution

Using capture groups, we can dynamically reorganize and transform our string input.

The standard way to refer to capture groups is to use the $ or \ symbol, along
with the index of the capture group (remember that the 0th capture group is the
full matched text).

3.1 - Real-World Example - Date Format Transformation

Let's imagine that we were tasked with converting a collection of documents from
using the international date format style (DD/MM/YYYY) to the American style
(MM/DD/YYYY).

We could use the above regular expression with a replacement pattern -
$3$2$1$2$4 or \3\2\1\2\4.

Let's break our capture groups down.

$1 - First capture group: the day digits.

$2 - Second capture group: the divider.
$3 - Third capture group: the month digits.
$4 - Fourth capture group: the year digits.

Our replacement pattern ($3$2$1$2$4) will simply swap the month and day content
in the expression.

Here's how we could do this transformation in Javascript -

const regex = /\b(0?[1-9]|[12]\d|3[01])([ \/\-])(0?[1-9]|1[012])\2(\d{4})/

const str = `Today's date is 18/09/2017`
const subst = `$3$2$1$2$4`
const result = str.replace(regex, subst)
console.log(result)

The above script will print Today's date is 09/18/2017 to the console.

Here's how the same script would look in Python -


import re
regex = r'\b(0?[1-9]|[12]\d|3[01])([ \/\-])(0?[1-9]|1[012])\2(\d{4})'
test_str = "Today's date is 18/09/2017"
subst = r'\3\2\1\2\4'
result = re.sub(regex, subst, test_str)
print(result)

Regular expressions can also be useful for input validation.

^[^@\s]+@[^@\s]+\.\w{2,6}$

Above is an (overly simple) regular expression to match an email address.

^ - Start of input
[^@\s] - Match any character except for @ and whitespace (\s)
+ - 1+ times
@ - Match the '@' symbol
[^@\s]+ - Match any character except for @ and whitespace, 1+ times
\. - Match the '.' character.
\w{2,6} - Match any word character (letter, digit, or underscore), 2-6 times
$ - End of input

4.0 - Real-World Example - Validate Email

Let's say we wanted to create a simple Javascript function to check if an input is a
valid email.

function isValidEmail (input) {

const regex = /^[^@\s]+@[^@\s]+\.\w{2,6}$/g;
const result = regex.exec(input)

// If result is null, no match was found

return !!result
}

const tests = [
`[email protected]`, // Valid
'', // Invalid
`test.test`, // Invalid

'@[email protected]', // Invalid
'invalid@@test.com', // Invalid
`gmail.com`, // Invalid
`this is a [email protected]`, // Invalid
`[email protected]@gmail.com` // Invalid
]

console.log(tests.map(isValidEmail))

The output of this script should be
[ true, false, false, false, false, false, false, false ].

Note - In a real-world application, validating an email address using a regular
expression is not enough for many situations, such as when a user signs up.
Once you have confirmed that the input text is an email address, you should
always follow through with the standard practice of sending a
confirmation/activation email.
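
For comparison, here is a minimal sketch of the same check in Python. The function name is_valid_email and the sample addresses are assumptions made for this illustration. It uses fullmatch instead of the ^ and $ anchors, since in Python $ also matches just before a trailing newline.

```python
import re

# Same (overly simple) pattern as above, without the explicit anchors.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.\w{2,6}")

def is_valid_email(value):
    """Return True if the whole input looks like a simple email address."""
    return EMAIL.fullmatch(value) is not None

# Made-up test inputs, loosely mirroring the cases in the Javascript example.
tests = [
    "user@example.com",              # Valid
    "",                              # Invalid
    "test.test",                     # Invalid
    "invalid@@test.com",             # Invalid
    "this is a user@example.com",    # Invalid
    "user@example.com@example.org",  # Invalid
]

print([is_valid_email(test) for test in tests])
# [True, False, False, False, False, False]
```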

4.1 - Full Email Regex

This is a very simple example which ignores lots of very important email-validity edge
cases, such as invalid start/end characters and consecutive periods. I really don't
recommend using the above expression in your applications; it would be best to
instead use a reputable email-validation library or to track down a more complete
email validation regex.

For instance, here's a more advanced expression from (the aptly named)
emailregex.com which matches 99% of RFC 5322 compliant email addresses.

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"
(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b
\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-
z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b
\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Yeah, we're not going to walk through that one.

One of the most useful ad-hoc uses of regular expressions can be code refactors.
Most code editors support regex-based find/replace operations. A well-formed regex
substitution can turn a tedious 30-minute busywork job into a beautiful single-
expression piece of regex refactor wizardry.


Instead of writing scripts to perform these operations, try doing them natively in your
text editor of choice. Nearly every text editor supports regex based find-and-replace.

Here are a few guides for popular editors.

Regex Substitution in Sublime - http://docs.sublimetext.info/en/latest/search_and_replace/search_and_replace_overview.html#using-regular-expressions-in-sublime-text

Regex Substitution in Vim - http://vimregex.com/#backreferences

Regex Substitution in VSCode - https://code.visualstudio.com/docs/editor/codebasics#_advanced-search-options

Regex Substitution in Emacs - https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Replace.html

5.0 - Extracting Single Line CSS Comments

What if we wanted to find all of the single-line comments within a CSS file?

CSS comments come in the form /* Comment Here */

To capture any single-line CSS comment, we can use the following expression.

(\/\*+)(.*)(\*+\/)

\/ - Match the / symbol (we have to escape the / character)
\*+ - Match one or more * symbols (again, we have to escape the *
character with \).
.* - Match any character (besides a newline \n), any number of times
\*+ - Match one or more * characters
\/ - Match the closing / symbol.

Note that we have defined three capture groups in the above expression: the
opening characters ( (\/\*+) ), the comment contents ( (.*) ), and the closing
characters ( (\*+\/) ).

5.1 - Real-World Example - Convert Single-Line Comments to Multi-Line Comments

We could use this expression to turn each single-line comment into a multi-line
comment by performing the following substitution.


$1\n$2\n$3

Here, we are simply adding a newline between each capture group.

Try performing this substitution on a file with the following contents.

/* Single Line Comment */

body {
background-color: pink;
}

/*
Multiline Comment
*/
h1 {
font-size: 2rem;
}

/* Another Single Line Comment */

h2 {
font-size: 1rem;
}

The substitution will yield the same file, but with each single-line comment converted
to a multi-line comment.

/*
Single Line Comment
*/
body {
background-color: pink;
}

/*
Multiline Comment
*/
h1 {
font-size: 2rem;
}

/*
Another Single Line Comment
*/
h2 {
font-size: 1rem;
}
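
If you would rather script this than use an editor, the same substitution can be sketched in Python. The CSS string below is a shortened, made-up version of the test file; note that Python writes backreferences as \1 rather than $1, and that the whitespace around the comment text is carried along inside the second capture group.

```python
import re

# Shortened sample stylesheet, assumed for this illustration.
css = """/* Single Line Comment */
body {
  background-color: pink;
}

/*
  Multiline Comment
*/
h1 {
  font-size: 2rem;
}
"""

SINGLE_LINE_COMMENT = re.compile(r"(\/\*+)(.*)(\*+\/)")

# "." does not match a newline by default, so a comment that already spans
# several lines never matches and is left untouched.
converted = SINGLE_LINE_COMMENT.sub(r"\1\n\2\n\3", css)

print(converted)
```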


5.2 - Real-World Example - Standardize CSS Comment Openings

Let's say we have a big messy CSS file that was written by a few different people. In
this file, some of the comments start with /*, some with /**, and some with
/*****.

Let's write a regex substitution to standardize all of the single-line CSS comments to
start with /*.

In order to do this, we'll extend our expression to only match comments with two or
more starting asterisks.

(\/\*{2,})(.*)(\*+\/)

This expression is very similar to the original. The main difference is that at the
beginning we've replaced \*+ with \*{2,}. The \*{2,} syntax signifies "two or
more" instances of *.

To standardize the opening of each comment we can pass the following substitution.

/*$2$3

Let's run this substitution on the following test CSS file.

/** Double Asterisk Comment */

body {
background-color: pink;
}

/* Single Asterisk Comment */

h1 {
font-size: 2rem;
}

/***** Many Asterisk Comment */

h2 {
font-size: 1rem;
}

The result will be the same file with standardized comment openings.


/* Double Asterisk Comment */

body {
background-color: pink;
}

/* Single Asterisk Comment */

h1 {
font-size: 2rem;
}

/* Many Asterisk Comment */

h2 {
font-size: 1rem;
}
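
The same standardization can be sketched in Python. The three-line input below is a trimmed, made-up version of the test file, and the replacement /*\2\3 is the Python spelling of /*$2$3.

```python
import re

# Trimmed sample input, assumed for this illustration.
css = (
    "/** Double Asterisk Comment */\n"
    "/* Single Asterisk Comment */\n"
    "/***** Many Asterisk Comment */\n"
)

EXTRA_ASTERISKS = re.compile(r"(\/\*{2,})(.*)(\*+\/)")

# Only comments that open with two or more asterisks match; the opening is
# replaced with a plain "/*" and the other two groups are kept as they were.
standardized = EXTRA_ASTERISKS.sub(r"/*\2\3", css)

print(standardized)
```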

Another highly useful regex recipe is matching URLs in text.

Here is an example URL matching expression from Stack Overflow.

(https?:\/\/)(www\.)?(?<domain>[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6})
(?<path>\/[-a-zA-Z0-9@:%_\/+.~#?&=]*)?

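(https?:\/\/) - Match http(s)
(www\.)? - Optional "www" prefix
(?<domain>[-a-zA-Z0-9@:%._\+~#=]{2,256} - Match a valid domain name
\.[a-z]{2,6}) - Match a domain extension (e.g. ".com" or ".org")
(?<path>\/[-a-zA-Z0-9@:%_\/+.~#?&=]*)? - Match URL path (e.g. /posts), query
string (e.g. ?limit=1), and/or file extension (e.g. .html), all optional.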

6.0 - Named capture groups

You'll notice here that some of the capture groups now begin with a ?<name>
identifier. This is the syntax for a named capture group, which makes the data
extraction cleaner.

6.1 - Real-World Example - Parse Domain Names From URLs on A Web Page

Here's how we could use named capture groups to extract the domain name of each
URL in a web page using Python.


import re
import urllib.request

html = str(urllib.request.urlopen("https://moz.com/top500").read())
regex = r"(https?:\/\/)(www\.)?(?P<domain>[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6})(?P<path>\/[-a-zA-Z0-9@:%_\/+.~#?&=]*)?"
matches = re.finditer(regex, html)

for match in matches:
    print(match.group('domain'))

The script will print out each domain name it finds in the raw web page HTML
content.

...
facebook.com
twitter.com
google.com
youtube.com
linkedin.com
wordpress.org
instagram.com
pinterest.com
wikipedia.org
wordpress.com
...

Regular expressions are also supported by many Unix command line utilities! We'll
walk through how to use them with grep to find specific files, and with sed to
replace text file content in-place.

7.0 - Real-World Example - Image File Matching With grep

We'll define another basic regular expression, this time to match image files.

^.+\.(?i)(png|jpg|jpeg|gif|webp)$

^ - Start of line.
.+ - Match any character (letters, digits, symbols), except for \n (new line), 1+
times.
\. - Match the '.' character.
(?i) - Signifies that the next sequence is case-insensitive.
(png|jpg|jpeg|gif|webp) - Match common image file extensions
$ - End of line

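Here's how you could list all of the image files in your directory. Note that the
(?i) inline flag is not part of the POSIX extended regex syntax that grep -E uses, so
on the command line the case-insensitivity is requested with grep's -i option instead.

ls ~/Downloads | grep -iE '^.+\.(png|jpg|jpeg|gif|webp)$'

ls ~/Downloads - List the files in your downloads directory
| - Pipe the output to the next command
grep -iE - Filter the input with the regular expression, ignoring case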
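
The same filter can be sketched in Python, where case-insensitivity is passed as a flag to the compiled pattern. The filenames below are made up for this illustration; in a real script they might come from os.listdir.

```python
import re

# re.IGNORECASE plays the role of the (?i) inline flag.
IMAGE_FILE = re.compile(r"^.+\.(png|jpg|jpeg|gif|webp)$", re.IGNORECASE)

# Hypothetical directory listing.
filenames = [
    "photo.JPG",
    "notes.txt",
    "diagram.png",
    "archive.tar.gz",
    ".gif",            # no name before the dot, so .+ cannot match
    "clip.webp.bak",
]

images = [name for name in filenames if IMAGE_FILE.match(name)]
print(images)  # ['photo.JPG', 'diagram.png']
```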

7.1 - Real-World Example - Email Substitution With sed

Another good use of regular expressions in bash commands could be redacting
emails within a text file.

This can be done quite easily using the sed command, along with a modified version
of our email regex from earlier.

sed -E -i 's/^(.*?\s|)[^@]+@[^\s]+/\1\{redacted\}/g' test.txt

sed - The Unix "stream editor" utility, which allows for powerful text file
transformations.
-E - Use extended regex pattern matching
-i - Replace the file stream in-place
's/^(.*?\s|) - Wrap the beginning of the line in a capture group
[^@]+@[^\s]+ - Simplified version of our email regex.
/\1\{redacted\}/g' - Replace each email address with {redacted}.
test.txt - Perform the operation on the test.txt file.

We can run the above substitution command on a sample file.

My email is [email protected]

Once the command has been run, the email will be redacted from the file.


My email is {redacted}

Warning - This command will automatically remove all email addresses from
any file that you pass it, so be careful where/when you run it, since this
operation cannot be reversed. To preview the results within the terminal,
instead of replacing the text in-place, simply omit the -i flag.

Note - While the above command should work on most Linux distributions,
macOS uses the BSD implementation of sed, which is more limited in its
supported regex syntax. To use sed on macOS with decent regex support, I
would recommend installing the GNU implementation of sed with
brew install gnu-sed, and then using gsed from the command line instead of sed.
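
Where sed behaves differently across platforms, a short Python script is one portable alternative. The sketch below is not a translation of the sed expression; it reuses the simple email pattern from section 4.0, and the function name redact_emails and the sample addresses are assumptions made for this illustration. It works on a string, so nothing is modified in-place.

```python
import re

# The (overly simple) email pattern from earlier, without the line anchors
# so that it can match an address anywhere inside a line.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.\w{2,6}")

def redact_emails(text):
    """Return text with every email-like token replaced by {redacted}."""
    return EMAIL.sub("{redacted}", text)

print(redact_emails("My email is user@example.com"))
# My email is {redacted}
```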

Ok, so clearly regex is a powerful, flexible tool. Are there times when you should
avoid writing your own regex expressions? Yes!

8.0 - Language Parsing

Parsing structured languages, from English to Java to JSON, can be a real pain using
regex expressions.

Writing your own regex expression for this purpose is likely to be an exercise in
frustration that will result in eventual (or immediate) disaster when an edge case or
minor syntax/grammar error in the data source causes the expression to fail.

Battle-hardened parsers are available for virtually all machine-readable languages,
and NLP tools are available for human languages - I strongly recommend that you
use one of them instead of attempting to write your own.

8.1 - Security-Critical Input Filtering and Blacklists

It may seem tempting to use regular expressions to filter user input (such as from a
web form), to prevent hackers from sending malicious commands (such as SQL
injections) to your application.

Using a custom regex expression here is unwise since it is very difficult to cover every
potential attack vector or malicious command. For instance, hackers can use
alternative character encodings to get around naively programmed input blacklist
filters.


This is another instance where I would strongly recommend using well-tested
libraries and/or services, along with the use of whitelists instead of blacklists, in
order to protect your application from malicious inputs.
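
As one illustration of that advice, the sketch below uses Python's built-in sqlite3 module: values are bound through query placeholders instead of being filtered with a regex, and the one piece of user input that has to be interpolated (a sort column) is checked against a whitelist. The table, the data, and the function name find_users are all hypothetical.

```python
import sqlite3

# Hypothetical schema and data, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)

# Whitelist: only these column names may ever be interpolated into SQL.
ALLOWED_SORT_COLUMNS = {"name", "email"}

def find_users(connection, name, sort_by="name"):
    """Look up users by exact name, ordered by a whitelisted column."""
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    # The ? placeholder makes sqlite3 bind `name` as data, never as SQL text.
    query = f"SELECT name FROM users WHERE name = ? ORDER BY {sort_by}"
    return [row[0] for row in connection.execute(query, (name,))]

print(find_users(conn, "alice"))             # ['alice']
print(find_users(conn, "alice' OR '1'='1"))  # []
```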

8.2 - Performance Intensive Applications

Regex matching speeds can range from not-very-fast to extremely slow, depending
on how well the expression is written. This is fine for most use cases, especially if the
text being matched is very short (such as an email address form). For high-
performance server applications, however, regex can be a performance bottleneck,
especially if the expression is poorly written or the text being searched is long.
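
As a rough illustration of how much the way an expression is written can matter, the sketch below times two patterns that accept exactly the same strings (one or more a characters). The nested quantifier in the first one is a well-known trigger for heavy backtracking in backtracking engines such as Python's re module. The input string and the helper name time_match are assumptions for this example, and the actual timings will vary by machine.

```python
import re
import time

def time_match(pattern, text):
    """Return the match result and the elapsed seconds for one attempt."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start

# A run of "a" characters followed by one character that makes the match fail.
text = "a" * 18 + "b"

# Nested quantifier: on failure, the engine retries many ways of splitting
# the run of "a" characters between the inner and outer +.
slow_result, slow_seconds = time_match(r"^(a+)+$", text)

# Equivalent pattern without nesting: fails after a single pass.
fast_result, fast_seconds = time_match(r"^a+$", text)

print(f"nested quantifier: {slow_seconds:.6f}s, result={slow_result}")
print(f"simple quantifier: {fast_seconds:.6f}s, result={fast_result}")
```

Each additional a in the input roughly doubles the work for the nested version, so be careful if you increase the length.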

8.3 - For Problems That Don't Require Regex

Regex is an incredibly useful tool, but that doesn't mean you should use it
everywhere.

If there is an alternative solution to a problem, which is simpler and/or does not
require the use of regular expressions, please do not use regex just to feel clever.
Regex is great, but it is also one of the least readable programming tools, and one
that is very prone to edge cases and bugs.
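
For instance, two of the checks from earlier in this tutorial can be approximated in Python with plain string methods. This is only a sketch: the function names are made up, and the behavior is close to, but not identical to, the regex versions (the endswith check, for example, also accepts a bare ".gif").

```python
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp")

def is_image_file(filename):
    """Case-insensitive extension check; endswith accepts a tuple of suffixes."""
    return filename.lower().endswith(IMAGE_EXTENSIONS)

def is_numeric_line(line):
    """True if the line is non-empty and contains only the ASCII digits 0-9."""
    # isascii() rules out other Unicode digits that isdigit() would accept.
    return line.isascii() and line.isdigit()

print(is_image_file("photo.JPG"))   # True
print(is_image_file("notes.txt"))   # False
print(is_numeric_line("1234"))      # True
print(is_numeric_line("12db2"))     # False
```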

Overusing regex is a great way to make your co-workers (and anyone else who needs
to work with your code) very angry with you.

I hope that this has been a useful introduction to the many uses of regular
expressions.

There still are lots of regex use cases that we have not covered. For instance, regex
can be used in PostgreSQL queries to dynamically search for text patterns within a
database.

We have also left lots of powerful regex syntax features uncovered, such as
lookahead, lookbehind, atomic groups, recursion, and subroutines.

To improve your regex skills and to learn more about these features, I would
recommend the following resources.

Learn Regex The Easy Way - https://github.com/zeeshanu/learn-regex

Regex101 - https://regex101.com/

HackerRank Regex Course - https://www.hackerrank.com/domains/regex/re-introduction

Regex visualizer - https://extendsclass.com/regex-tester.html

The source code for the examples in this tutorial can be found at the Github
repository here - https://github.com/triestpa/You-Should-Learn-Regex

Feel free to comment below with any suggestions, ideas, or criticisms regarding this
tutorial.

