Regex
1 Strings
2 Regular Expressions
2.1 Anchors
2.6 Backreferences
3 Exercise
1 Strings
1.1 Manipulate strings
# Python
import os
wd_t = r'C:\Users\mypath\to\myfolder'
wd_t
## 'C:\\Users\\mypath\\to\\myfolder'
print(wd_t)
## C:\Users\mypath\to\myfolder
# Python
## ['r', 'k', 'i', 'n', 'g', ' ', 'd', 'i', 'r', 'e']
# mutable
slc_infos[7] = '0'
slc_infos
## ['r', 'k', 'i', 'n', 'g', ' ', 'd', '0', 'r', 'e']
# count
wd_info.count('s')
import numpy as np
## 'a : 1 b : 2 c : 3'
[''.join([letter,number]) for letter, number in zip(['a :','b :','c :'],['1','2','3'])]
## ['a :1', 'b :2', 'c :3']
list(map(''.join,zip(['a :','b :','c :'],['1','2','3'])))
## ['a :1', 'b :2', 'c :3']
'.'.join((np.char.array(['a :','b :','c :'])+np.char.array(['1','2','3'])).tolist())
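The "# mutable" step above relies on a list of characters, because a Python string itself cannot be changed in place. A minimal sketch of that difference, assuming a sample string `text` that is a hypothetical stand-in for the slice used in the slides:

```python
# Hypothetical sample string; stands in for the slice used above.
text = 'rking dire'

# Strings are immutable: item assignment raises TypeError.
try:
    text[7] = '0'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# A list of characters is mutable, so the same assignment works.
chars = list(text)
chars[7] = '0'

# Join the characters back into a new string.
modified = ''.join(chars)

print(assignment_failed)  # True
print(modified)           # rking d0re
```

The original string is left untouched; `''.join(chars)` builds a new one.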
# R
wd_t = 'C:\\Users\\mypath\\to\\myfolder'
print(wd_t)
## [1] "C:\\Users\\mypath\\to\\myfolder"
cat(wd_t)
## C:\Users\mypath\to\myfolder
# R
library(stringr)
library(tools)
nchar(slc_infos)
## [1] 11
slc_infos
## [1] "orking dire"
slc_infos <- strsplit(slc_infos,split = '')[[1]]
slc_infos
## [1] "o" "r" "k" "i" "n" "g" " " "d" "i" "r" "e"
# mutable
slc_infos[7] = '0'
# count
length(which(strsplit(wd_info,split = '')[[1]]=='S'))
## [1] 0
# Python
import time
time_ = time.ctime(time.time())
time_info = 'Time: '
wd = os.getcwd()
wd_info = '\nCurrent working directory: '
infos = time_info+time_+wd_info+wd
print(infos)
## Time: Wed Feb 8 10:39:10 2023
## Current working directory: /home/peltouz/Documents/GitHub/M1-Programming
# R
library(benchmarkme)
time_ <- Sys.time()
time_info <- 'Time: '
wd = getwd()
wd_info = '\nCurrent working directory: '
infos <- paste0(time_info, time_, wd_info, wd) # concatenate the pieces, as in the Python version
cat(infos)
## Time: 2023-02-08 10:39:10
## Current working directory: /home/peltouz/Documents/GitHub/M1-Programming
Escape Sequence  Description
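A minimal sketch of a few common escape sequences in Python string literals, and of how a raw string (as in `wd_t` above) disables them. The variable names are illustrative and do not come from the original slides.

```python
# Common escape sequences inside ordinary (non-raw) string literals.
newline = 'line1\nline2'   # \n starts a new line
tab = 'col1\tcol2'         # \t inserts a horizontal tab
backslash = 'C:\\Users'    # \\ is one literal backslash
quote = 'it\'s'            # \' is a literal single quote

# In a raw string the backslash is kept as-is,
# so r'\n' is two characters, not a newline.
raw = r'\n'

print(newline)
print(tab)
print(backslash)            # C:\Users
print(len(raw), len('\n'))  # 2 1
```

Printing a string interprets the escapes, while displaying its `repr` (as the console does for a bare expression) shows them escaped.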
# Python
def get_infos():
    time_info = 'Time: '+time.ctime(time.time())
    wd_info = '\nCurrent working directory: '+os.getcwd()
    infos = time_info+wd_info
    print(infos)
def get_infos_2():
    infos = '''Time: %s
Current working directory: %s''' % (time.ctime(time.time()), os.getcwd())
    print(infos)
def get_infos_3():
    infos = '''Time: {t}
Current working directory: {wd}'''.format(t = time.ctime(time.time()), wd = os.getcwd())
    print(infos)
get_infos()
## Time: Wed Feb 8 10:39:11 2023
## Current working directory: /home/peltouz/Documents/GitHub/M1-Programming
get_infos_2()
## Time: Wed Feb 8 10:39:11 2023
## Current working directory: /home/peltouz/Documents/GitHub/M1-Programming
get_infos_3()
## Time: Wed Feb 8 10:39:11 2023
## Current working directory: /home/peltouz/Documents/GitHub/M1-Programming
# R
get_infos_2 <- function(){
  infos = sprintf('Time: %s
Current working directory: %s',
                  Sys.time(),
                  getwd())
  writeLines(infos) # another way to have escape characters interpreted
}
get_infos()
## Time: 2023-02-08 10:39:11
## Current working directory: /home/peltouz/Documents/GitHub/M1-Programming
get_infos_2()
## Time: 2023-02-08 10:39:11
## Current working directory: /home/peltouz/Documents/GitHub/M1-Programming
2 Regular Expressions
The text used in this section is the abstract of ‘ImageNet Classification with Deep Convolutional Neural Networks’ (Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton 2012), presented at ‘Advances in Neural Information Processing Systems 25’ (NIPS 2012).
Re Stringr Description
| Either or “x|y”
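The alternation operator can be tried directly with Python's `re` module. A minimal sketch; the sample string `text` and the variable names are assumptions chosen for illustration.

```python
import re

# Hypothetical sample string for illustration.
text = 'deep convolutional neural network'

# 'x|y' matches either alternative.
either = re.findall(r'neural|network', text)

# Parentheses limit the scope of the alternation: 'ne' followed by
# 'ural' or 'twork'. The non-capturing group (?:...) makes findall
# return the whole match instead of only the group.
grouped = re.findall(r'ne(?:ural|twork)', text)

print(either)   # ['neural', 'network']
print(grouped)  # ['neural', 'network']
```

Without the grouping, `ne|ural` would mean "either `ne` or `ural`", since `|` has the lowest precedence among regex operators.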
# Python
import re
abstract = open("data/imagenet_abstract.txt", "r").read()
re.search('layers',abstract)
## <_sre.SRE_Match object; span=(437, 443), match='layers'>
re.findall('layers',abstract)
## ['layers', 'layers', 'layers', 'layers']
re.findall('net+','net ne nett networks')
## ['net', 'nett', 'net']
re.findall('net*','net ne nett networks')
## ['net', 'ne', 'nett', 'net']
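The two `findall` calls above differ only in the quantifier: `+` requires at least one `t`, while `*` also accepts none. The sketch below reruns them on the same sample string and adds `?` and `{2}` for comparison; those two extra patterns are illustrative additions, not part of the original slides.

```python
import re

sample = 'net ne nett networks'

one_or_more = re.findall(r'net+', sample)    # 't' at least once
zero_or_more = re.findall(r'net*', sample)   # 't' zero or more times
zero_or_one = re.findall(r'net?', sample)    # 't' at most once
exactly_two = re.findall(r'net{2}', sample)  # 't' exactly twice

print(one_or_more)   # ['net', 'nett', 'net']
print(zero_or_more)  # ['net', 'ne', 'nett', 'net']
print(zero_or_one)   # ['net', 'ne', 'net', 'net']
print(exactly_two)   # ['nett']
```

A quantifier applies only to the element immediately before it, here the single character `t`.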
# R
library(stringr)
library(readr)
str_locate_all(pattern = 'layers',abstract)
## [[1]]
## start end
## [1,] 438 443
## [2,] 488 493
## [3,] 523 528
## [4,] 728 733
str_locate_all(pattern = 'net+','net ne nett networks')
## [[1]]
## start end
## [1,] 1 3
## [2,] 8 11
## [3,] 13 15
str_locate_all(pattern = 'net*','net ne nett networks')
## [[1]]
## start end
## [1,] 1 3
## [2,] 5 6
## [3,] 8 11
## [4,] 13 15
str_replace_all(abstract,'\\\\','')
## [1] "<p>We trained a large, deep convolutional neural network to classify the 1.3 million high-resolution images in the LSVRC-2010 ImageNet training set into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 39.7% and 18.9% which is considerably better than the previous state-of-the-art results. The neural network, which has 60 million parameters and 500,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and two globally connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of convolutional nets. To reduce overfitting in the globally connected layers we employed a new regularization method that proved to be very effective.</p>"
2.1 Anchors
Anchors specify the position in the string at which a match must occur.
Example Description
# Python
import re
words = ['neuron','neural','network','networks','deep convolutional neural network']
# R
library(stringr)
words = c('neuron','neural','network','networks','deep convolutional neural network')
str_detect(words,'\\Aneu')
## [1] TRUE TRUE FALSE FALSE FALSE
str_detect(words,'^neu')
## [1] TRUE TRUE FALSE FALSE FALSE
str_detect(words,'neu\\Z')
## [1] FALSE FALSE FALSE FALSE FALSE
str_detect(words,'neu$')
## [1] FALSE FALSE FALSE FALSE FALSE
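The same anchor checks can be sketched in Python with `re.search`. The helper `detect` is a hypothetical convenience function written here to mimic `str_detect`; it is not part of `re`. The last pattern, `work\Z`, is an extra example added to show an end anchor that does match.

```python
import re

words = ['neuron', 'neural', 'network', 'networks',
         'deep convolutional neural network']

def detect(pattern, strings):
    """Return one boolean per string (hypothetical str_detect analogue)."""
    return [re.search(pattern, s) is not None for s in strings]

starts_A = detect(r'\Aneu', words)      # start of string
starts_caret = detect(r'^neu', words)   # start of string (or of a line with re.M)
ends_Z = detect(r'neu\Z', words)        # end of string
ends_dollar = detect(r'neu$', words)    # end of string (or of a line with re.M)
ends_work = detect(r'work\Z', words)    # extra example that does match

print(starts_A)      # [True, True, False, False, False]
print(starts_caret)  # [True, True, False, False, False]
print(ends_Z)        # [False, False, False, False, False]
print(ends_dollar)   # [False, False, False, False, False]
print(ends_work)     # [False, False, True, False, True]
```

No word ends in `neu`, so both end-anchored patterns return no match, just as in the R output above.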