Lesson 04 Text Files
Created @December 8, 2021 6:38 AM
Last Update @December 8, 2021 10:23 PM
4.1 Text tools
4.2 grep (Global Regular Expression Print)
4.3 Regular Expressions
POSIX:
4.4 awk
4.5 sed (Stream Editor)
4.1 Text tools
more // read file contents
less // more advanced features than "more" // can browse forward (Space bar) and backward (Page Up)
head // show the first 10 lines
tail // show the last 10 lines
-n nn // to specify exact number of lines
cat
-A : show all non-printable characters (tab, end of line, ...)
-b : number the non-blank lines
-s : suppress repeated empty lines
tac // same as cat, but in reverse order, funny command
cut // filter output
sort // sort output
tr // translate // works like find & replace
head -n 5 /etc/passwd
head -n 10 /etc/passwd | tail -n 1 // show line number 10
tail -n 3 /etc/passwd
tail -f /var/log/messages
$ head -n 5 /etc/passwd | tail -n 1 // show line number 5 from the file /etc/passwd
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
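The head | tail combination above works for any line number. A minimal sketch, assuming a throwaway sample file (its name and contents are made up for illustration) instead of the real /etc/passwd:

```shell
# Build a small sample file so the result does not depend on the local /etc/passwd.
sample=$(mktemp)
printf '%s\n' line1 line2 line3 line4 line5 line6 line7 > "$sample"

n=5                                         # line number to extract (illustrative)
line=$(head -n "$n" "$sample" | tail -n 1)  # first n lines, then the last of those
echo "$line"

# tail alone shows the end of the file.
last_two=$(tail -n 2 "$sample")
echo "$last_two"

rm -f "$sample"
```

The same idea applies to command output: anything that can be piped into head can be cut down to a single line this way.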
cut -f 3 -d : /etc/passwd | less // cut field number 3, where delimiter is ":"
cut -f 3 -d : /etc/passwd | sort | less
cut -f 3 -d : /etc/passwd | sort -n | less // sort as numbers
cut -f 1 -d : /etc/passwd | sort | tr 'a-z' 'A-Z' // all converted to UPPER CASE // quoted so the shell does not expand the brackets
cut -f 1 -d : /etc/passwd | sort | tr '[:lower:]' '[:upper:]' // all converted to UPPER CASE // works with special characters // better multi-language support
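The difference between sort and sort -n is easier to see on a small, predictable input. The sketch below assumes a hypothetical four-line file in /etc/passwd format; the entries are illustrative only.

```shell
# Hypothetical sample in /etc/passwd format, so the result is predictable.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
linda:x:1001:1001::/home/linda:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
EOF

# Field 3 (the UID), sorted as text and then as numbers.
as_text=$(cut -f 3 -d : "$sample" | LC_ALL=C sort)
as_numbers=$(cut -f 3 -d : "$sample" | sort -n)

# Field 1 (the user name), sorted and upper-cased. The character classes are
# quoted so the shell cannot expand them as file name patterns.
names=$(cut -f 1 -d : "$sample" | LC_ALL=C sort | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$as_text" "$as_numbers" "$names"
rm -f "$sample"
```

Sorted as text, 1001 comes before 4 because the comparison goes character by character; sort -n compares the values as numbers.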
4.2 grep (Global Regular Expression Print)
find text in a file or in an output
ps -aux | grep ssh
grep linda * 2> /dev/null // search for "linda" in all files in the current directory
// it shows the file name and the line containing "linda"
grep '\<root\>' * 2> /dev/null // search for the whole word "root" in all files in the current directory
grep -l linda * 2> /dev/null // -l : list only the names of the files that contain a match
grep -i linda * // -i : ignore case
grep -A5 linda /etc/passwd // also print the 5 lines after each line that matches linda // useful in logs
grep -B5 linda /etc/passwd // also print the 5 lines before each line that matches linda // useful in logs
grep -R root /etc // Recursively find the word root
grep -Rl root /etc 2> /dev/null | less // -l : list file names only
egrep '^[[:alpha:]]{3}$' * 2> /dev/null // egrep (same as grep -E): all lines that are exactly 3 letters
grep '^...$' * 2> /dev/null // grep all lines that are exactly 3 characters
grep '^endif$' * 2> /dev/null // find lines that contain exactly "endif" and nothing else
grep '\<endif\>' * 2> /dev/null // find "endif" as a whole word anywhere in a line
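The options above can be tried safely in a scratch directory. This is a sketch under assumptions: the two file names and their contents are hypothetical, GNU grep is assumed for \< and \>, and -c (count matching lines) is used only to keep the output short.

```shell
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'root' 'Linda' 'linda smith' 'rooted' > "$dir/users.txt"
printf '%s\n' 'endif' '  endif here' 'nothing' > "$dir/notes.txt"

# -l lists only the names of the files that contain a match.
files=$(cd "$dir" && grep -l linda * 2> /dev/null)

# -i ignores case; -c counts matching lines instead of printing them.
ignore_case=$(grep -ci linda "$dir/users.txt")

# \< and \> match only at word edges, so "rooted" is not counted.
whole_word=$(grep -c '\<root\>' "$dir/users.txt")

# ^endif$ needs the whole line to be "endif"; \<endif\> accepts it anywhere.
whole_line=$(grep -c '^endif$' "$dir/notes.txt")
any_word=$(grep -c '\<endif\>' "$dir/notes.txt")

# -A1 also prints one line of context after each match.
context=$(grep -A1 '^Linda$' "$dir/users.txt")

echo "$files $ignore_case $whole_word $whole_line $any_word"
rm -rf "$dir"
```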
4.3 Regular Expressions
Globbing : applies to file names
Regular expression : applies to search patterns for text inside a file
grep 'a*' a* // the first 'a*' is a regular expression, the pattern to search for inside the files
// the second a* is globbing, expanded by the shell to the file names that start with "a"
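To see the two meanings of a* side by side, the sketch below uses three hypothetical files in a scratch directory. Note that the regular expression 'a*' means zero or more "a", so it also matches lines that contain no "a" at all.

```shell
# Hypothetical files; the names and contents are illustrative only.
dir=$(mktemp -d)
printf 'banana\n' > "$dir/apple.txt"
printf 'xyz\n'    > "$dir/avocado.txt"
printf 'aaa\n'    > "$dir/banana.txt"

# Unquoted a* is a glob: the shell replaces it with the matching file names.
globbed=$(cd "$dir" && echo a*)

# Quoted 'a*' is a regular expression: zero or more "a", which also matches
# the empty string, so even the line "xyz" counts as a match.
matches=$(cd "$dir" && grep -c 'a*' avocado.txt)

# Combined, as in the note above: regular expression first, glob second.
both=$(cd "$dir" && grep 'a*' a*)

printf '%s\n' "$globbed" "$matches" "$both"
rm -rf "$dir"
```

banana.txt is never searched, because the glob a* only expands to file names that start with "a".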
Regular expressions are used with:
grep
vim
awk
sed
POSIX:
The Portable Operating System Interface is a family of standards specified by the
IEEE Computer Society for maintaining compatibility between operating systems.
The goal of POSIX is to ease the task of cross-platform software development by
establishing a set of guidelines for operating system vendors to follow. Ideally, a
developer should have to write a program only once to run on all POSIX-compliant
systems.
man 7 regex // Regular Expression
$ cat regtext
b
bt
bit
bite
boot
bloat
boat
A regular expression should be placed between single quotes ' ' so that the shell does not interpret its special characters, for example 'b.*t'.
The period . matches any single character.
Anchoring
The caret ^ and the dollar sign $ are meta-characters that respectively match the empty string at the beginning and end of a line.
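The period and the anchors can be tried on the regtext word list shown earlier. This sketch recreates that list in a temporary file (the temporary file is an assumption made so the example is self-contained).

```shell
# Recreate the regtext word list in a temporary file.
f=$(mktemp)
printf '%s\n' b bt bit bite boot bloat boat > "$f"

# b, then any number of any characters, then t.
any_between=$(grep 'b.*t' "$f")

# b, exactly one character, t: found anywhere in the line.
one_between=$(grep 'b.t' "$f")

# The same pattern anchored to the start and the end of the line.
whole_line=$(grep '^b.t$' "$f")

# Lines that end in t.
ends_in_t=$(grep 't$' "$f")

printf '%s\n' "$any_between" "$one_between" "$whole_line" "$ends_in_t"
rm -f "$f"
```

Without the anchors, 'b.t' also reports bite, because grep looks for the pattern anywhere in the line.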
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end
of a word.
The symbol \b matches the empty string at the edge of a word, and \B
matches the empty string provided it's not at the edge of a word.
The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
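A short sketch of the word-edge symbols, assuming GNU grep (these backslash expressions are GNU extensions) and a hypothetical four-line input. The -c option counts matching lines and -o prints only the matched text.

```shell
# Hypothetical input lines, chosen so that each symbol behaves differently.
f=$(mktemp)
printf '%s\n' 'root' 'rooted tree' 'uproot' 'the root user' > "$f"

whole_word=$(grep -c '\<root\>' "$f")  # "root" as a complete word
word_start=$(grep -c '\broot' "$f")    # a word that starts with "root"
word_end=$(grep -c 'root\b' "$f")      # a word that ends with "root"
inside=$(grep -c '\Broot' "$f")        # "root" not at the start of a word

# \w matches letters, digits and, in GNU grep, the underscore.
words=$(echo 'a_b-c' | grep -oE '\w+')

echo "$whole_word $word_start $word_end $inside"
echo "$words"
rm -f "$f"
```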
Repetition
A regular expression may be followed by one of several repetition operators:
? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{,m} The preceding item is matched at most m times. This is a GNU extension.
{n,m} The preceding item is matched at least n times, but not more than m times.
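The counted repetition operators can be checked against the regtext word list. The sketch uses grep -E (extended regular expressions) so that + and { } need no backslashes; the temporary file is an assumption made to keep the example self-contained.

```shell
# Recreate the regtext word list in a temporary file.
f=$(mktemp)
printf '%s\n' b bt bit bite boot bloat boat > "$f"

one_or_more=$(grep -E '^bo+t$' "$f")       # b, one or more o, t
exactly_two=$(grep -E '^bo{2}t$' "$f")     # b, exactly two o, t
two_to_three=$(grep -E '^b.{2,3}t$' "$f")  # b, two or three characters, t
at_least_three=$(grep -E '^.{3,}$' "$f")   # lines of three or more characters

printf '%s\n' "$one_or_more" "$exactly_two" "$two_to_three" "$at_least_three"
rm -f "$f"
```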
* is a repetition operator for zero or more of the preceding item.
? is an Extended Regular Expression operator. ? did not work with grep; it works with egrep.
With a pattern such as 'bo*t', boat does not match, because * means that "o" (the preceding character) is repeated zero or more times; it does not stand for the "a" in boat.
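The behaviour described above can be reproduced on the regtext word list. The patterns 'bo*t' and 'bo?t' are assumptions chosen for illustration, since the original screenshots are not part of these notes; GNU grep is assumed.

```shell
# Recreate the regtext word list in a temporary file.
f=$(mktemp)
printf '%s\n' b bt bit bite boot bloat boat > "$f"

# * repeats only the "o": boat is not matched because of the "a".
star=$(grep 'bo*t' "$f")

# In plain grep an unescaped ? is an ordinary character, so nothing matches.
# (|| true keeps the script going when grep reports no match.)
basic_q=$(grep -c 'bo?t' "$f" || true)

# With grep -E (egrep) the ? makes the "o" optional.
extended_q=$(grep -E 'bo?t' "$f")

printf '%s\n' "$star" "$basic_q" "$extended_q"
rm -f "$f"
```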
4.4 awk
awk specializes in data extraction and reporting (the report could, for example, be sent to a printer).
$ awk -F : '/linda/ { print $4 }' /etc/passwd // -F : sets the field delimiter to ":", $4 is field number 4
1001
awk -F : '{ print $NF }' /etc/passwd // NF is the number of fields, so $NF prints the last field in the line
// useful when the number of fields is not the same in all lines
/bin/bash
/sbin/nologin
/sbin/nologin
/sbin/nologin
/sbin/nologin
/bin/sync
/sbin/shutdown
/sbin/halt
/sbin/nologin
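The two awk commands above can be tried on a predictable input. This sketch assumes a hypothetical three-line file in /etc/passwd format; the entries are illustrative only.

```shell
# Hypothetical sample in /etc/passwd format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
linda:x:1001:1001::/home/linda:/bin/bash
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
EOF

# On lines that match /linda/, print field 4 (the group ID).
gid=$(awk -F : '/linda/ { print $4 }' "$sample")

# NF holds the number of fields, so $NF is always the last field.
shells=$(awk -F : '{ print $NF }' "$sample")

# $NF still works when the lines have different numbers of fields.
# Without -F, awk splits on whitespace.
last_words=$(printf 'a b c\nd e\n' | awk '{ print $NF }')

printf '%s\n' "$gid" "$shells" "$last_words"
rm -f "$sample"
```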
// print the last column of ps -aux
$ ps -aux | awk '{ print $NF }'
$ ls -l /etc | awk '/pass/ { print }' | less
-rw-r--r--. 1 root root 2598 Dec 6 16:04 passwd
-rw-r--r--. 1 root root 2557 Dec 4 23:41 passwd-
(END)
$ ls -l /etc | grep pass
-rw-r--r--. 1 root root 2598 Dec 6 16:04 passwd
-rw-r--r--. 1 root root 2557 Dec 4 23:41 passwd-
4.5 sed (Stream Editor)
$ cat sedfile
one
two
three
four
five
$ sed -n 4p sedfile // -n suppresses the automatic output, 4p prints line number 4
four
$ sed -i 's/four/FOUR/g' sedfile // -i writes directly to the file // s substitutes (find and replace), g replaces every match on a line
// without -i the result is written to stdout
$ cat sedfile
one
two
three
FOUR
five
$
$ sed -n 4p sedfile
FOUR
$ sed -i -e '2d' sedfile // -i modify the file, 2d delete line number 2
$ cat sedfile
one
three
FOUR
five
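The whole sed session above can be replayed on a temporary copy, which is a safe way to preview a substitution before using -i. The sketch assumes GNU sed (other sed versions may require an argument after -i) and a temporary file in place of sedfile.

```shell
# Recreate sedfile in a temporary file.
f=$(mktemp)
printf '%s\n' one two three four five > "$f"

# Print only line 4.
fourth=$(sed -n 4p "$f")

# Without -i the result goes to stdout and the file stays unchanged.
preview=$(sed 's/four/FOUR/g' "$f")
unchanged=$(grep -c '^four$' "$f")

# With -i the file itself is edited: substitute first, then delete line 2.
sed -i 's/four/FOUR/g' "$f"
sed -i -e '2d' "$f"
result=$(cat "$f")

printf '%s\n' "$fourth" "$result"
rm -f "$f"
```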