Lesson 04 Text Files

This document provides an overview of common Linux text processing tools:
- Section 4.1 describes text file viewing tools such as head, tail and cat
- Section 4.2 covers the grep tool for searching text files
- Section 4.3 defines the regular expressions used with grep and other tools
- Section 4.4 presents awk for data extraction and reporting
- Section 4.5 introduces sed for editing text files
The document gives examples of using each tool to view, search, extract and edit parts of text files.

Uploaded by Taha
Copyright © All Rights Reserved


Created: December 8, 2021 6:38 AM
Last Update: December 8, 2021 10:23 PM


4.1 Text tools
4.2 grep (Global Regular Expression Print)
4.3 Regular Expressions
POSIX:
4.4 awk
4.5 sed (Stream Editor)

4.1 Text tools


more // page through file contents, one screen at a time
less // more advanced features than "more" // can browse forward (Space bar) and backward (Page Up)
head // show the first 10 lines
tail // show the last 10 lines
-n nn // to specify the exact number of lines
cat // print file contents
-A : show all non-printable characters (tab as ^I, end of line as $, ...)
-b : number the non-empty lines
-s : squeeze repeated empty lines into one
tac // same as cat, but prints the lines in reverse order (last line first)
cut // extract selected fields or columns from each line
sort // sort lines
tr // translate or delete characters // works like a character-level find & replace
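A minimal sketch of the cat options and tac listed above, run on a small throwaway file instead of a real system file. It assumes bash and GNU coreutils (as on most Linux distributions); the file name sample.txt and its content are made up for the example.

```shell
# Throwaway directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Sample content: a line, two empty lines, a tab-indented line, a last line.
printf 'one\n\n\n\ttwo\nthree\n' > "$tmp/sample.txt"

cat -A "$tmp/sample.txt"   # tab shown as ^I, each line end shown as $
cat -s "$tmp/sample.txt"   # the two empty lines are squeezed into one
cat -b "$tmp/sample.txt"   # only the non-empty lines get a number
tac "$tmp/sample.txt"      # lines in reverse order: "three" comes first

# Keep a few results in variables so they can be checked.
marked=$(cat -A "$tmp/sample.txt" | head -n 4 | tail -n 1)
squeezed=$(cat -s "$tmp/sample.txt" | wc -l)
first_reversed=$(tac "$tmp/sample.txt" | head -n 1)
```

The original file has 5 lines; after cat -s it has 4, because the run of two empty lines becomes one.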

head -n 5 /etc/passwd
head -n 10 /etc/passwd | tail -n 1 // show line number 10
tail -n 3 /etc/passwd
tail -f /var/log/messages

$ head -n 5 /etc/passwd | tail -n 1 // show line number 5 from the file /etc/passwd
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
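The head | tail trick for showing a single line can be wrapped in a small function. This is a sketch only: print_line is a hypothetical helper name, and the sample lines stand in for /etc/passwd so the example does not depend on the real system file.

```shell
# Hypothetical helper: print line number n of a file.
# head keeps the first n lines, tail keeps the last of those.
print_line() {
  local n=$1 file=$2
  head -n "$n" "$file" | tail -n 1
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'root\nbin\ndaemon\nadm\nlp\nsync\n' > "$tmp"   # made-up user names

fifth=$(print_line 5 "$tmp")
echo "$fifth"
```

If the file has fewer than n lines, this prints the last line instead of nothing, so check the line count first if that matters.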

cut -f 3 -d : /etc/passwd | less // cut field number 3, where the delimiter is ":"
cut -f 3 -d : /etc/passwd | sort | less // sorted as text
cut -f 3 -d : /etc/passwd | sort -n | less // sort as numbers
cut -f 1 -d : /etc/passwd | sort | tr '[a-z]' '[A-Z]' // all converted to UPPER CASE // quote the brackets so the shell does not treat them as a glob
cut -f 1 -d : /etc/passwd | sort | tr '[:lower:]' '[:upper:]' // all converted to UPPER CASE // the classes follow the current locale instead of a fixed a-z range (GNU tr works on single bytes, so multibyte UTF-8 letters are not converted)
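To see why sort -n matters for UID numbers, here is a sketch on a few made-up lines in /etc/passwd format (name:password:UID:GID:comment:home:shell). The locale is fixed to C as an assumption, so the text sort order is predictable.

```shell
export LC_ALL=C   # fixed locale so the sort order below is predictable

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Made-up sample data in /etc/passwd format.
cat > "$tmp" <<'EOF'
root:x:0:0:root:/root:/bin/bash
linda:x:1001:1001::/home/linda:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
EOF

# Field 3 (UID), sorted as text: 1001 comes before 4 because "1" < "4".
text_order=$(cut -f 3 -d : "$tmp" | sort | tr '\n' ' ')
# Same field, sorted as numbers.
numeric_order=$(cut -f 3 -d : "$tmp" | sort -n | tr '\n' ' ')
# Field 1 (user name), sorted and converted to upper case.
upper_names=$(cut -f 1 -d : "$tmp" | sort | tr '[:lower:]' '[:upper:]' | tr '\n' ' ')

printf '%s\n' "$text_order" "$numeric_order" "$upper_names"
```

The trailing tr '\n' ' ' only joins the output onto one line for display.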

4.2 grep (Global Regular Expression Print)

grep finds the lines that match a pattern, in a file or in the output of another command

ps aux | grep ssh // show only the processes whose line contains "ssh"


grep linda * 2> /dev/null // search for linda in all files in the current directory
// it shows the file name and the line containing "linda"; 2> /dev/null hides errors such as "Is a directory"
grep '\<root\>' * 2> /dev/null // search for the whole word "root" in all files in the current directory
grep -l linda * 2> /dev/null // -l : list only the names of the matching files
grep -i linda * // -i : ignore case
grep -A5 linda /etc/passwd // also print the 5 lines after each match // useful in logs
grep -B5 linda /etc/passwd // also print the 5 lines before each match // useful in logs
grep -R root /etc // recursively search for root in all files under /etc
grep -Rl root /etc 2> /dev/null | less // -l : file names only
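A sketch of the grep options above on a few throwaway files, so the effect of each flag can be compared side by side. The file names and contents are made up; GNU grep is assumed, and -c (count matching lines) is used only to make the -i result easy to check.

```shell
export LC_ALL=C   # predictable sort order for the checks below
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Made-up files: two at the top level, one in a subdirectory.
printf 'alice\nlinda\nLINDA\nbob\n' > users.txt
printf 'nothing here\n' > other.txt
mkdir sub
printf 'line1\nlinda was here\nline3\n' > sub/log.txt

plain=$(grep linda * 2> /dev/null)        # "sub" is a directory; its error is discarded
ignore_case=$(grep -ci linda users.txt)   # -c counts matching lines, -i ignores case
names=$(grep -l linda *.txt)              # -l: names of matching files only
recursive=$(grep -Rl linda . | sort)      # -R: also searches inside sub/
after=$(grep -A1 linda sub/log.txt)       # the match plus 1 line after it
before=$(grep -B1 linda sub/log.txt)      # the match plus 1 line before it

printf '%s\n' "$plain" "$ignore_case" "$names" "$recursive" "$after" "$before"
```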

egrep '^[[:alpha:]]{3}$' * 2> /dev/null // all lines that consist of exactly 3 letters (egrep is the same as grep -E)
grep '^...$' * 2> /dev/null // all lines that are exactly 3 characters long
$ grep '^endif$' * 2> /dev/null // lines that contain exactly "endif" and nothing else
grep '\<endif\>' * 2> /dev/null // lines that contain the whole word "endif"
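The difference between anchoring a whole line and matching a whole word can be checked on a small made-up file. This sketch assumes GNU grep (for \< and \>) and uses grep -E in place of egrep, which newer GNU grep releases report as obsolescent.

```shell
export LC_ALL=C
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Made-up test lines.
printf 'endif\n  endif\nendif # done\nendiff\nabc\na1c\n' > "$tmp"

exact=$(grep -c '^endif$' "$tmp")               # only the line that is exactly "endif"
word=$(grep -c '\<endif\>' "$tmp")              # whole word endif; "endiff" is excluded
letters=$(grep -cE '^[[:alpha:]]{3}$' "$tmp")   # exactly three letters: abc
chars=$(grep -c '^...$' "$tmp")                 # exactly three characters of any kind: abc, a1c

printf '%s\n' "$exact" "$word" "$letters" "$chars"
```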

4.3 Regular Expressions


globbing : applies to file names (expanded by the shell)

Regular Expression : applies to search patterns for text inside a file

grep 'a*' a* // the first 'a*' is a regular expression: the pattern searched for inside the files
// the second a* is globbing: the shell replaces it with the names of the files starting with "a"
// note: the regex 'a*' means zero or more "a", so it matches every line
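A sketch that makes the glob/regex split visible: the shell expands the unquoted a* before grep runs, while the quoted pattern reaches grep unchanged. The file names and contents are made up for the example.

```shell
export LC_ALL=C   # predictable glob expansion order
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'banana\ncherry\n' > apple.txt
printf 'aaa\n' > avocado.txt
printf 'aaa\n' > kiwi.txt          # not matched by the glob a*, so never read

expanded=$(echo a*)                        # what the shell turns a* into
all_lines=$(grep 'a*' a* | wc -l)          # zero or more "a": every line of both files
some_lines=$(grep 'aa*' a* | wc -l)        # at least one "a": cherry no longer matches

printf '%s\n' "$expanded" "$all_lines" "$some_lines"
```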

Regular expressions are used with:

grep

vim

awk

sed

POSIX:
The Portable Operating System Interface is a family of standards specified by the
IEEE Computer Society for maintaining compatibility between operating systems.

The goal of POSIX is to ease the task of cross-platform software development by
establishing a set of guidelines for operating system vendors to follow. Ideally, a
developer should have to write a program only once to run on all POSIX-compliant
systems.

man 7 regex // manual page on POSIX regular expressions

$ cat regtext
b
bt
bit
bite
boot
bloat
boat



A regular expression should be placed between single quotes ' ' so that the shell does not expand its special characters, e.g. 'b.*t'

The period . matches any single character.
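A short sketch of the period on the same words as regtext, recreated in a temporary file so the example is self-contained.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' b bt bit bite boot bloat boat > "$tmp"   # same lines as regtext

one=$(grep 'b.t' "$tmp" | tr '\n' ' ')    # exactly one character between b and t
two=$(grep 'b..t' "$tmp" | tr '\n' ' ')   # exactly two characters between b and t
any=$(grep -c 'b.*t' "$tmp")              # any number of characters, including none

printf '%s\n' "$one" "$two" "$any"
```

'b.*t' matches every line except the single "b", since there is no t after it.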

Anchoring
The caret ^ and the dollar sign $ are meta-characters that respectively match the empty string at the beginning and end of a line.
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word.
The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word.

The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
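A sketch of anchors and word boundaries on a few made-up lines. It assumes GNU grep, where \<, \>, \B and \w are supported; the last check shows that in GNU grep \w also matches the underscore.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'root\nchroot\nroot:x:0\nbroot2\n' > "$tmp"   # made-up sample lines

start=$(grep '^root' "$tmp" | tr '\n' ' ')      # lines beginning with root
end=$(grep 'root$' "$tmp" | tr '\n' ' ')        # lines ending with root
word=$(grep '\<root\>' "$tmp" | tr '\n' ' ')    # root as a whole word
inner=$(grep '\Broot' "$tmp" | tr '\n' ' ')     # root not at the start of a word
underscore=$(printf 'a_b\n' | grep -c '^\w\w\w$')   # "_" counts as a word character

printf '%s\n' "$start" "$end" "$word" "$inner" "$underscore"
```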


Repetition
A regular expression may be followed by one of several repetition operators:
? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{,m} The preceding item is matched at most m times. This is a GNU extension.
{n,m} The preceding item is matched at least n times, but not more than m times.
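The repetition operators can be tried with grep -E (extended regular expressions) on the regtext words. The patterns are anchored with ^ and $ where needed so that only whole lines are compared; this is a sketch, not taken from the original lesson.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' b bt bit bite boot bloat boat > "$tmp"   # same lines as regtext

opt=$(grep -E '^bo?t$' "$tmp")       # "o" optional: bt
plus=$(grep -E '^bo+t$' "$tmp")      # one or more "o": boot
twice=$(grep -E '^bo{2}t$' "$tmp")   # exactly two "o": boot
range=$(grep -cE '^b.{1,2}t' "$tmp") # one or two characters between b and t: bit, bite, boot, boat

printf '%s\n' "$opt" "$plus" "$twice" "$range"
```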

* is a repetition operator for zero or more occurrences of the preceding item.

? is an Extended Regular Expression operator. ? did not work with grep, but it works with egrep (grep -E); in a basic regular expression, GNU grep accepts \? instead.

boat does not match, because * means that "o" (the preceding character) is repeated zero or more times.
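A sketch of why ? behaves differently in grep and egrep: in a basic regular expression ? is an ordinary character, in an extended one it is the "optional" operator. The \? form is a GNU grep extension, so GNU grep is assumed.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' bt boot boat > "$tmp"

bre=$(grep -c 'bo?t' "$tmp")     # basic regex: looks for the literal text "bo?t", 0 matches
ere=$(grep -E 'bo?t' "$tmp")     # extended regex (egrep / grep -E): "o" optional -> bt
gnu_bre=$(grep 'bo\?t' "$tmp")   # GNU basic regex: \? is the optional operator -> bt
star=$(grep 'bo*t' "$tmp")       # zero or more "o": bt and boot, but not boat

printf '%s\n' "$bre" "$ere" "$gnu_bre" "$star"
```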

4.4 awk
awk specializes in data extraction and reporting (its output can be formatted as a report, e.g. to be sent to a printer).

$ awk -F : '/linda/ { print $4 }' /etc/passwd // -F : the delimiter, /linda/ : only lines containing linda, $4 is field number 4
1001

awk -F : '{ print $NF }' /etc/passwd // NF is the number of fields, so $NF prints the last field in the line
// useful when the number of fields is not the same in all lines
/bin/bash
/sbin/nologin
/sbin/nologin
/sbin/nologin
/sbin/nologin
/bin/sync
/sbin/shutdown
/sbin/halt
/sbin/nologin
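A sketch of -F, field numbers and NF on made-up lines in /etc/passwd format, so the result does not depend on the real system file.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Made-up sample data in /etc/passwd format.
cat > "$tmp" <<'EOF'
root:x:0:0:root:/root:/bin/bash
linda:x:1001:1001::/home/linda:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
EOF

gid=$(awk -F : '/linda/ { print $4 }' "$tmp")                  # 4th field of the linda line
shells=$(awk -F : '{ print $NF }' "$tmp" | tr '\n' ' ')        # last field of every line
fields=$(awk -F : '/linda/ { print NF }' "$tmp")               # NF itself: the field count
last_word=$(echo 'a b c' | awk '{ print $NF }')                # default separator: whitespace

printf '%s\n' "$gid" "$shells" "$fields" "$last_word"
```

The empty comment field in the linda line still counts, so NF is 7 there.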

// print the last column of ps aux (COMMAND can contain spaces, so only its last word is printed)

$ ps aux | awk '{ print $NF }'

$ ls -l /etc | awk '/pass/ { print }' | less

-rw-r--r--. 1 root root 2598 Dec 6 16:04 passwd
-rw-r--r--. 1 root root 2557 Dec 4 23:41 passwd-

// the same result with grep:
$ ls -l /etc | grep pass
-rw-r--r--. 1 root root 2598 Dec 6 16:04 passwd
-rw-r--r--. 1 root root 2557 Dec 4 23:41 passwd-
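A sketch that checks the awk and grep forms give the same lines, using a made-up list of file names instead of ls -l /etc. It also shows that an awk pattern with no action prints the matching lines by default.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'passwd\ngroup\npasswd-\nshadow\n' > "$tmp"   # made-up names

awk_out=$(awk '/pass/ { print }' "$tmp")
grep_out=$(grep pass "$tmp")
default_out=$(awk '/pass/' "$tmp")   # no { action }: print is the default

[ "$awk_out" = "$grep_out" ] && echo "same output"
```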

4.5 sed (Stream Editor)


$ cat sedfile
one
two
three
four
five

$ sed -n 4p sedfile // -n : do not print every line automatically, 4p : print line number 4
four
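A sketch of why -n is needed and of a few other sed addresses, on the same five lines as sedfile (recreated in a temporary file).

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' one two three four five > "$tmp"   # same content as sedfile

line4=$(sed -n 4p "$tmp")                        # only line 4
without_n=$(sed 4p "$tmp" | wc -l)               # every line, plus line 4 a second time
range=$(sed -n '2,4p' "$tmp" | tr '\n' ' ')      # lines 2 through 4
last=$(sed -n '$p' "$tmp")                       # $ as an address means the last line

printf '%s\n' "$line4" "$without_n" "$range" "$last"
```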

$ sed -i 's/four/FOUR/g' sedfile // -i : write directly to the file // s : substitute (search and replace), g : every match in the line
// without -i the result is written to stdout and the file is not changed
$ cat sedfile
one
two
three
FOUR
five
$ sed -n 4p sedfile
FOUR

$ sed -i -e '2d' sedfile // -i modify the file, 2d delete line number 2


$ cat sedfile
one
three
FOUR
five
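A sketch contrasting sed with and without -i on a temporary copy of sedfile's content. It assumes GNU sed; BSD and macOS sed require an argument after -i, so the commands differ there.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' one two three four five > "$tmp"   # same content as sedfile

preview=$(sed 's/four/FOUR/g' "$tmp")   # no -i: the result only goes to stdout
still_lower=$(grep -c '^four$' "$tmp")  # the file itself is unchanged

sed -i 's/four/FOUR/g' "$tmp"           # GNU sed: edit the file in place
sed -i '2d' "$tmp"                      # delete line 2 ("two")
final=$(tr '\n' ' ' < "$tmp")
echo "$final"
```

GNU sed also accepts a backup suffix: sed -i.bak 's/four/FOUR/g' sedfile edits sedfile and keeps the original as sedfile.bak.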
