
Fundamental UNIX

Lecture 7: Unix — text processing (2)

Dr Mahamed Lamine Guindo


Assistant Professor
• Ignoring Leading Blanks with -b
indented.txt (each line preceded by a varying amount of leading whitespace):
zebra
apple
banana
orange
grape

sort -b indented.txt

-b ignores leading blanks when sorting.

Case-Insensitive Sorting with -f

mixed_case.txt:
Apple
banana
Cherry
apple
Banana
cherry

sort -f mixed_case.txt

-f folds lowercase and uppercase letters together when sorting.
Assignment
• 102 Jane 75000
• 103 Bob 62000
• 101 John 55000
• 104 Emma 48000
• 102 Jane 55000
• sort first by the name column alphabetically and then by the salary
numerically
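• A minimal sketch of one possible approach, assuming the records are stored in employees.txt (an assumed filename) with columns ID, name, and salary; sort's -k option selects the sort keys:
• sort -k2,2 -k3,3n employees.txt
• -k2,2 sorts on the second field alphabetically, and -k3,3n breaks ties on the third field numerically.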
• To save sorted output directly into a new file:
• sort -n numbers.txt -o sorted_numbers.txt
• echo "Alice" > names.txt
• echo "Charlie" >> names.txt
• echo "bob" >> names.txt
• echo "David" >> names.txt
• echo "Eve" >> names.txt
sort names.txt

sort -f names.txt
• echo "10" > numbers.txt
• echo "3" >> numbers.txt
• echo "25" >> numbers.txt
• echo "1" >> numbers.txt
• echo "100" >> numbers.txt
sort -n numbers.txt
• Reverse order
sort -r names.txt
• uniq [OPTION] [INPUT [OUTPUT]]
Key Options
•-c : Prefixes each line with the number of
occurrences.
•-d : Only displays duplicate lines.
•-u : Only displays unique lines.
•-i : Ignores case differences when comparing lines.
•-f : Skips a specified number of fields before
determining uniqueness.
•-s : Skips a specified number of characters before
determining uniqueness.
•-w : Limits comparison to a specific number of
characters in each line.
•--help : Displays help information about the uniq
command.
Suppose fruits.txt contains:
• apple
• apple
• banana
• banana
• apple
• cherry
• cherry
• cherry
• uniq fruits.txt
• uniq removes duplicates only if they are adjacent.

To count how many times each line occurs in the file:


uniq -c fruits.txt
To display only the lines that are duplicated:
uniq -d fruits.txt
To show lines that appear only once:
uniq -u fruits.txt
• To treat lines as duplicates regardless of case:
uniq -i mixed_case_fruits.txt
• To skip the first field and compare based on the fruit name only:
uniq -f 1 data.txt
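• The contents of data.txt are not shown here; as an assumed example, suppose it holds an ID field followed by a fruit name:
1 apple
2 apple
3 banana
With -f 1 the first field is skipped, so the first two lines compare as duplicates and only "1 apple" and "3 banana" are printed.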
• To skip the first three characters in each line and compare only by the
rest of the line:
uniq -s 3 prefixed_fruits.txt
• Limiting Comparison to a Specific Number of Characters
applepie
apple
banana
bananabread
cherry

uniq -w 5 short_words.txt
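Only the first five characters are compared, so applepie/apple and banana/bananabread each collapse into one line; the expected output is:
applepie
banana
cherry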
To save the results of removing duplicate lines in a new file unique_fruits.txt:
uniq fruits.txt unique_fruits.txt
• echo "Apple" > fruits.txt
• echo "Banana" >> fruits.txt
• echo "Banana" >> fruits.txt
• echo "Cherry" >> fruits.txt
• echo "Cherry" >> fruits.txt
• echo "Cherry" >> fruits.txt
• echo "Grapes" >> fruits.txt
• echo "Lemon" >> fruits.txt
uniq fruits.txt
• echo "Apple" > fruits.txt
• echo "Banana" >> fruits.txt
• echo "Cherry" >> fruits.txt
• echo "Banana" >> fruits.txt
• echo "Cherry" >> fruits.txt
• uniq removes only consecutive duplicate lines. If you want to count
the number of occurrences of each unique line, you can use the -c
option:
• uniq -c fruits.txt

• Alternatively, sort the file first so that duplicates become adjacent, then pipe to uniq:
• sort fruits.txt | uniq
• Display Duplicate Lines Only:
uniq -d fruits.txt
• Display only the lines that occur exactly once:
uniq -u fruits.txt
• grep -i Case-Insensitive Search
grep -i "apple" fruits.txt
grep -v Invert Match
grep -v "banana" fruits.txt
grep -f Search with Patterns from a File
• # Create patterns.txt with patterns
• echo "apple" > patterns.txt
• echo "banana" >> patterns.txt

• # Search using patterns from patterns.txt


• grep -f patterns.txt fruits.txt
• grep -E - Extended Regular Expressions
• grep -E "apple|orange" fruits.txt
• grep -o - Display Only Matching Part (see the example after this list)

• grep -E "apple(pie|juice)" menu.txt


• grep -E "^apple" fruits.txt (lines starting with apple)
• grep -E "[aeiou]" text.txt (lines containing at least one vowel)
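• The -o option mentioned above prints only the part of each line that matches, one match per line; a minimal example, assuming the same fruits.txt:
• grep -o "apple" fruits.txt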
The sed (stream editor) command is a powerful tool
for text manipulation and transformation
• sed (Stream Editor) is a powerful text-processing tool in Unix and Unix-like
systems, used to perform basic text transformations on an input stream
(a file or input from a pipeline).
• Unlike normal text editors, sed doesn't modify files interactively.
Instead, it reads input line by line, applies the specified commands,
and then outputs the modified text.
• It is widely used in shell scripting and automation for tasks like find-
and-replace, insertion, deletion, and more.
Basic Structure of a sed Command

The general structure of a sed command is:

sed 'command' file

•command: A sed command or a series of commands.


•file: The file on which the command is applied.

You can also pass input through pipes:

cat file | sed 'command'

Common sed Commands and Operations


1. Substitution (s) Command
The s command in sed is used to substitute a string or pattern with another string. The basic syntax is:

sed 's/pattern/replacement/' file


•pattern: The regular expression to match.
•replacement: The string to replace the matched pattern.
Example
sed 's/apple/orange/' file.txt

To replace all occurrences of the pattern, add the g (global) flag:

sed 's/apple/orange/g' file.txt


You can also limit the replacement to a specific line or range of lines:

sed '2s/apple/orange/' file.txt


2. Deletion (d) Command
The d command is used to delete lines.
Example: Delete the 3rd line:

sed '3d' file.txt

Delete lines from 3 to 5:

sed '3,5d' file.txt

Delete all lines that match a pattern:

sed '/pattern/d' file.txt


3. Insertion (i) Command
The i command is used to insert a line of text before a given line.
Example: Insert a line before line 3:

sed '3i\This is a new line' file.txt

4. Append (a) Command


The a command appends a line of text after a given line.
Example: Append a line after line 4
sed '4a\This is an appended line' file.txt
5. Change (c) Command
The c command is used to replace the entire content of a line.
Example: Change line 5 to a new line:

sed '5c\This is the new line' file.txt

6. Print (p) Command


The p command prints specific lines or a range of lines. By default, sed prints every input line automatically; the -n option suppresses that automatic output, so only the lines explicitly selected with p are printed.
Example: Print only lines that match a pattern:

sed -n '/pattern/p' file.txt

Print lines 1 to 3:

sed -n '1,3p' file.txt


Multiple Commands
You can chain multiple sed commands by separating them with ;.
Example: Replace "apple" with "orange" and delete the 2nd line:

sed 's/apple/orange/; 2d' file.txt

Alternatively, you can use the -e option to specify multiple commands:

sed -e 's/apple/orange/' -e '2d' file.txt

Read (r) and Write (w) Commands


•r command: Reads the contents of a file and inserts it into the output.
Example: Insert the contents of file2.txt after line 3:
sed '3r file2.txt' file.txt
•w command: Writes selected lines to a file.
Example: Write lines matching a pattern to another file:

sed -n '/pattern/w output.txt' file.txt


Transform (y) Command
The y command works like tr, converting one set of characters to another.
Example: Convert all lowercase vowels to uppercase

sed 'y/aeiou/AEIOU/' file.txt


Assignment
• Line 1: The quick brown fox jumps over the lazy dog.
• Line 2: SED is a powerful tool for text manipulation.
• Line 3: This is a sample file for testing sed commands.
• Line 4: Another line with simple content.
• Line 5: Final line in the file.
Substitute (s) Command: Replace "fox" with "cat" in the first line.

Substitute All Occurrences (g flag): Replace all occurrences of "line" with "sentence".

Delete (d) Command: Delete the third line.

Delete Lines in a Range: Delete lines 2 through 4.

Insert (i) Command: Insert a new line before the second line.

Append (a) Command: Append a new line after the fourth line.

Change (c) Command: Change the content of the fifth line.

Print (p) Command: Print only the second and fourth lines.

Read (r) Command: Insert the content of another file (file2.txt) after line 3.
• sed 's/apple/orange/g' fruits.txt

• Delete line
• sed '/banana/d' fruits.txt
• Add or Append Text (in the replacement, & stands for the matched text):
• sed 's/apple/Delicious &/g' fruits.txt

• Insert Text:
For example, to insert "This is the header" before the first line:
• sed '1i\This is the header' file.txt
• Print Specific Lines:
sed -n '5,10p' file.txt

Create a Sed script


# myscript.sed
s/apple/orange/g
s/banana/pear/g

sed -f myscript.sed fruits.txt


students.txt:
1 John
2 Alice
3 Bob

scores.txt:
1 85
2 90
3 78

join students.txt scores.txt
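join matches rows on the first (ID) column, so this should produce:
1 John 85
2 Alice 90
3 Bob 78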

Options with join:


•-1 and -2: Specify which column to use for the join in the first and second file respectively.

join -1 1 -2 1 students.txt scores.txt

-t: Specify a delimiter (default is a space or tab).

join -t ',' file1.csv file2.csv


employee.txt:
1 John
2 Alice
3 Bob
4 Eve

salary.txt:
2 60000
1 75000
3 55000
5 45000
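No command is given for these two files; a minimal sketch, assuming the goal is to join them on the ID column. join expects both inputs sorted on the join field, and unmatched IDs are dropped unless -a is used:

sort employee.txt > employee_sorted.txt
sort salary.txt > salary_sorted.txt
join employee_sorted.txt salary_sorted.txt
join -a 1 employee_sorted.txt salary_sorted.txt

The last command also keeps lines from the first file that have no match (here, 4 Eve).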
Paste command
• The paste command in Unix is used to merge lines from two or more files
side by side
• file1.txt:
Apple
Banana
Cherry
• file2.txt:
1
2
3
• paste file1.txt file2.txt
• paste -d',' file1.txt file2.txt
• echo "Grape" | paste file1.txt -
Comm command
• Using comm to Compare Files (both files must be sorted)
• comm file1.txt file2.txt

• The -1, -2, and -3 options suppress the corresponding output column; for example, to show only the lines common to both files:
• comm -12 file1.txt file2.txt
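A small worked example with two assumed, already-sorted files list1.txt and list2.txt:
echo "apple" > list1.txt
echo "banana" >> list1.txt
echo "cherry" >> list1.txt
echo "banana" > list2.txt
echo "cherry" >> list2.txt
echo "date" >> list2.txt
comm list1.txt list2.txt
comm -12 list1.txt list2.txt
The first command prints three columns (lines only in list1.txt, lines only in list2.txt, lines in both); the second prints just the common lines, banana and cherry.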


Using rev to Reverse Characters
• rev text.txt
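• A quick illustration on a piped string:
• echo "hello world" | rev
• This prints dlrow olleh, reversing the characters of each line.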
• The tr command in Unix is a powerful tool used for translating,
deleting, or compressing characters in a text stream. It works by
reading from standard input and writing to standard output, replacing
or modifying characters based on the specified parameters.
tr [options] SET1 [SET2]
Common Options:
•-d : Delete characters in SET1
•-s : Squeeze repeated occurrences of characters in SET1
•-c : Complement the characters in SET1 (i.e., include all characters except those specified in SET1)
•-t : Truncate SET1 to the length of SET2
• Basic Translation: Translate lowercase letters to uppercase:
• echo "hello world" | tr 'a-z' 'A-Z'
• Deleting Characters: Delete all occurrences of a specific character
(e.g., delete all digits):
• echo "hello123 world456" | tr -d '0-9'
• Squeezing Repeated Characters: Replace multiple spaces with a single
space:
echo "hello   world" | tr -s ' '
• Complement Option (-c): Replace all non-alphabet characters with a
space:
echo "Hello! 123 World?" | tr -c 'A-Za-z' ' '
• Combining Options: Remove all non-digit characters and compress
repeated digits into a single digit:
• echo "phone number: 123-456-7890" | tr -cd '0-9' | tr -s '0-9'
• Character Ranges and Classes:
• Character Ranges: Translate characters using ranges, such as a-z or 0-
9.
• Character Classes: Classes like [:digit:], [:alpha:], [:lower:], and
[:upper:]
For example, translating all lowercase letters to uppercase using a
character class:
• echo "hello world" | tr '[:lower:]' '[:upper:]'
• Removing Non-Printable Characters: To clean up a file and remove all
non-printable characters:
cat filename | tr -cd '[:print:]'
• Advanced Use Cases:
• ROT13 Encryption: The tr command can be used to implement simple
ROT13 encryption:
echo "hello world" | tr 'a-zA-Z' 'n-za-mN-ZA-M'
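Because ROT13 is its own inverse, piping the output through the same command restores the original text:
echo "hello world" | tr 'a-zA-Z' 'n-za-mN-ZA-M' | tr 'a-zA-Z' 'n-za-mN-ZA-M'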
Translate Tabs to Spaces: Replace each tab character with a space (tr translates character for character, so one tab becomes exactly one space; the expand utility can turn tabs into multiple spaces):
cat file.txt | tr '\t' ' '
• Remove Vowels: Remove all vowels from a string:
echo "Remove vowels from this sentence" | tr -d 'aeiouAEIOU'
• Decryption for More Complex Algorithms
• For more advanced encryption algorithms (e.g., AES, RSA), decryption typically
requires a specific key and a decryption algorithm. Unix utilities such as
openssl are better suited for this:
• Example: Encrypt and Decrypt with openssl
• echo "hello world" | openssl enc -aes-256-cbc -e -base64 -pass pass:mysecretkey
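• To decrypt, the same cipher, encoding, and passphrase are supplied with -d instead of -e; a sketch, assuming the encrypted output was first saved to secret.txt (an assumed filename; recent OpenSSL releases may also suggest adding -pbkdf2 to both commands):
• echo "hello world" | openssl enc -aes-256-cbc -e -base64 -pass pass:mysecretkey > secret.txt
• openssl enc -aes-256-cbc -d -base64 -pass pass:mysecretkey -in secret.txt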
• The awk command in Unix is a powerful text-processing tool and
programming language.
• It’s used for pattern scanning, processing, and reporting on text. You
can use awk to perform simple tasks like filtering columns or
advanced tasks like complex text transformations and calculations.
• Syntax
• awk 'pattern { action }' file

Basic Structure:

•Pattern: The condition to match (optional).


•Action: Commands to execute on matching lines (optional). If no pattern is provided, actions
are applied to every line.

Key Concepts:
•Fields: Columns of each line, accessed using $1, $2, … $NF (last field).
•Records: Each line is a record, referred to as $0.
Examples and Use Cases:

1. Print Specific Columns:


If you have a file data.txt with the following content:

1 Alice 25
2 Bob 30
3 Charlie 22

awk '{ print $1, $3 }' data.txt

2. Filter Rows Based on a Condition:


Print only lines where the third column (age) is greater
than 24:

awk '$3 > 24' data.txt


3. Using BEGIN and END Blocks:
•BEGIN: Executes before reading input lines.
•END: Executes after reading all input lines.
Print a header, data, and footer:

awk 'BEGIN { print "ID Name Age" } { print $1, $2, $3 } END { print "End of file" }' data.txt

4. Field Separator:
If the input uses a delimiter other than space or tab, use the -F option to specify the field separator.

data.csv:
1,Alice,25
2,Bob,30
3,Charlie,22

awk -F',' '{ print $2 }' data.csv


5. Mathematical Calculations:
Suppose you have a file marks.txt with student marks:

John 78 85 92
Alice 88 90 85
Bob 65 70 75

Calculate and print the average marks for each student:

awk '{ avg = ($2 + $3 + $4) / 3; print $1, avg }' marks.txt

6. Print Specific Lines or Range of Lines:


Print lines from 2 to 3:

awk 'NR>=2 && NR<=3' data.txt

7. Pattern Matching with Regular Expressions:


Find lines containing the word "Alice":

awk '/Alice/' data.txt


Built-in Variables in awk:
•NR: Line number of the current record.
•NF: Number of fields in the current record.
•FS: Field separator (default is whitespace).
•OFS: Output field separator.

Print line numbers along with each line:


awk '{ print NR, $0 }' data.txt

Print all fields, separating them with a comma:

awk 'BEGIN { OFS="," } { print $1, $2, $3 }' data.txt
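The NF variable can also be written as $NF to reference the last field of each record; a quick sketch reusing data.txt from above:

awk '{ print NF, $NF }' data.txt

This prints the number of fields and the last field (the age) for each line.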


Advanced Examples:

10. Count Word Frequency:


Suppose you have a file words.txt:
apple banana apple orange banana apple
Count the occurrences of each word:

awk '{ for(i=1; i<=NF; i++) count[$i]++ } END { for(word in count) print word, count[word] }' words.txt

11. Sum Up Values in a Column:


If you have a file expenses.txt:
Food 200
Rent 800
Transport 100
Food 150
Calculate the total amount spent on each category:

awk '{ expenses[$1] += $2 } END { for(item in expenses) print item, expenses[item] }' expenses.txt
