Linux Basic - Introduction To Cut, Awk, Grep and Sed
Introduction to awk
awk is a powerful programming language and command-line utility in
Unix/Linux for text processing and data extraction. It is particularly well-
suited for handling structured data and performing pattern scanning and
processing.
Basic Syntax
awk 'pattern { action }' input-file
pattern: Specifies a condition; actions are performed on lines that
match the pattern.
action: Specifies what to do with the matching lines.
input-file: The file to be processed.
Commonly Used Options
-F fs: Specify the field separator (fs) to use.
-v var=value: Assign a value to a variable.
-f program-file: Read the awk program from a file.
Examples
1. Basic Example
Print the third field of each line in a file:
awk '{ print $3 }' file.txt
2. Specifying a Field Separator
Print the second field from a comma-separated file:
awk -F, '{ print $2 }' file.csv
3. Cutting Fields Using a Delimiter
Print the first and third fields from a colon-separated file:
awk -F: '{ print $1, $3 }' file.txt
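For comparison with cut, the short sketch below runs both tools on one hypothetical colon-separated line (the sample line is an assumption, not taken from any real file). It illustrates that awk joins the printed fields with its output separator, a space by default, while cut keeps the input delimiter.

```shell
# Hypothetical passwd-style line used only for illustration.
line='root:x:0:0:root:/root:/bin/sh'

# awk: fields 1 and 3, joined with the default output separator (a space).
with_awk=$(printf '%s\n' "$line" | awk -F: '{ print $1, $3 }')

# cut: fields 1 and 3, joined with the original delimiter (a colon).
with_cut=$(printf '%s\n' "$line" | cut -d: -f1,3)

printf '%s\n' "$with_awk" "$with_cut"
```

Which form is preferable depends on whether the next tool in the pipeline expects the original delimiter.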
4. Performing Calculations
Print the sum of the first and second fields:
awk '{ print $1 + $2 }' file.txt
5. Using Patterns
Print lines where the second field is greater than 100:
awk '$2 > 100' file.txt
6. Using Variables
Pass a variable to awk:
awk -v threshold=100 '$2 > threshold' file.txt
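As a small worked example of patterns and variables, the sketch below builds a hypothetical two-column file (the names and scores are made up) and filters it with the same threshold test. It also shows that adding an action after the pattern changes what is printed for the matching lines.

```shell
# Hypothetical sample data: a name and a score, separated by a space.
tmp=$(mktemp)
printf '%s\n' 'alpha 50' 'beta 150' 'gamma 100' 'delta 250' > "$tmp"

# Pattern only: matching lines are printed whole.
above=$(awk -v threshold=100 '$2 > threshold' "$tmp")

# Pattern plus action: print just the first field of the matching lines.
names=$(awk -v threshold=100 '$2 > threshold { print $1 }' "$tmp")

printf '%s\n' "$above"
printf '%s\n' "$names"
rm -f "$tmp"
```

The line with a score of exactly 100 is not selected, because the comparison is strictly greater-than.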
7. BEGIN and END Blocks
The BEGIN and END blocks allow you to perform actions before and after
processing the input file, respectively.
Print a header before processing the file and a footer after processing the
file:
awk 'BEGIN { print "Header" } { print $0 } END { print "Footer" }' file.txt
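A common use of these blocks is to accumulate a value while reading and report it at the end. The sketch below assumes a hypothetical file with two numeric columns and prints a header, one sum per line, and a grand total.

```shell
# Hypothetical input: two numbers per line.
tmp=$(mktemp)
printf '%s\n' '10 20' '30 40' '5 5' > "$tmp"

report=$(awk '
  BEGIN { print "a+b" }                 # runs once, before any input is read
  { sum += $1 + $2; print $1 + $2 }     # runs for every input line
  END { print "total", sum }            # runs once, after the last line
' "$tmp")

printf '%s\n' "$report"
rm -f "$tmp"
```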
8. Print the Last and Second-to-Last Fields
Print the last field ($NF) and the second-to-last field ($(NF-1)) of each line:
awk '{ print $NF, $(NF-1) }' file.txt
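One thing to watch with $(NF-1) is short lines: on a one-field line it refers to $0, and on an empty line the index is negative. A minimal sketch, using made-up input, that guards against this with an NF test:

```shell
# Hypothetical input: one line has only a single field.
out=$(printf '%s\n' 'a b c d' 'x y' 'solo' |
  awk 'NF >= 2 { print $NF, $(NF-1) }')   # skip lines with fewer than two fields

printf '%s\n' "$out"
```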
9. Output Field Separator (OFS) Example
Change the output field separator to a comma:
awk 'BEGIN { OFS="," } { print $1, $2, $3 }' file.txt
10. Setting Both Input and Output Field Separators (FS and OFS)
Convert a space-separated file to a comma-separated file:
awk 'BEGIN { FS=" "; OFS="," } { print $1, $2, $3 }' file.txt
Explanation:
FS=" ": Set the input field separator to a single space. This is awk's
default, and it splits on any run of spaces or tabs, so it can be omitted.
OFS=",": Set the output field separator to a comma.
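Listing $1, $2, $3 converts only the first three fields. When the number of fields is not known in advance, one option is to assign a field to itself, which makes awk rebuild the whole line with OFS. The sketch below compares the two approaches on a made-up line.

```shell
# Hypothetical input line with uneven spacing and four fields.
line='one two   three four'

# Explicit list: print joins exactly the fields named, using OFS.
first_three=$(printf '%s\n' "$line" | awk 'BEGIN { OFS="," } { print $1, $2, $3 }')

# $1 = $1 forces awk to rebuild $0 with OFS, so every field is kept.
all_fields=$(printf '%s\n' "$line" | awk 'BEGIN { OFS="," } { $1 = $1; print }')

printf '%s\n' "$first_three" "$all_fields"
```

This handles plain whitespace-separated text only; quoting rules of real CSV files are outside what a field separator can express.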
11. Remove the lines in file1.txt where the email matches any email in
file2.txt.
file1.txt:
1,101,[email protected]
2,102,[email protected]
3,103,[email protected]
4,104,[email protected]
file2.txt:
5,201,[email protected]
6,202,[email protected]
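One possible way to do this in awk alone is the two-file idiom sketched below: read file2.txt first and remember each email, then print only the lines of file1.txt whose email was not remembered. The addresses used here (alice@example.com and so on) are hypothetical stand-ins, and the array name remove is chosen only for illustration.

```shell
# Hypothetical data in the same id,code,email layout as the files above.
dir=$(mktemp -d)
printf '%s\n' '1,101,alice@example.com' '2,102,bob@example.com' \
              '3,103,carol@example.com' '4,104,dave@example.com' > "$dir/file1.txt"
printf '%s\n' '5,201,bob@example.com' '6,202,carol@example.com' > "$dir/file2.txt"

kept=$(awk -F, '
  NR == FNR { remove[$3]; next }   # first file named (file2.txt): remember each email
  !($3 in remove)                  # second file (file1.txt): print if email was not seen
' "$dir/file2.txt" "$dir/file1.txt")

printf '%s\n' "$kept"
rm -rf "$dir"
```

Because the comparison is on the third field exactly, an email that merely contains another email as a substring is not removed. The NR == FNR test assumes the first file is not empty.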
Introduction to fgrep
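fgrep is a variant of the grep command in Unix/Linux used for searching
fixed string patterns in files. Unlike grep, fgrep does not interpret regular
expressions; it searches for literal strings anywhere in a line. It is
equivalent to grep -F, and recent GNU grep releases print a warning that
fgrep is obsolescent, so grep -F is the safer spelling in new scripts.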
Basic Syntax
fgrep [OPTIONS] PATTERN [FILE...]
Important Options
-v: Invert the match to select non-matching lines.
-f FILE: Take patterns from the specified file, one per line.
-x: Match the entire line.
-i: Ignore case distinctions.
Using fgrep with Examples
Given two files, file1.txt and file2.txt, with the following content:
file1.txt:
1,101,[email protected]
2,102,[email protected]
3,103,[email protected]
4,104,[email protected]
file2.txt:
5,201,[email protected]
6,202,[email protected]
Objective:
Remove lines in file1.txt where the email matches any email in file2.txt.
Step-by-Step Solution
1. Extract emails from file2.txt: First, extract the emails from file2.txt
into a separate file emails_to_remove.txt.
cut -d, -f3 file2.txt > emails_to_remove.txt
Content of emails_to_remove.txt:
[email protected]
[email protected]
2. Use fgrep to Filter Out Matching Emails: Use fgrep with the -v
option to remove lines in file1.txt where the email matches any email
in emails_to_remove.txt.
fgrep -v -f emails_to_remove.txt file1.txt
Explanation:
-v: Invert the match to select lines that do not match any pattern
in emails_to_remove.txt.
-f emails_to_remove.txt: Read the patterns (emails) from
emails_to_remove.txt.
Detailed Breakdown
1. Extracting Emails
We use the cut command to extract the third field (email) from file2.txt.
cut -d, -f3 file2.txt > emails_to_remove.txt
-d,: Specifies the delimiter as a comma.
-f3: Specifies the third field.
> emails_to_remove.txt: Redirects the output to
emails_to_remove.txt.
2. Removing Matching Lines
We use fgrep to filter out the lines in file1.txt that have emails listed in
emails_to_remove.txt.
fgrep -v -f emails_to_remove.txt file1.txt
-v: Invert the match, i.e., select lines that do not match.
-f emails_to_remove.txt: Use the patterns from
emails_to_remove.txt.
More Information:
Command: fgrep -f emails_to_remove.txt file1.txt
Functionality:
Checks whether [email protected] or [email protected] (from
emails_to_remove.txt) occurs as a substring of each line below:
1,101,[email protected] - not matched
2,102,[email protected] - matched -> printed
3,103,[email protected] - matched -> printed
4,104,[email protected] - not matched
output:
2,102,[email protected]
3,103,[email protected]
Command: fgrep -v -f emails_to_remove.txt file1.txt
Checks whether [email protected] or [email protected] occurs as a substring of
each line below, then prints the lines that do not match:
1,101,[email protected] - not matched -> printed because of the -v option
2,102,[email protected] - matched
3,103,[email protected] - matched
4,104,[email protected] - not matched -> printed because of the -v option
output:
1,101,[email protected]
4,104,[email protected]
Command: fgrep -x -f emails_to_remove.txt file1.txt
Checks whether [email protected] or [email protected] is equal to an entire
line below. No line consists of an email alone, so nothing matches.
1,101,[email protected] - not matched
2,102,[email protected] - not matched
3,103,[email protected] - not matched
4,104,[email protected] - not matched
output: empty
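The three cases above can be reproduced with the sketch below. It uses grep -F, which behaves the same as fgrep, and hypothetical addresses (alice@example.com and so on) in place of the real ones.

```shell
# Hypothetical data in the same layout as file1.txt and emails_to_remove.txt.
dir=$(mktemp -d)
printf '%s\n' '1,101,alice@example.com' '2,102,bob@example.com' \
              '3,103,carol@example.com' '4,104,dave@example.com' > "$dir/file1.txt"
printf '%s\n' 'bob@example.com' 'carol@example.com' > "$dir/emails_to_remove.txt"

# Substring match: lines containing any listed email.
matched=$(grep -F -f "$dir/emails_to_remove.txt" "$dir/file1.txt")

# Inverted match: lines containing none of the listed emails.
kept=$(grep -F -v -f "$dir/emails_to_remove.txt" "$dir/file1.txt")

# Whole-line match: nothing qualifies, and grep exits with status 1,
# so "|| true" keeps a script running under "set -e" from stopping here.
whole=$(grep -F -x -f "$dir/emails_to_remove.txt" "$dir/file1.txt" || true)

printf 'matched:\n%s\nkept:\n%s\nwhole-line: [%s]\n' "$matched" "$kept" "$whole"
rm -rf "$dir"
```

Since the match is a substring test, a pattern such as bob@example.com would also hit a line containing notbob@example.com; comparing the email field exactly needs a field-aware tool such as awk.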
Combining Commands in a Script
Here is a simple script to automate the process:
#!/bin/bash
# Extract emails from file2.txt
cut -d, -f3 file2.txt > emails_to_remove.txt
# Remove lines in file1.txt with matching emails
fgrep -v -f emails_to_remove.txt file1.txt > filtered_file1.txt
# Display the result
cat filtered_file1.txt
Example Execution
Running the script will produce the following output:
filtered_file1.txt:
1,101,[email protected]
4,104,[email protected]
Additional Examples and Options
Ignoring Case with -i
If you want to ignore case distinctions while matching:
fgrep -vi -f emails_to_remove.txt file1.txt
Matching Entire Lines with -x
To match the entire line exactly, use the -x option. This is useful when each
pattern should equal a whole line. With the sample files, no line of file1.txt
is an email on its own, so the command below prints every line:
fgrep -vx -f emails_to_remove.txt file1.txt
Searching Multiple Patterns from Command Line
If you need to search for multiple patterns directly from the command line,
give each pattern its own -e option; a second bare argument would be treated
as a file name:
fgrep -e '[email protected]' -e '[email protected]' file1.txt
Combining Options
You can combine multiple options to refine your search. For example, to
ignore case and invert the match:
fgrep -iv -f emails_to_remove.txt file1.txt
Conclusion
fgrep is a straightforward and powerful tool for fixed-string pattern matching
in Unix/Linux. It is particularly useful for scenarios where regular expressions
are not needed, and exact string matches are sufficient. By using options like
-v, -f, -x, and -i, you can perform a wide range of text processing tasks
efficiently. The example provided demonstrates how fgrep can be used to
filter lines based on patterns from another file, which is a common
requirement in data processing tasks.
Introduction to sed
Definition: sed (stream editor) is a command-line utility for parsing
and transforming text. It is used for editing streams of text, such as
replacing text patterns, deleting lines, or performing text
transformations.
Syntax: sed [options] 'script' [filename]
Important Options:
-e script: Add the script to the commands to be executed.
-n: Suppress automatic printing of pattern space.
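-i: Edit files in place (GNU sed; BSD/macOS sed requires a backup-suffix
argument, for example -i '').
-r: Use extended regular expressions (GNU sed; -E is the more portable
spelling).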
-f scriptfile: Read the sed script from the file.
Examples:
1. Replace the first occurrence of old on each line (add the g flag, as in
's/old/new/g', to replace every occurrence): sed 's/old/new/' filename
2. Delete lines matching a pattern: sed '/pattern/d' filename
3. Print specific lines using line numbers: sed -n '5,10p' filename
4. Perform multiple edits using a script file: sed -f script.sed filename
5. Edit files in place: sed -i 's/foo/bar/' filename
6. Replace a whole word using extended regular expressions (\b is a GNU
word-boundary extension): sed -r 's/\bword\b/replacement/' filename
7. Print only lines that match a pattern: sed -n '/pattern/p' filename
8. Insert text before a specific line number: sed '2i\New line' filename
9. Append text after a specific line number: sed '3a\New line' filename
10. Apply several edits in one command with multiple -e expressions:
sed -e 's/abc/123/' -e 's/def/456/' filename
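To see several of these commands side by side, the sketch below runs them against a small made-up file and stores each result in a variable. The file contents and variable names are assumptions chosen for illustration; only options that work in both GNU and BSD sed are used.

```shell
# Hypothetical four-line input file.
tmp=$(mktemp)
printf '%s\n' 'foo foo' 'keep me' 'drop this' 'last foo' > "$tmp"

first_only=$(sed 's/foo/bar/' "$tmp")     # replaces the first foo on each line
every=$(sed 's/foo/bar/g' "$tmp")         # g flag: replaces every foo on each line
deleted=$(sed '/drop/d' "$tmp")           # removes lines containing "drop"
range=$(sed -n '2,3p' "$tmp")             # prints only lines 2 through 3
chained=$(sed -e 's/foo/bar/' -e 's/keep/kept/' "$tmp")   # two edits, applied in order

printf '%s\n---\n' "$first_only" "$every" "$deleted" "$range" "$chained"
rm -f "$tmp"
```

None of these commands changes the input file; without -i, sed writes the edited text to standard output.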