Linux Basic - Introduction To Cut, Awk, Grep and Sed

Uploaded by chappagirija

Introduction to cut command

The cut command is a powerful tool in Unix/Linux systems used for
extracting sections from each line of a file or standard input. It is commonly
used for processing text data, such as parsing CSV files or other delimited
data.
Basic Syntax
cut [OPTION]... [FILE]...
Common Options
 -b: Select only these bytes.
 -c: Select only these characters.
 -d: Use a specific delimiter.
 -f: Select only these fields.
 --complement: Complement the selection.
 --output-delimiter: Use the specified string as the output delimiter.
Detailed Options and Examples
1. Selecting Bytes
The -b option allows you to specify a list of bytes to extract.
Example: Extract the 1st, 2nd, and 3rd bytes from each line of file.txt:
cut -b 1-3 file.txt
2. Selecting Characters
The -c option allows you to specify a list of characters to extract.
Example: Extract the 1st, 3rd, and 5th characters from each line of file.txt:
cut -c 1,3,5 file.txt
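To see character selection in action without a file, you can pipe a literal string into cut (a minimal sketch; the sample string is invented for illustration):

```shell
# Select the 1st, 3rd, and 5th characters of each input line.
printf 'hello\n' | cut -c 1,3,5
# Prints: hlo
```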
3. Selecting Fields with a Delimiter
The -d option specifies the delimiter to use, and the -f option specifies the
fields to extract.
Example: Given file.txt:
1,101,[email protected]
2,102,[email protected]
3,103,[email protected]
4,104,[email protected]
Extract the second field (profileid) from each line, using a comma as the
delimiter:
cut -d, -f2 file.txt
Output:
101
102
103
104
Extract the third field (email) from each line:
cut -d, -f3 file.txt
Output:
[email protected]
[email protected]
[email protected]
[email protected]
4. Complement the Selection
The --complement option inverts the selection, outputting all fields except
the ones specified.
Example: Extract all fields except the second field:
cut -d, -f2 --complement file.txt
Output:
1,[email protected]
2,[email protected]
3,[email protected]
4,[email protected]
5. Changing the Output Delimiter
The --output-delimiter option changes the delimiter used in the output.
Example: Change the output delimiter to a semicolon:
cut -d, -f1,3 --output-delimiter=';' file.txt
Output:
1;[email protected]
2;[email protected]
3;[email protected]
4;[email protected]
Combining cut with Other Commands
1. Extracting Emails from a File and Filtering Another File
file1.txt:
1,101,[email protected]
2,102,[email protected]
3,103,[email protected]
4,104,[email protected]
file2.txt:
5,201,[email protected]
6,202,[email protected]
Objective: Remove lines in file1.txt where the email matches any email in
file2.txt.
1. Extract emails from file2.txt:
cut -d, -f3 file2.txt > emails_to_remove.txt
2. Use fgrep to filter out lines in file1.txt:
fgrep -v -f emails_to_remove.txt file1.txt
Output:
1,101,[email protected]
4,104,[email protected]
Advanced Examples
1. Extract Specific Fields and Use Custom Output Delimiter
Example: Extract the first and third fields from a CSV file and use a pipe (|)
as the output delimiter:
cut -d, -f1,3 --output-delimiter='|' file.txt
2. Extract Fields from a Tab-Delimited File
If the file is tab-delimited, specify the delimiter as a tab ($'\t'):
Example: Extract the second field from a tab-delimited file:
cut -d$'\t' -f2 file.txt
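A quick way to try this without a real file is to generate a tab-separated line with printf (note that the $'\t' quoting is a bash/zsh feature; in a strictly POSIX shell, embed a literal tab character instead):

```shell
# Build a tab-delimited record and extract its second field.
printf 'one\ttwo\tthree\n' | cut -d$'\t' -f2
# Prints: two
```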
Handling Edge Cases
 Repeated Delimiters: cut treats every occurrence of the delimiter as a
field boundary, so consecutive delimiters produce empty fields rather than
being collapsed into one. Use awk when runs of separators (such as multiple
spaces) should count as a single boundary.
 Multi-character Delimiters: cut only supports single-character
delimiters. Use awk or sed for multi-character delimiters.
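As a sketch of the awk fallback for a multi-character delimiter (here '::', an invented separator for illustration):

```shell
# cut cannot split on '::', but awk accepts it as a field-separator regex.
printf 'alpha::beta::gamma\n' | awk -F'::' '{ print $2 }'
# Prints: beta
```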
Summary
The cut command is a versatile tool for extracting specific sections from text
data. Its primary options (-b, -c, -d, -f) allow for precise selection of bytes,
characters, and fields. Combined with other commands, cut can be part of
powerful text processing pipelines.
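As a small illustration of such a pipeline (with invented sample data), the following counts how often each value occurs in the second comma-separated field:

```shell
# Extract field 2, then count duplicate values with sort | uniq -c.
printf '1,a\n2,b\n3,a\n' | cut -d, -f2 | sort | uniq -c
# Reports 2 occurrences of "a" and 1 of "b".
```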

Introduction to awk
awk is a powerful programming language and command-line utility in
Unix/Linux for text processing and data extraction. It is particularly well-
suited for handling structured data and performing pattern scanning and
processing.
Basic Syntax
awk 'pattern { action }' input-file
 pattern: Specifies a condition; actions are performed on lines that
match the pattern.
 action: Specifies what to do with the matching lines.
 input-file: The file to be processed.
Commonly Used Options
 -F fs: Specify the field separator (fs) to use.
 -v var=value: Assign a value to a variable.
 -f program-file: Read the awk program from a file.
Examples
1. Basic Example
Print the third field of each line in a file:
awk '{ print $3 }' file.txt
2. Specifying a Field Separator
Print the second field from a comma-separated file:
awk -F, '{ print $2 }' file.csv
3. Cutting Fields Using a Delimiter
Print the first and third fields from a colon-separated file:
awk -F: '{ print $1, $3 }' file.txt
4. Performing Calculations
Print the sum of the first and second fields:
awk '{ print $1 + $2 }' file.txt
5. Using Patterns
Print lines where the second field is greater than 100:
awk '$2 > 100' file.txt
6. Using Variables
Pass a variable to awk:
awk -v threshold=100 '$2 > threshold' file.txt
7. BEGIN and END Blocks
The BEGIN and END blocks allow you to perform actions before and after
processing the input file, respectively.
Print a header before processing the file and a footer after processing the
file:
awk 'BEGIN { print "Header" } { print $0 } END { print "Footer" }' file.txt
8. Print the Last and Second-to-Last Fields
Print the last field ($NF) and the second-to-last field ($(NF-1)) of each line:
awk '{ print $NF, $(NF-1) }' file.txt
9. Output Field Separator (OFS) Example
Change the output field separator to a comma:
awk 'BEGIN { OFS="," } { print $1, $2, $3 }' file.txt
10. Setting Both Input and Output Field Separators (FS and OFS)
Convert a space-separated file to a comma-separated file:
awk 'BEGIN { FS=" "; OFS="," } { print $1, $2, $3 }' file.txt
Explanation:
 FS=" ": Set the input field separator to a space.
 OFS=",": Set the output field separator to a comma.
11. Remove the lines in file1.txt where the email matches any email in
file2.txt.
file1.txt:
1,101,[email protected]
2,102,[email protected]
3,103,[email protected]
4,104,[email protected]
file2.txt:
5,201,[email protected]
6,202,[email protected]

Solution using awk:


1. Store emails from file2.txt in an associative array.
2. Check each email in file1.txt against this array.
3. Print the lines from file1.txt where the email does not exist in
file2.txt.
Here's the awk command to achieve this:
awk -F, 'NR==FNR { emails[$3]; next } !($3 in emails)' file2.txt file1.txt
Explanation:
 -F,: Sets the field separator to a comma.
 NR==FNR { emails[$3]; next }: While reading the first file
(file2.txt), store each email (the third field, $3) in an associative array
emails and skip to the next record.
 !($3 in emails): For the second file (file1.txt), check if the email (the
third field, $3) is not in the emails array. If it's not, print the line.
Detailed Steps:
1. Read file2.txt: For each line, store the email in the emails array.
2. Read file1.txt: For each line, check if the email is in the emails array.
If not, print the line.
Running the Command:
To see the output directly, run:
awk -F, 'NR==FNR { emails[$3]; next } !($3 in emails)' file2.txt file1.txt
Example Execution:
Given the contents of file1.txt and file2.txt, the command will produce the
following output:
1,101,[email protected]
4,104,[email protected]
Saving the Output to a File:
To save the result to a new file (filtered_file1.txt):
awk -F, 'NR==FNR { emails[$3]; next } !($3 in emails)' file2.txt file1.txt > filtered_file1.txt
This awk command effectively filters out the lines in file1.txt where the
email is present in file2.txt.

Explanation of NR and FNR in awk


NR and FNR are built-in variables in awk that are used to keep track of
record numbers during the processing of input files.
NR (Number of Records)
 Definition: NR is the total number of records (or lines) read so far
from all input files.
 Scope: Global across all input files.
 Example: If you have two files, each with 3 lines, NR will be 1, 2, 3 for
the first file, and then 4, 5, 6 for the second file.
FNR (File Number of Records)
 Definition: FNR is the number of records read so far from the current
input file.
 Scope: Resets to 1 for each new input file.
 Example: If you have two files, each with 3 lines, FNR will be 1, 2, 3
for the first file, and then 1, 2, 3 for the second file.
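A minimal demonstration of the difference, using two throwaway files created just for this sketch:

```shell
# Create two 2-line files, then print NR (global) and FNR (per-file) for each record.
printf 'a\nb\n' > f1.txt
printf 'c\nd\n' > f2.txt
awk '{ print FILENAME, NR, FNR }' f1.txt f2.txt
# Prints:
# f1.txt 1 1
# f1.txt 2 2
# f2.txt 3 1
# f2.txt 4 2
rm f1.txt f2.txt
```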
Using NR==FNR
 Purpose: The condition NR==FNR is used to perform actions only
while processing the first input file.
 How it Works:
 When processing the first file, NR and FNR will be equal (e.g.,
both will be 1, 2, 3).
 When awk starts processing the second file, FNR resets to 1, but
NR continues to increment (e.g., FNR will be 1, but NR will be 4).
Example Code with Explanation
Let's revisit the command:
awk -F, 'NR==FNR { emails[$3]; next } !($3 in emails)' file2.txt file1.txt
Step-by-Step Explanation:
1. NR==FNR { emails[$3]; next }:
 Condition NR==FNR: This block executes only while processing
file2.txt.
 Action emails[$3]; next:
 emails[$3]; stores the third field (email) from file2.txt
into the array emails.
 next: This command skips the remaining actions and
moves to the next record.
 Effect: Collects all emails from file2.txt into the emails array.
2. !($3 in emails):
 This block executes for the second file (file1.txt).
 Condition !($3 in emails): Checks if the email (third field) from
file1.txt is NOT in the emails array.
 Action: Prints the line if the condition is true.
Practical Example
Given:
file1.txt:
1,101,[email protected]
2,102,[email protected]
3,103,[email protected]
4,104,[email protected]
file2.txt:
5,201,[email protected]
6,202,[email protected]
Running the Command:
awk -F, 'NR==FNR { emails[$3]; next } !($3 in emails)' file2.txt file1.txt
Execution Flow:
1. Processing file2.txt:
 Line 1: NR=1, FNR=1 -> emails["[email protected]"]
 Line 2: NR=2, FNR=2 -> emails["[email protected]"]
2. Processing file1.txt:
 Line 1: NR=3, FNR=1 -> Email "[email protected]" not in
emails -> Print line
 Line 2: NR=4, FNR=2 -> Email "[email protected]" in
emails -> Skip line
 Line 3: NR=5, FNR=3 -> Email "[email protected]" in
emails -> Skip line
 Line 4: NR=6, FNR=4 -> Email "[email protected]" not in
emails -> Print line
Output:
1,101,[email protected]
4,104,[email protected]
Conclusion
Understanding NR and FNR is crucial for using awk effectively, especially
when dealing with multiple input files. The NR==FNR condition helps you
isolate operations to the first file, enabling complex data processing tasks
like joining files or filtering records based on another file's content.
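The same NR==FNR idiom can also join two files. The sketch below (with invented lookup data) appends a name from a lookup file to each matching record of a data file:

```shell
# lookup.txt maps id -> name; data.txt holds id,score records.
printf '1,alice\n2,bob\n' > lookup.txt
printf '1,90\n2,85\n' > data.txt
# Pass 1 stores names keyed by id; pass 2 appends the looked-up name.
awk -F, 'NR==FNR { name[$1]=$2; next } { print $0 "," name[$1] }' lookup.txt data.txt
# Prints:
# 1,90,alice
# 2,85,bob
rm lookup.txt data.txt
```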

Introduction to fgrep
fgrep is a variant of the grep command in Unix/Linux used for searching
fixed-string patterns in files. Unlike grep, fgrep does not interpret regular
expressions; it searches for exact string matches. On modern systems it is
equivalent to (and typically a deprecated alias for) grep -F.
Basic Syntax
fgrep [OPTIONS] PATTERN [FILE...]
Important Options
 -v: Invert the match to select non-matching lines.
 -f FILE: Take patterns from the specified file, one per line.
 -x: Match the entire line.
 -i: Ignore case distinctions.
Using fgrep with Examples
Given two files, file1.txt and file2.txt, with the following content:
file1.txt:
1,101,[email protected]
2,102,[email protected]
3,103,[email protected]
4,104,[email protected]
file2.txt:
5,201,[email protected]
6,202,[email protected]
Objective:
Remove lines in file1.txt where the email matches any email in file2.txt.
Step-by-Step Solution
1. Extract emails from file2.txt: First, extract the emails from file2.txt
into a separate file emails_to_remove.txt.
cut -d, -f3 file2.txt > emails_to_remove.txt
Content of emails_to_remove.txt:
[email protected]
[email protected]
2. Use fgrep to Filter Out Matching Emails: Use fgrep with the -v
option to remove lines in file1.txt where the email matches any email
in emails_to_remove.txt.
fgrep -v -f emails_to_remove.txt file1.txt
Explanation:
 -v: Invert the match to select lines that do not match any pattern
in emails_to_remove.txt.
 -f emails_to_remove.txt: Read the patterns (emails) from
emails_to_remove.txt.
Detailed Breakdown
1. Extracting Emails
We use the cut command to extract the third field (email) from file2.txt.
cut -d, -f3 file2.txt > emails_to_remove.txt
 -d,: Specifies the delimiter as a comma.
 -f3: Specifies the third field.
 > emails_to_remove.txt: Redirects the output to
emails_to_remove.txt.
2. Removing Matching Lines
We use fgrep to filter out the lines in file1.txt that have emails listed in
emails_to_remove.txt.
fgrep -v -f emails_to_remove.txt file1.txt
 -v: Invert the match, i.e., select lines that do not match.
 -f emails_to_remove.txt: Use the patterns from
emails_to_remove.txt.
More Information:
Command: fgrep -f emails_to_remove.txt file1.txt
Functionality:
Checks whether any pattern from emails_to_remove.txt appears as a
substring of each of the following lines:
1,101,[email protected] - not matched
2,102,[email protected] - matched -> printed
3,103,[email protected] - matched -> printed
4,104,[email protected] - not matched
output:
2,102,[email protected]
3,103,[email protected]
Command: fgrep -v -f emails_to_remove.txt file1.txt
Checks the same patterns, but -v inverts the result, printing only the lines
that do not match:
1,101,[email protected] - not matched -> printed because of -v option
2,102,[email protected] - matched
3,103,[email protected] - matched
4,104,[email protected] - not matched -> printed because of -v option

output:
1,101,[email protected]
4,104,[email protected]
Command: fgrep -x -f emails_to_remove.txt file1.txt
With -x, each pattern must match an entire line exactly. No line of file1.txt
consists solely of an email address, so nothing matches:
1,101,[email protected] - not matched
2,102,[email protected] - not matched
3,103,[email protected] - not matched
4,104,[email protected] - not matched
output: empty
Combining Commands in a Script
Here is a simple script to automate the process:
#!/bin/bash
# Extract emails from file2.txt
cut -d, -f3 file2.txt > emails_to_remove.txt
# Remove lines in file1.txt with matching emails
fgrep -v -f emails_to_remove.txt file1.txt > filtered_file1.txt
# Display the result
cat filtered_file1.txt
Example Execution
Running the script will produce the following output:
filtered_file1.txt:
1,101,[email protected]
4,104,[email protected]
Additional Examples and Options
Ignoring Case with -i
If you want to ignore case distinctions while matching:
fgrep -vi -f emails_to_remove.txt file1.txt
Matching Entire Lines with -x
To match the entire line exactly, use the -x option. This is useful if you have
patterns that should match the whole line.
fgrep -vx -f emails_to_remove.txt file1.txt
Searching Multiple Patterns from Command Line
If you need to search for multiple patterns directly from the command line,
pass each one with -e:
fgrep -e '[email protected]' -e '[email protected]' file1.txt
Combining Options
You can combine multiple options to refine your search. For example, to
ignore case and invert the match:
fgrep -iv -f emails_to_remove.txt file1.txt
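Since fgrep is typically an alias for grep -F on modern systems, every command in this section can be written with grep -F instead. A quick sketch of why fixed-string mode matters:

```shell
# In fixed-string mode the '.' is literal, so only the exact line "a.c" matches;
# plain grep would treat '.' as "any character" and match "abc" too.
printf 'abc\na.c\n' | grep -F 'a.c'
# Prints: a.c
```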
Conclusion
fgrep is a straightforward and powerful tool for fixed-string pattern matching
in Unix/Linux. It is particularly useful for scenarios where regular expressions
are not needed, and exact string matches are sufficient. By using options like
-v, -f, -x, and -i, you can perform a wide range of text processing tasks
efficiently. The example provided demonstrates how fgrep can be used to
filter lines based on patterns from another file, which is a common
requirement in data processing tasks.

Introduction to sed
 Definition: sed (stream editor) is a command-line utility for parsing
and transforming text. It is used for editing streams of text, such as
replacing text patterns, deleting lines, or performing text
transformations.
 Syntax: sed [options] 'script' [filename]
 Important Options:
 -e script: Add the script to the commands to be executed.
 -n: Suppress automatic printing of pattern space.
 -i: Edit files in place.
 -r: Use extended regular expressions.
 -f scriptfile: Read the sed script from the file.
 Examples:
1. Replace text in a file: sed 's/old/new/' filename
2. Delete lines matching a pattern: sed '/pattern/d' filename
3. Print specific lines using line numbers: sed -n '5,10p' filename
4. Perform multiple edits using a script file: sed -f script.sed filename
5. Edit files in place: sed -i 's/foo/bar/' filename
6. Replace using extended regular expressions: sed -r
's/\bword\b/replacement/' filename
7. Print only lines that match a pattern: sed -n '/pattern/p' filename
8. Insert text at a specific line number: sed '2i\New line' filename
9. Append text after a specific line number: sed '3a\New line' filename
10. Transform text using a sed script: sed -e 's/abc/123/' -e
's/def/456/' filename
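Several of these commands combine naturally. A minimal sketch (with invented input) that pairs -n with the p flag so that only lines where a substitution occurred are printed:

```shell
# Replace "foo" with "bar" and print only the lines that changed.
printf 'foo one\nbaz two\nfoo three\n' | sed -n 's/foo/bar/p'
# Prints:
# bar one
# bar three
```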
