Lab 03
to CS
College of Engineering Lab. Section
Department of Computer Science & Engineering Fall 2024
Introduction
The sed command is a powerful utility in Linux systems used for parsing and transforming text. In the
context of bioinformatics, sed is often used for text manipulation tasks such as filtering, formatting, and
summarizing large sequence data files.
A regular expression (often abbreviated as regex) is a sequence of characters that defines a search
pattern. Regular expressions are used for string matching, searching, and manipulating text. They are
commonly used in programming, text editors, and tools like egrep, sed, or awk.
RegEx Syntax
Preparation
mahmd
mahmod
mahmood
mahmoood
mahmooood
mahammed
mahmo2d
mahmoud
mahm00d
memoo d
meh mod
Mehmood
mehMood
3. Run cat with the -n option to view the contents of the text file with numbered lines. You should
have 14 lines.
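The preparation step can be sketched as follows. This is a minimal sketch, not the handout's own procedure: it builds the file with printf instead of an editor, writes it to a temporary directory, and uses the file name sample.txt that Task 4 mentions. Only the 13 words listed above are included; whatever makes up the 14th line in the original file is not reproduced here.

```shell
# Scratch copy of the word list (assumption: one word per line, 13 words as listed above).
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
printf '%s\n' mahmd mahmod mahmood mahmoood mahmooood mahammed mahmo2d \
    mahmoud mahm00d 'memoo d' 'meh mod' Mehmood mehMood > "$sample"

# cat -n prefixes every line with its line number.
cat -n "$sample"

# Cross-check the line count and look at one specific line by number.
line_count=$(wc -l < "$sample")
sixth_line=$(sed -n '6p' "$sample")
echo "lines: $line_count, line 6: $sixth_line"
```

With this list, line 6 is mahammed, which matches the reference to "the 6th line" in Task 3.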
Task 1
Run the sed command with the -n option and the p subcommand to get each of the following:
1. Print only lines that contain the letter "M" (in upper-case).
2. Print only lines that start with the letter "M".
3. Print only lines that end with the pattern "ed".
4. Print all lines with only one letter between ‘mahm’ and ‘d’
5. Print all lines with only one letter between ‘m’ and ‘hmo’
6. Print all lines with only two letters between ‘mahm’ and ‘d’
7. Print all lines with only three letters between ‘mahm’ and ‘d’
8. Print all lines with any single character in the position indicated by the dot
9. Print all lines with any three characters between ‘m’ and ‘d’
10. Print all lines that have either 'o' or 'u' between "mo" and 'd'
11. Print all lines that have either 'mo' or 'ud' (or both). Try each of the following commands and
write down your observation.
sed -n '/mo\|ud/p' sample.txt
12. Print all lines with exactly two ‘o’ characters between “hm” and “d”. Do you need the escape
character?
13. Print all lines with exactly two or three ‘o’ characters between “hm” and “d”. Do you need the
escape character?
14. Print all lines with zero or one ‘o’ character between “mahm” and “d”. Do you need the escape
character?
15. Print all lines with one or more ‘o’ characters between “mahm” and “d”. Do you need the escape
character?
16. Print all lines with zero or more ‘o’ characters between “mahm” and “d”. Do you need the escape character?
Observation: Because, to sed, the * is a special
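The constructs these items exercise can be tried first on a separate word list. The sketch below is illustrative only: the words (cat, cot, and so on) are hypothetical and are not the lab's sample.txt, and the \+, \? and \| forms assume GNU sed, where they are extensions to basic regular expressions.

```shell
# Hypothetical word list, one word per line.
f=$(mktemp)
printf '%s\n' cat cot coot cooot ct Cat c9t > "$f"

# -n turns off automatic printing; p prints only the lines that match the pattern.
anchored=$(sed -n '/^C/p' "$f")         # ^ anchors the match at the start of the line
any_char=$(sed -n '/c.t/p' "$f")        # . matches exactly one character
bracket=$(sed -n '/c[a9]t/p' "$f")      # [a9] matches one 'a' or one '9'
star=$(sed -n '/co*t/p' "$f")           # * (zero or more) is special without a backslash
exactly2=$(sed -n '/co\{2\}t/p' "$f")   # interval braces need backslashes in basic regex
range=$(sed -n '/co\{2,3\}t/p' "$f")    # two or three repetitions
plus=$(sed -n '/co\+t/p' "$f")          # one or more (GNU extension)
optional=$(sed -n '/co\?t/p' "$f")      # zero or one (GNU extension)
either=$(sed -n '/ca\|c9/p' "$f")       # alternation (GNU extension)

# Show a few results side by side.
printf 'star:     %s\n' "$(echo $star)"
printf 'plus:     %s\n' "$(echo $plus)"
printf 'optional: %s\n' "$(echo $optional)"
```

Comparing the star, plus and optional results shows which quantifiers accept the word ct, where no 'o' is present at all.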
CMPS 101 / Lab Section / Fall 2024 / © Amelle Bedair
Task 2
1. Use Vim to add a new line containing mehmo|ud to the file. Save and close.
2. Print all lines with two or more ‘o’ characters between “hm” and “d”. Do you need the escape
character?
Hint: Use the pipe character (|) to represent “or”. Because you want sed to treat it as a special
character rather than a literal one, precede the pipe character with the escape character, i.e. the
backslash.
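The difference between a literal pipe and an escaped one can be checked on a tiny input. This is a sketch under stated assumptions: the four-line input is hypothetical, and the \| form relies on GNU sed. The -E option (extended regular expressions) is shown only for contrast, since it reverses the roles of | and \|.

```shell
# Hypothetical input: one line that contains a literal pipe, three that do not.
f=$(mktemp)
printf '%s\n' 'a|b' a b c > "$f"

literal=$(sed -n '/a|b/p' "$f")         # basic regex: | is an ordinary character
alternation=$(sed -n '/a\|b/p' "$f")    # basic regex (GNU): \| means "or"
extended=$(sed -E -n '/a|b/p' "$f")     # extended regex: | means "or" without a backslash

# An interval with an open upper bound, \{2,\}, means "two or more".
two_or_more=$(printf '%s\n' cot coot cooot | sed -n '/co\{2,\}t/p')

printf 'literal:     %s\n' "$(echo $literal)"
printf 'alternation: %s\n' "$(echo $alternation)"
printf 'two or more: %s\n' "$(echo $two_or_more)"
```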
Task 3
Run the sed command with the s (substitute) operation to get each of the following:
1. Replace all 'a' with the '@' character. Try executing each of the following commands and observe the
changes on the string "mahammed", i.e. the 6th line.
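How the flags after the closing delimiter change what s replaces can be sketched on a single word. The word banana is a hypothetical stand-in chosen because it contains several 'a' characters; these are not necessarily the commands the handout originally listed.

```shell
first=$(echo banana | sed 's/a/@/')       # no flag: only the first match on each line
all=$(echo banana | sed 's/a/@/g')        # g flag: every match on each line
second=$(echo banana | sed 's/a/@/2')     # numeric flag: only the second match

# A leading line number restricts the substitution to that line.
addressed=$(printf '%s\n' banana banana | sed '2s/a/@/g')

echo "$first"       # b@nana
echo "$all"         # b@n@n@
echo "$second"      # ban@na
echo "$addressed"   # banana, then b@n@n@
```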
Task 4
Which sed expression joins all lines into one? Run this command on the sample.txt file and observe the output.
Note: 's/\n//g' on its own does not work because sed operates on one line at a time and removes the
trailing newline (\n) before placing the line in the pattern space, i.e., sed sees each line individually,
with no newline character to match.
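One common idiom for joining lines, which may or may not be the one this task intends, first collects every line into the pattern space with N and only then removes the embedded newlines. The three-word input below is hypothetical, and the one-line form with ; after the label assumes GNU sed.

```shell
# Plain substitution: each line is processed alone, so there is no \n to remove.
unchanged=$(printf '%s\n' one two three | sed 's/\n//g')

# :a      defines a label named a
# N       appends the next input line to the pattern space, separated by \n
# $!ba    branches back to a unless the last line has been read
# s/\n//g then removes every embedded newline in the accumulated text
joined=$(printf '%s\n' one two three | sed ':a;N;$!ba;s/\n//g')

echo "$unchanged"
echo "$joined"      # onetwothree
```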
Task 5
Task 6
Run the sed command with the a (append) operation to add ‘============’ below each line containing digits.
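The general shape of an append command can be sketched on a small input. The three input lines are hypothetical, and the one-line form "a text" assumes GNU sed; other implementations may require the text on a separate line after "a\".

```shell
# /[0-9]/ selects lines containing at least one digit;
# a prints the given text after each selected line.
out=$(printf '%s\n' abc a1c xyz | sed '/[0-9]/a ============')

echo "$out"
```

Only the line a1c contains a digit, so the separator appears once, directly below it.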
Part 2: FASTQ to FASTA
Requirements
Note: The curly braces enclose a group of commands that sed applies to every line matching the pattern.
Use ; as the command separator.
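A brace group with ; separators might look like the sketch below. Since the requirements list is not reproduced here, this rests on assumptions: every FASTQ record spans exactly four lines (header starting with @, sequence, + line, quality string), the two reads are made up, and the first~step address form is a GNU sed extension.

```shell
# Two hypothetical four-line FASTQ records.
fq=$(mktemp)
printf '%s\n' '@read1' ACGT + IIII '@read2' GGCC + JJJJ > "$fq"

# 1~4       selects lines 1, 5, 9, ... (the header of each record)
# s/^@/>/   turns the FASTQ header into a FASTA header
# p         prints the header
# n         loads the next line (the sequence) without printing, because of -n
# p         prints the sequence
fasta=$(sed -n '1~4{s/^@/>/;p;n;p;}' "$fq")

echo "$fasta"
```

The + line and the quality line of each record match no address and are never printed, which leaves only the header and sequence pairs.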