Lab 03

Qatar University                                 CMPS 101: Intro. to CS
College of Engineering                           Lab. Section
Department of Computer Science & Engineering     Fall 2024

Linux Lab#03: The sed Command


Objectives

• Filtering text with the sed command
• Using the sed command to convert a FASTQ text file to FASTA

Introduction

The sed command is a powerful utility in Linux systems used for parsing and transforming text. In the
context of bioinformatics, sed is often used for text manipulation tasks such as filtering, formatting, and
summarizing large sequence data files.

A regular expression (often abbreviated as regex) is a sequence of characters that defines a search
pattern. Regular expressions are used for string matching, searching, and manipulating text. They are
commonly used in programming, text editors, and tools like egrep, sed, or awk.

RegEx Syntax

1. Enclose the expression within single quotation marks
2. Use the forward slash as the delimiter
3. Use the backslash to escape a character, i.e. to switch its special meaning on or off
4. Use one of the operations: d (delete), s (search-and-replace), a (add), or p (print)
5. Use any of the following metacharacters. If a metacharacter is marked as non-special, sed treats it
as an ordinary character unless it is preceded by the escape (backslash) character.

#  Metacharacter  Special?  Description
1  .              Yes       Matches any single character
2  ?              No        Matches 0 or 1 times only
3  *              Yes       Matches 0 or more times
4  +              No        Matches 1 or more times
5  ^              Yes       Matches the beginning of the line
6  $              Yes       Matches the end of the line
7  {N}            No        Matches exactly N times
8  [abc]          Yes       Matches any one of the listed characters
9  |              No        Matches either/or
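
The Special? column can be checked directly. The following is a minimal sketch, assuming GNU sed (the default on most Linux systems). The temporary file and its four lines, including the literal text mahmo+d, are made up for illustration and are not part of the lab data.

```shell
demo=$(mktemp)
printf '%s\n' 'mahmd' 'mahmod' 'mahmood' 'mahmo+d' > "$demo"

# * is special by default: zero or more 'o' between "mahm" and "d".
star=$(sed -n '/mahmo*d/p' "$demo")
# An unescaped + is an ordinary plus sign, so only the literal text matches.
literal_plus=$(sed -n '/mahmo+d/p' "$demo")
# \+ (a GNU sed extension) means one or more 'o'.
one_or_more=$(sed -n '/mahmo\+d/p' "$demo")

printf 'star:\n%s\n' "$star"                   # mahmd, mahmod, mahmood
printf 'literal_plus:\n%s\n' "$literal_plus"   # mahmo+d
printf 'one_or_more:\n%s\n' "$one_or_more"     # mahmod, mahmood
rm -f "$demo"
```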

Preparation

1. Create a new directory named lab03 → mkdir lab03
2. With the cd (change directory) command, switch to this new directory → cd lab03
3. With the pwd (print working directory) command, check that you are in the lab03 directory. The
name should also appear at the end of the command prompt.
4. Execute the tree command. The lab03 directory should be empty.
CMPS 101 / Lab Section / Fall 2024 / © Amelle Bedair
Create the sample.txt Data File

1. Run the vim sample.txt command
2. Enter (copy-and-paste) the following text. Save and close.

mahmd
mahmod
mahmood
mahmoood
mahmooood
mahammed
mahmo2d
mahmoud
mahm00d

memoo d
meh mod
Mehmood
mehMood

3. Run the cat command with the -n option to view the contents of the text file with numbered lines.
You should have 14 lines.
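
If copy-and-paste in Vim is inconvenient, the same file can be produced from the command line. This is only a sketch of an alternative and not part of the lab procedure. Note that it overwrites any sample.txt in the current directory.

```shell
# Write the 14 lines (the 10th is intentionally empty) to sample.txt.
printf '%s\n' mahmd mahmod mahmood mahmoood mahmooood mahammed \
    mahmo2d mahmoud mahm00d '' 'memoo d' 'meh mod' Mehmood mehMood > sample.txt

# Show the file with line numbers, then count the lines.
cat -n sample.txt
line_count=$(wc -l < sample.txt)
echo "lines: $line_count"
```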

Task 1

Run the sed command with the -n option and the p subcommand to get each of the following:

1. Print only lines that contain the letter "M" (in upper-case).
2. Print only lines that start with the letter "M".
3. Print only lines that end with the pattern "ed".
4. Print all lines with only one letter between ‘mahm’ and ‘d’
5. Print all lines with only one letter between ‘m’ and ‘hmo’
6. Print all lines with only two letters between ‘mahm’ and ‘d’
7. Print all lines with only three letters between ‘mahm’ and ‘d’
8. Print all lines with any single character in the position indicated by the dot
9. Print all lines with any three characters between ‘m’ and ‘d’
10. Print all lines that have either 'o' or 'u' between "mo" and 'd'
11. Print all lines that have either 'mo' or 'ud' (or both). Try the following command and
write down your observation. sed -n '/mo\|ud/p' sample.txt
12. Print all lines with only two ‘o’ characters between “hm” and “d”. Do you need the escape
character?
13. Print all lines with only two or three ‘o’ characters between “hm” and “d”. Do you need the
escape character?
14. Print all lines with zero or one ‘o’ character between “mahm” and “d”. Do you need the escape
character?
15. Print all lines with one or more ‘o’ characters between “mahm” and “d”. Do you need the escape
character?
16. Print all lines with zero or more ‘o’ between “mahm” and “d”. Do you need the escape character?
Observation: Because, to sed, the * is a special character, no escape is needed.
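
As a warm-up for this task, the sketch below shows how -n and p work together with the anchors and the dot. It assumes GNU sed and uses a made-up temporary file holding only a few of the sample names, so the outputs differ from those for the full sample.txt.

```shell
demo=$(mktemp)
printf '%s\n' mahmd mahmod mahmood mahammed Mehmood mehMood > "$demo"

# -n suppresses the automatic printing; p prints only the matching lines.
starts_with_M=$(sed -n '/^M/p' "$demo")   # ^ anchors to the start of the line
ends_with_ed=$(sed -n '/ed$/p' "$demo")   # $ anchors to the end of the line
one_char=$(sed -n '/mahm.d/p' "$demo")    # . stands for exactly one character

echo "$starts_with_M"   # Mehmood
echo "$ends_with_ed"    # mahammed
echo "$one_char"        # mahmod
rm -f "$demo"
```

Without -n, sed prints every line once and each matching line a second time.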

Task 2

1. Use Vim to add a new line to sample.txt. On that line, type mehmo|ud. Save and close.
2. Print all lines with two or more ‘o’ characters between “hm” and “d”. Do you need the escape
character?

Command                          Output    Explanation
sed -n '/mo|ud/p' sample.txt
sed -n '/mo\|ud/p' sample.txt

Hint: Use the pipe character (|) to represent the “or”. However, since you want sed to consider
this as a special character and not a normal character, then precede the pipe character with the
escape character; i.e. the backslash.
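
The difference between the escaped and unescaped pipe can be seen on a small made-up file. This is a sketch, assuming GNU sed, where \| is an extension to the basic regular expression syntax. The four lines are illustrative only.

```shell
demo=$(mktemp)
printf '%s\n' mahmd mahmod mahmoud 'mehmo|ud' > "$demo"

# Unescaped | is an ordinary character: only the literal text "mo|ud" matches.
literal=$(sed -n '/mo|ud/p' "$demo")
# Escaped \| means "or": lines containing "mo" or "ud" match.
either=$(sed -n '/mo\|ud/p' "$demo")

printf 'literal:\n%s\n' "$literal"   # mehmo|ud
printf 'either:\n%s\n' "$either"     # mahmod, mahmoud, mehmo|ud
rm -f "$demo"
```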

Task 3

Run the sed command with the s (substitute) operation to get each of the following:

1. Replace all 'a' with the '@' character. Try executing each of the following commands and observe the
changes on the string "mahammed", i.e. the 6th line.

a) sed 's/a/@/' sample.txt


b) sed 's/a/@/g' sample.txt

2. Substitute multiple 'o' characters with only one 'o'


3. Remove all spaces; i.e. replace all spaces with nothing
4. Replace all consonants (not vowels) with a dot (.).
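
The effect of the g flag and of an empty replacement can be sketched on single strings piped into sed. The commands below are one possible way to approach items 1 to 3, shown on individual words rather than on the whole file.

```shell
# Without g, only the first match on each line is replaced.
first_only=$(printf 'mahammed\n' | sed 's/a/@/')
# With g, every match on the line is replaced.
every=$(printf 'mahammed\n' | sed 's/a/@/g')
# oo* matches a run of one or more 'o'; the run is replaced by a single 'o'.
squeezed=$(printf 'mahmoood\n' | sed 's/oo*/o/g')
# An empty replacement deletes whatever matched, here every space.
no_spaces=$(printf 'meh mod\n' | sed 's/ //g')

echo "$first_only"   # m@hammed
echo "$every"        # m@h@mmed
echo "$squeezed"     # mahmod
echo "$no_spaces"    # mehmod
```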

Task 4

Which sed command joins all lines into one? Run this command on the sample.txt file and observe the output.

Note: The 's/\n//g' does not work because sed operates on one line at a time and the newline (\n)
character is not part of the pattern space, i.e., sed sees lines individually, without the newline character
included.
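
One common way around this limitation is the N command, which appends the next input line to the pattern space together with the newline between them. The sketch below is one possible approach and not necessarily the intended answer: it loops over N until the last line and then removes the newlines. The label name a and the three-line input are arbitrary.

```shell
# :a       define a label named a
# N        append the next line to the pattern space (joined by \n)
# $!ba     if this is not the last line, branch back to label a
# s/\n//g  once all lines are in the pattern space, delete the newlines
joined=$(printf '%s\n' mah mo ud | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g')
echo "$joined"   # mahmoud
```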

Task 5

Run the sed command with the d (delete) operation to get each of the following:

1. Delete all lines with numerical digits


2. Delete all empty lines
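
The d operation needs no -n option, because every line that is not deleted is printed automatically. A minimal sketch on a made-up five-line file, showing one possible pattern for each item:

```shell
demo=$(mktemp)
printf '%s\n' mahmod mahmo2d '' mahm00d mahmoud > "$demo"

# [0-9] matches any single digit; d deletes the lines that match.
no_digits=$(sed '/[0-9]/d' "$demo")
# ^$ matches a line with nothing between its start and its end.
no_empty=$(sed '/^$/d' "$demo")

printf 'no_digits:\n%s\n' "$no_digits"
printf 'no_empty:\n%s\n' "$no_empty"
rm -f "$demo"
```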

Task 6

Run the sed command with the a (add) operation to add ‘============’ below each line containing digits.
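
A minimal sketch of the a operation on a made-up three-line file. It assumes GNU sed, which accepts the text to add on the same line as the a command; other sed implementations may require a backslash and a line break after the a.

```shell
demo=$(mktemp)
printf '%s\n' mahmod mahmo2d mahmoud > "$demo"

# After each line that contains a digit, add a line of equals signs.
marked=$(sed '/[0-9]/a ============' "$demo")
printf '%s\n' "$marked"
rm -f "$demo"
```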

Part 2: FASTQ to FASTA

In Vim, copy-and-paste the following two FASTQ records. Save the file as dna.fstq


@fqlib5:334:VVDJXFE:4:5:7053:1012/1
CCTTTTCCCGCAGTCGTCAGCAGTAAGTGTGCGACCGGTAGTTCAAAAGGGGAATATCACCCGCTATTTTGCGAATACTAGAGCCTCGGTTCACGCAAGCA
+
GFGDDB@CJCHDBDBDB@JACCIE@J@GD@FHFADAIEA@@ABBEADHBFH@CIEFBBHGCHEJHEAIHAAIHBGEH@EDGAEEC@FGEABAHE@FAI@IC
@fqlib5:334:VVDJXFE:7:33:2310:7985/1
ACGTGGCCGTCCTTTTGCCAGATATCGGTAAGAGAGTTCTAGCTAAGATAATATCAATCCGCGAATGTCAGAGGGAGTGTTTCCCTTCCGGGGAAGCAAAT
+
GDGE@DBJJCBHCGFEBHDGEI@JA@GHEHA@CCJGCGGDJGBACIBGECDDBGBHI@GBAEBBJD@@BAFBHAHEHJCBGFGCA@GJ@IAAIJDHG@IDC

Requirements

1. On the 1st line of each record, replace the @ with >
2. Delete the quality line
3. Delete the plus line

Note: Curly braces enclose a group of actions to apply to lines that match the pattern. Use the ; as
the command separator.
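
The three requirements can be combined in one brace group. The sketch below is one possible approach under stated assumptions: every record has exactly four lines, and the two short records (read1, read2) are made up for illustration. The second quality line starts with @ on purpose, to show that it is not mistaken for a header.

```shell
demo=$(mktemp)
printf '%s\n' '@read1' ACGT + IIII '@read2' TTGA + '@III' > "$demo"

# On a header line (starting with @):
#   s/^@/>/  replace the leading @ with >
#   n        print the header and read the sequence line
#   n        print the sequence and read the plus line
#   N        append the quality line to the plus line
#   d        delete both without printing them
fasta=$(sed '/^@/{s/^@/>/;n;n;N;d;}' "$demo")
printf '%s\n' "$fasta"
rm -f "$demo"
```

Because the quality line is consumed by N inside the group, it is never tested against the /^@/ pattern.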
