Week 09 Tutorial Sample Answers

The document provides sample answers to tutorial questions about using the slippy command line tool to process text. It discusses using slippy with various addresses, commands, and options to print, delete, substitute, and quit on lines that match certain patterns. It also covers using multiple commands, input files, whitespace, and other slippy features.

1. Below are the current assignment autotests.

Discuss what these print and why:

subset 0: quit
seq 42 44 | 2041 slippy 1q

ANSWER:

1 is the address

q is the command

The address 1 is the address of the first line.

The command q is the command to quit.

So 1q will quit on the first line.

slippy will print the current line before it quits.

Giving us a single line of output: whatever the first line is.

In this case, the first line is 42 .

2041 slippy 10q < dictionary.txt

ANSWER:

...
...
the first 10 lines of the dictionary.txt file

10 is the address

q is the command

The address 10 is the address of the 10th line.

The command q is the command to quit.

So 10q will quit on the 10th line.

slippy will print the current line before it quits.

Giving us 10 lines of output: the first 10 lines.

seq 41 43 | 2041 slippy 4q

ANSWER:

41
42
43

4 is the address

q is the command

The address 4 is the address of the 4th line.

The command q is the command to quit.

So 4q will quit on the 4th line.

But as there are only 3 lines of input, slippy will hit EOF first.
Therefore, all three lines of output will be printed.

The q command never gets the chance to be used.

seq 90 110 | 2041 slippy /.1/q

ANSWER:
90
91

/.1/ is the address

q is the command

The address /.1/ is the address of any line that matches the regex .1 .

The command q is the command to quit.

The line 90 does not match, so it is printed and processing continues.

The line 91 matches the regex .1 , as the regex . (any character) matches the 9 .

So we will quit on the line 91 .

As always, slippy prints the current line before quitting.

2041 slippy '/r.*v/q' < dictionary.txt

ANSWER:

...
...
aardvark

/r.*v/ is the address

q is the command

The address /r.*v/ is the address of any line that matches the regex r.*v .

The command q is the command to quit.

Depending on the contents of the dictionary.txt file, the lines printed may be different.

For my dictionary, 247 lines were printed before aardvark matched the regex r.*v .

All lines up to and including aardvark will be printed.

yes | 2041 slippy 3q

ANSWER:

y
y
y

Note: the yes command will print y infinitely.

3 is the address

q is the command

The address 3 is the address of the 3rd line.

The command q is the command to quit.

Because yes prints infinitely, slippy can't wait until EOF to stop.

slippy also can't read all input lines into an array.

slippy must process lines one at a time.

Note: because of the $ address (last line), slippy needs to read two lines at a time:
the current line and the next line (to detect when there is no next line).
slippy should not store more than two lines in memory at any time.
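The line-at-a-time behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not the reference implementation: the helper names `quit_at` and `with_last_flag` are hypothetical, and the input is modelled as any iterable of lines.

```python
import itertools

def quit_at(lines, quit_line):
    """Yield lines up to and including line number quit_line (like Nq)."""
    for number, line in enumerate(lines, start=1):
        yield line                  # the current line is printed before quitting
        if number == quit_line:
            return                  # stop reading; later input is never consumed

def with_last_flag(lines):
    """Yield (line, is_last) pairs, holding at most two lines in memory."""
    iterator = iter(lines)
    try:
        current = next(iterator)
    except StopIteration:
        return                      # empty input: nothing to yield
    for following in iterator:
        yield current, False        # a next line exists, so current is not last
        current = following
    yield current, True             # no next line: this is the $ line

# An infinite input (like yes) still terminates, because lines are read lazily.
first_three = list(quit_at(itertools.repeat("y"), 3))
```

Because both helpers are generators, neither needs to see EOF before producing output, which is why `yes | 2041 slippy 3q` can finish.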

subset 0: print
seq 41 43 | 2041 slippy 2p

ANSWER:

41
42
42
43

2 is the address

p is the command

The address 2 is the address of the 2nd line.

The command p is the print command.

So 2p will print on the second line.

This print is in addition to the automatic print of the current line that slippy already does.
This causes the second line to be printed twice.

head dictionary.txt | 2041 slippy 3p

ANSWER:

Third line of the dictionary.txt file is printed twice.

seq 41 43 | 2041 slippy -n 2p

ANSWER:

The -n option is used to suppress (turn off) the automatic printing of the current line.

Therefore we only print when explicitly asked to.

We are asked to print the second line, and so get the output of 42 .
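The interaction between the p command and the automatic print can be sketched as follows. This is a simplified model that handles only a single line-number address; the function name `run_print` and its parameters are hypothetical, chosen for illustration.

```python
def run_print(lines, address, suppress_auto_print=False):
    """Model `slippy [-n] <address>p` for a single line-number address."""
    output = []
    for number, line in enumerate(lines, start=1):
        if number == address:
            output.append(line)     # explicit print from the p command
        if not suppress_auto_print:
            output.append(line)     # automatic print at the end of each cycle
    return output

lines = ["41", "42", "43"]
without_n = run_print(lines, 2)                          # like: slippy 2p
with_n = run_print(lines, 2, suppress_auto_print=True)   # like: slippy -n 2p
```

Without -n the addressed line appears twice; with -n it is the only line printed.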

2041 slippy -n 42p < dictionary.txt

ANSWER:

Similar to the previous example

Only the 42nd line is printed.

head -n 1000 dictionary.txt | 2041 slippy -n '/z.$/p'

ANSWER:

Similar to the previous example

Only print a line if it matches the regex z.$ .

That is: if the second last character is a z .

subset 0: substitute
seq 1 5 | 2041 slippy 's/[15]/zzz/'

ANSWER:

Run the substitute command on each line.

Replace the first instance of 1 or 5 on each line with zzz .

seq 1 5 | 2041 slippy 's/[15]/zzz/g'

ANSWER:

Run the substitute command on each line.

Replace all instances of 1 or 5 on each line with zzz .

echo "Hello Andrew" | 2041 slippy 's/e//'

ANSWER:

Run the substitute command on each line.

Replace the first instance of e on each line with the empty string.

echo "Hello Andrew" | 2041 slippy 's/e//g'

ANSWER:

Run the substitute command on each line.

Replace all instances of e on each line with the empty string.
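The difference between s/…/…/ and s/…/…/g corresponds to the `count` argument of Python's `re.sub`. The sketch below assumes the pattern and replacement are plain enough to mean the same thing in both regex dialects; the wrapper name `substitute` is hypothetical.

```python
import re

def substitute(line, pattern, replacement, global_flag=False):
    """Replace the first match, or every match when global_flag is set."""
    # count=0 tells re.sub to replace all matches; count=1 stops after the first
    return re.sub(pattern, replacement, line, count=0 if global_flag else 1)

first_only = substitute("Hello Andrew", "e", "")                     # s/e//
every_match = substitute("Hello Andrew", "e", "", global_flag=True)  # s/e//g
numbers = [substitute(str(n), "[15]", "zzz") for n in range(1, 6)]   # s/[15]/zzz/
```

Here `first_only` is "Hllo Andrew", `every_match` is "Hllo Andrw", and `numbers` is zzz, 2, 3, 4, zzz.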

subset 1: addresses
seq 1 5 | 2041 slippy '$d'

ANSWER:

$ is the special address for the last line.

d is the delete command.

If a line is deleted then processing immediately moves on to the next line.

The line is not automatically printed.

seq 42 44 | 2041 slippy 2,3d

ANSWER:

2,3 is a range address.

The command is applied to all lines within the range (start and end line inclusive).

seq 10 21 | 2041 slippy 3,/2/d

ANSWER:

Similar to the previous example

seq 10 21 | 2041 slippy /2/,7d

ANSWER:

Similar to the previous example

seq 10 21 | 2041 slippy /2/,/7/d

ANSWER:

Similar to the previous example
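The three range examples above differ in when the range starts and ends. The sketch below assumes slippy follows sed's range rules: a regex end address is first tested on the line after the start line, and a numeric end address at or before the start line gives a one-line range. The names `matches` and `delete_range` are hypothetical.

```python
import re

def matches(address, number, line):
    """An int address is a line number; a str address is a regex."""
    if isinstance(address, int):
        return number == address
    return re.search(address, line) is not None

def delete_range(lines, start, end):
    """Return the lines that survive a `start,end d` command."""
    kept = []
    active = False                  # True while inside a range
    for number, line in enumerate(lines, start=1):
        if active:
            if matches(end, number, line):
                active = False      # the end line is still deleted
            continue
        if matches(start, number, line):
            # a numeric end at or before this line means a one-line range
            active = not (isinstance(end, int) and end <= number)
            continue
        kept.append(line)
    return kept

lines = [str(n) for n in range(10, 22)]     # same input as seq 10 21
```

Under these assumptions, `delete_range(lines, 3, "2")` keeps 10, 11, 21; `delete_range(lines, "2", 7)` keeps 10, 11, 17, 18, 19; and `delete_range(lines, "2", "7")` keeps 10, 11, 18, 19, because the second range that starts at 20 never finds an end and runs to EOF.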

subset 1: substitute
seq 1 5 | 2041 slippy 'sX[15]XzzzX'

subset 1: multiple commands

seq 1 5 | 2041 slippy '4q;/2/d'

subset 1: -f
echo "4q" > commands.script
echo "/2/d" >> commands.script
seq 1 5 | 2041 slippy -f commands.script

subset 1: input files

seq 1 2 > two.txt
seq 1 5 > five.txt
2041 slippy '4q;/2/d' two.txt five.txt

subset 1: whitespace
seq 24 42 | 2041 slippy ' 3, 17 d # comment'

subset 2: -i
seq 1 5 > five.txt
2041 slippy -i /[24]/d five.txt
cat five.txt

subset 2: multiple commands

echo 'Punctuation characters include . , ; :' | 2041 slippy 's/;/semicolon/g;/;/q'

2. Write a Python program, tags.py, which, given the URL of a web page, fetches it by running wget and prints the HTML tags it uses.
Each tag should be converted to lower case, and the tags printed in alphabetical order with a count of how often each is used.

Don't count closing tags.

Make sure you don't print tags within HTML comments.

Note the counts in the above example will not be current - the CSE pages change almost daily.

ANSWER:
#! /usr/bin/env python3

# written by Nasser Malibari and Dylan Brotherston

# fetch specified web page and count the HTML tags in it

import sys, re, subprocess

from collections import Counter

def main():

    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <url>", file=sys.stderr)
        sys.exit(1)

    url = sys.argv[1]

    process = subprocess.run(["wget", "-q", "-O-", url], capture_output=True, text=True)

    webpage = process.stdout.lower()

    # remove comments (non-greedy, and DOTALL so comments may span lines)
    webpage = re.sub(r"<!--.*?-->", "", webpage, flags=re.DOTALL)

    # get all tags
    # note: use of capturing in re.findall returns list of the captured part
    # closing tags are skipped because "/" is not matched by \w
    tags = re.findall(r"<\s*(\w+)", webpage)

    # using collections.Counter, alternatively can use a dict to count
    tags_counter = Counter()
    for tag in tags:
        tags_counter[tag] += 1

    for tag, counter in sorted(tags_counter.items()):
        print(f"{tag} {counter}")

if __name__ == "__main__":
    main()
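The counting logic can be tried without any network access. The sketch below applies the same steps to a hard-coded string; the function name `count_tags` and the sample HTML are made up for illustration.

```python
import re
from collections import Counter

def count_tags(html):
    """Count opening tags, ignoring case, closing tags and HTML comments."""
    html = html.lower()
    # strip comments first so tags inside them are not counted
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # "</p" does not match because "/" is neither whitespace nor a word character
    return Counter(re.findall(r"<\s*(\w+)", html))

sample = "<HTML><body><!-- <b>hidden</b> --><p>one</p><P>two</P></body></html>"
counts = count_tags(sample)

for tag, count in sorted(counts.items()):
    print(tag, count)
# prints:
# body 1
# html 1
# p 2
```

The b tag inside the comment is not counted, and P is merged with p by the lower-casing step.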

3. Add an -f option to tags.py which indicates the tags are to be printed in order of frequency.

$ ./tags.py -f https://www.cse.unsw.edu.au
head 1
noscript 1
html 1
form 1
title 1
footer 1
header 1
body 1
h2 2
hr 3
h4 3
span 3
link 3
small 3
h5 3
em 3
meta 4
strong 4
input 5
img 12
br 14
script 14
p 18
ul 25
li 99
a 141
div 161

ANSWER:
#! /usr/bin/env python3

# written by Nasser Malibari and Dylan Brotherston

# fetch specified web page and count the HTML tags in it

import re, subprocess

from collections import Counter
from argparse import ArgumentParser

def main():

    parser = ArgumentParser()
    parser.add_argument('-f', '--frequency', action='store_true', help='print tags by frequency')
    parser.add_argument("url", help="url to fetch")
    args = parser.parse_args()

    process = subprocess.run(["wget", "-q", "-O-", args.url], capture_output=True, text=True)

    webpage = process.stdout.lower()

    # remove comments (non-greedy, and DOTALL so comments may span lines)
    webpage = re.sub(r"<!--.*?-->", "", webpage, flags=re.DOTALL)

    # get all tags
    # note: use of capturing in re.findall returns list of the captured part
    tags = re.findall(r"<\s*(\w+)", webpage)

    # using collections.Counter, alternatively can use a dict to count
    tags_counter = Counter()
    for tag in tags:
        tags_counter[tag] += 1

    if args.frequency:
        # most_common() is most frequent first; reverse it to print least frequent first
        for tag, counter in reversed(tags_counter.most_common()):
            print(f"{tag} {counter}")
    else:
        for tag, counter in sorted(tags_counter.items()):
            print(f"{tag} {counter}")

if __name__ == "__main__":
    main()
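The two output orders can be compared on a small counter. The counts below are a subset taken from the example output above; the variable names are illustrative only.

```python
from collections import Counter

tags_counter = Counter({"div": 161, "a": 141, "li": 99, "head": 1})

# most_common() sorts from most to least frequent, so reversing it
# gives the least-frequent-first order shown in the -f example
least_first = list(reversed(tags_counter.most_common()))

# sorting the (tag, count) pairs orders them alphabetically by tag
alphabetical = sorted(tags_counter.items())
```

`least_first` starts with head and ends with div, while `alphabetical` starts with a and ends with li.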

4. Modify tags.py to use the requests and beautifulsoup4 modules.

ANSWER:
#! /usr/bin/env python3

# written by Dylan Brotherston

# fetch specified web page and count the HTML tags in it

from collections import Counter

from argparse import ArgumentParser

import requests
from bs4 import BeautifulSoup

def main():

    parser = ArgumentParser()
    parser.add_argument('-f', '--frequency', action='store_true', help='print tags by frequency')
    parser.add_argument("url", help="url to fetch")
    args = parser.parse_args()

    response = requests.get(args.url)
    webpage = response.text.lower()

    soup = BeautifulSoup(webpage, 'html5lib')

    tags = soup.find_all()
    names = [tag.name for tag in tags]

    tags_counter = Counter()
    for tag in names:
        tags_counter[tag] += 1

    if args.frequency:
        for tag, counter in reversed(tags_counter.most_common()):
            print(f"{tag} {counter}")
    else:
        for tag, counter in sorted(tags_counter.items()):
            print(f"{tag} {counter}")

if __name__ == "__main__":
    main()

5. If you feel like a harder challenge after finishing the challenge activity in the lab this week, have a look at the following websites for some problems to solve using regexp:

◦ https://regex101.com/quiz
◦ https://alf.nu/RegexGolf
