Week 09 Tutorial Sample Answers
subset 0: quit
seq 42 44 | 2041 slippy 1q
ANSWER:
42
1 is the address
q is the command
ANSWER:
...
...
the first 10 lines of the dictionary.txt file
10 is the address
q is the command
ANSWER:
41
42
43
4 is the address
q is the command
But as there are only 3 lines of input, slippy will hit EOF first.
Therefore, all three lines of output will be printed.
ANSWER:
90
91
q is the command
The address /.1/ is the address of any line that matches the regex .1 .
The command q is the command to quit.
ANSWER:
...
...
aardvark
q is the command
The address /r.*v/ is the address of any line that matches the regex r.*v .
Depending on the contents of the dictionary.txt file, the lines printed may be different.
For my dictionary, 247 lines were printed before aardvark matched the regex r.*v .
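A regex address can be checked with Python's re.search, which matches anywhere in the line. A minimal sketch of how a q command with a regex address could behave (quit_at_match and matches_address are illustrative names, not part of slippy's spec):

```python
import re

def matches_address(line, address):
    """Return True if the regex address (e.g. "r.*v") matches anywhere in the line."""
    return re.search(address, line) is not None

def quit_at_match(lines, address):
    """Print lines up to and including the first match, then quit -- like slippy '/regex/q'."""
    output = []
    for line in lines:
        output.append(line)          # the matching line is still printed before quitting
        if matches_address(line, address):
            break
    return output
```

This reproduces the behaviour above: lines are printed until (and including) the first one matching the address.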
ANSWER:
y
y
y
3 is the address
q is the command
Note: because of the $ address (last line) slippy needs to read two lines at a time.
The current line and the next line (to detect when there is no next line).
slippy should not store more than two lines in memory at any time.
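One way to detect the last line while holding at most two lines in memory is a one-line lookahead over an iterator. A sketch (with_last_flag is an illustrative name):

```python
def with_last_flag(lines):
    """Yield (line, is_last) pairs, holding at most two lines in memory at once."""
    lines = iter(lines)
    try:
        current = next(lines)
    except StopIteration:
        return                      # empty input: nothing to yield
    for nxt in lines:
        yield current, False        # a next line exists, so current is not last
        current = nxt
    yield current, True             # input exhausted: current was the last line
```

A $ address then simply fires when is_last is True.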
subset 0: print
seq 41 43 | 2041 slippy 2p
ANSWER:
41
42
42
43
p is the command
The address 2 is the address of the 2nd line.
The command p is the print command.
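The doubled 42 falls out of the usual sed-style cycle: every input line is auto-printed at the end of its cycle, and p prints an extra copy when the address matches. A minimal sketch of that loop (slippy_p is a hypothetical name; auto_print=False corresponds to the -n option):

```python
def slippy_p(lines, address, auto_print=True):
    """Emulate slippy's p command for a numeric address."""
    output = []
    for line_number, line in enumerate(lines, start=1):
        if line_number == address:
            output.append(line)      # the p command prints the line
        if auto_print:
            output.append(line)      # each cycle normally ends by printing the line
    return output
```

With auto-printing on, the addressed line appears twice; with it off (-n), only the addressed line appears.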
ANSWER:
ANSWER:
42
The -n option is used to suppress (turn off) the automatic printing of the current line.
We are asked to print the second line, and so get the output of 42 .
ANSWER:
ANSWER:
subset 0: substitute
seq 1 5 | 2041 slippy 's/[15]/zzz/'
ANSWER:
ANSWER:
ANSWER:
replace every instance of e on each line with the empty string.
echo "Hello Andrew" | 2041 slippy 's/e//g'
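The g modifier maps directly onto re.sub's count argument: count=1 replaces only the first match on each line, while the default count=0 replaces every match. A rough Python equivalent (substitute is an illustrative name):

```python
import re

def substitute(line, pattern, replacement, global_flag):
    """Apply s/pattern/replacement/ to one line; global_flag mirrors the g modifier."""
    # count=0 means "replace every occurrence"; count=1 means only the first
    return re.sub(pattern, replacement, line, count=0 if global_flag else 1)
```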
ANSWER:
subset 1: addresses
seq 1 5 | 2041 slippy '$d'
ANSWER:
ANSWER:
The command is applied to all lines within the range (start and end line inclusive).
ANSWER:
ANSWER:
ANSWER:
subset 1: substitute
seq 1 5 | 2041 slippy 'sX[15]XzzzX'
subset 1: -f
echo "4q" > commands.script
echo "/2/d" >> commands.script
seq 1 5 | 2041 slippy -f commands.script
subset 1: whitespace
seq 24 42 | 2041 slippy ' 3, 17 d # comment'
subset 2: -i
seq 1 5 > five.txt
2041 slippy -i /[24]/d five.txt
cat five.txt
2. Write a Python program, tags.py which given the URL of a web page fetches it by running wget and prints the HTML
tags it uses.
The tag should be converted to lower case and printed in alphabetical order with a count of how often each is used.
$ ./tags.py https://www.cse.unsw.edu.au
a 141
body 1
br 14
div 161
em 3
footer 1
form 1
h2 2
h4 3
h5 3
head 1
header 1
hr 3
html 1
img 12
input 5
li 99
link 3
meta 4
noscript 1
p 18
script 14
small 3
span 3
strong 4
title 1
ul 25
Note the counts in the above example will not be current - the CSE pages change almost daily.
ANSWER:
#! /usr/bin/env python3
def main():
if len(sys.argv) != 2:
print(f"Usage: {sys.argv[0]} <url>", file=sys.stderr)
sys.exit(1)
url = sys.argv[1]
# remove comments
webpage = re.sub(r"<!--.*?-->", "", webpage, flags=re.DOTALL)
if __name__ == "__main__":
main()
3. Add an -f option to tags.py which indicates the tags are to be printed in order of frequency.
$ ./tags.py -f https://www.cse.unsw.edu.au
head 1
noscript 1
html 1
form 1
title 1
footer 1
header 1
body 1
h2 2
hr 3
h4 3
span 3
link 3
small 3
h5 3
em 3
meta 4
strong 4
input 5
img 12
br 14
script 14
p 18
ul 25
li 99
a 141
div 161
ANSWER:
#! /usr/bin/env python3
def main():
parser = ArgumentParser()
parser.add_argument('-f', '--frequency', action='store_true', help='print tags by
frequency')
parser.add_argument("url", help="url to fetch")
args = parser.parse_args()
# remove comments
webpage = re.sub(r"<!--.*?-->", "", webpage, flags=re.DOTALL)
if args.frequency:
for tag, counter in reversed(tags_counter.most_common()):
print(f"{tag} {counter}")
else:
for tag, counter in sorted(tags_counter.items()):
print(f"{tag} {counter}")
if __name__ == "__main__":
main()
ANSWER:
#! /usr/bin/env python3
import requests
from bs4 import BeautifulSoup
def main():
parser = ArgumentParser()
parser.add_argument('-f', '--frequency', action='store_true', help='print tags by
frequency')
parser.add_argument("url", help="url to fetch")
args = parser.parse_args()
response = requests.get(args.url)
webpage = response.text.lower()
tags = soup.find_all()
names = [tag.name for tag in tags]
tags_counter = Counter()
for tag in names:
tags_counter[tag] += 1
if args.frequency:
for tag, counter in reversed(tags_counter.most_common()):
print(f"{tag} {counter}")
else:
for tag, counter in sorted(tags_counter.items()):
print(f"{tag} {counter}")
if __name__ == "__main__":
main()
5. If you fell like a harder challenge after finishing the challenge activity in the lab this week have a look at the following
websites for some problems to solve using regexp:
◦ https://regex101.com/quiz
◦ https://alf.nu/RegexGolf