Week 1: Cloud computing for scientists
William Trimble
2022 BioDS workshop
Who has had the “grumpy sysadmin” problem?
• Student: Hey, look there’s this piece of software
that claims to process exactly the kind of data I
need to process.
Who has had the “grumpy sysadmin” problem?
• Sysadmin: You’re not installing those unholy
packages from the University of We-Don’t-
Sanitize-Our-Inputs on my server.
docker run
--rm               clean up when done
-v $(pwd):/work    allow access to the current directory
-it ubuntu         run the ubuntu image interactively
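Assembled on one line, that is (a minimal sketch, assuming docker is installed and able to pull the ubuntu image):
docker run --rm -v $(pwd):/work -it ubuntu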
Converting id_biods.pem to id_biods.ppk in a docker container
Once inside my docker container, I need to install the linux package putty.
apt-get update
apt-get install -y putty
cd /work
ls
puttygen id_biods -o id_biods.ppk
ls
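The same conversion can be run as one non-interactive command from the host; a sketch under the same assumptions (the key file id_biods sits in the current directory):
docker run --rm -v $(pwd):/work ubuntu bash -c "apt-get update && apt-get install -y putty && puttygen /work/id_biods -o /work/id_biods.ppk"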
What is the catch?
• Consumes a lot of hard drive space, and is potentially heavy on network I/O if you are building images from scratch all the time.
• Containers are weird. It was hard enough learning how to use a command-line tool; now you are saying I have to wrap it into an eggroll with a copy of an entire linux operating system?
• You need servers to host docker images.
Step-by-step docker for bioinformatics
Melbourne Bioinformatics has an excellent step-by-step docker tutorial:
• https://www.melbournebioinformatics.org.au/tutorials/tutorials/docker/docker/
Docker engine is a little paranoid
NOTE: because the arguments passed to the image are a variable-length field, they must follow the image name at the end of the line.
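In other words: options go before the image name, and everything after the image name is handed to the container as the command to run. A quick illustration (hypothetical commands):
docker run --rm -it ubuntu ls /work      # options, then image, then the command for the container
docker run ubuntu --rm -it ls /work      # wrong: --rm and -it get passed to the container as its command, not to docker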
Use cases
• Web server in a container (mediawiki, blogging platform..)
• Compute environment (jupyter server)
• Database in a container (empty or pre-loaded)
• Utility in a container (single-purpose docker container)
• portability advantage
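As a sketch of the “database in a container” use case above (using the official postgres image; the container name and password here are placeholders):
docker run -d --name biods-db -e POSTGRES_PASSWORD=changeme -p 5432:5432 postgres
One command gives you a running, network-reachable database that you can stop and delete when you are done with it.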
Bad stuff?
• Since the images are like tarballs, you don’t necessarily know what is in them (and no one is inclined to inspect linux hard drives for safety!)
• If you could see the commands used to build the image, or could build the image yourself, you would download and install 500 MB from the linux distribution sources instead of a 500 MB image from dockerhub. That is what building an image from a Dockerfile does.
• It is safer to download a Dockerfile and build an image than to run an image…
• (But we all install things from github without vetting them.) Whether you would be forgiven for running something from dockerhub without vetting it depends on the severity of the damage done.
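What “build the image yourself” looks like, as a minimal sketch (reusing the putty example; the base image tag and the image name biods-putty are illustrative):
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y putty
WORKDIR /work
Save those three lines as Dockerfile, then build and run it locally instead of pulling an opaque image:
docker build -t biods-putty .
docker run --rm -v $(pwd):/work -it biods-putty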
Expiration date
• Dockerfiles need maintenance every 2-3 years.
• The APIs of some of the dependencies may have shifted, the repositories may have changed the names of the packages you need, and the OS version will go out of support, so you need to dust off your Dockerfile so it still works in 2024.
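The base-image line is usually the first thing to age; pinning it at least makes the drift visible (a hypothetical before/after):
FROM ubuntu:latest    # the tag moves over time, so the build quietly changes out from under you
FROM ubuntu:20.04     # pinned: reproducible until the 20.04 repositories go away, at which point the build breaks loudly instead of drifting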
Inside the docker container, you get to be king
• Docker by default logs you in as ‘root’ of this miniature linux machine
• AWS by default has a non-root account (ubuntu, ec2-user..)
• Some commands on the AWS node require the sudo prefix
• Potential damage is limited to whatever the container has explicitly been given access to
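The practical difference shows up in the install commands used elsewhere in these slides: inside the container you are already root, while on the EC2 node you are the ubuntu user and need sudo.
# inside the docker container (root)
apt-get update
apt-get install -y putty
# on the EC2 node (ubuntu user)
sudo apt-get update
sudo apt-get install -y jupyter jupyter-core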
What resources do I need?
• Memory
• Storage space
• Network connection
• Computing is not free, and these resources cost nonzero amounts of rent once they exceed a certain scale.
• Code (and tutorials) are usually small, cheap, and can be hosted for free.
• Training data is either privacy-encumbered or easy to download again.
• Final-results-type data: if it is valuable enough to retain indefinitely, you need to pay rent to have it preserved.
Computing has been commodified
Amazon EC2
What is cloud computing?
• https://aws.amazon.com/ec2/pricing/
• https://aws.amazon.com/ec2/pricing/on-demand/
Yeah, but what does it mean for me, the
scientist?
• When you run out of some essential resource you need for analysis,
you ask your sponsor for funding to rent computers to solve your
problem.
• You have to learn to steward ephemeral compute resources.
• Renting computers means paying by the hour in exchange for not
owning the depreciation and maintenance.
• Tending cloud servers has a different feedback cycle / pace from other
tasks. (Billing by the minute/by the hour will do that to you)
Ephemeral resources?
• Do you have a customized environment? Prompt, ssh keys,
shortcuts?
• The cloud computing paradigm insists on
• A separation between application environment & configuration and data
• Generic computing environments
• That are created and destroyed
So.. let’s go.
• If you want AWS nodes yourself, you must start by giving Jeff Bezos
your credit card number.
• Then you must generate SSH keys (see the sketch after this list). When AWS creates instances, it installs your public keys on all your nodes. This is how you are going to control your (linux) nodes.
• The first year you sign up, AWS will give you some free compute: “free tier eligible” (warning: these servers have so few resources that some things don’t work).
• So let me place an order for some servers you can use this afternoon.
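Generating the keypair is one command; a minimal sketch (the file name id_biods matches the key used in the rest of these slides, and the key type and size are just reasonable defaults):
ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_biods
This leaves the private key in ~/.ssh/id_biods (keep it secret) and the public key in ~/.ssh/id_biods.pub (the part AWS copies onto your nodes).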
Setting up a jupyter notebook server on EC2
• There are some breadcrumbs at https://dataschool.com/data-modeling-101/running-jupyter-notebook-on-an-ec2-server/ and at https://docs.aws.amazon.com/dlami/latest/devguide/setup-jupyter.html
• I’ll create EC2 nodes with ports 22, 443, and 8888 open
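One way to open those ports, if you use the AWS CLI instead of the web console, looks roughly like this (a sketch; the security-group ID is a placeholder, and opening 8888 to the whole internet only makes sense for a short-lived workshop node):
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 443 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 8888 --cidr 0.0.0.0/0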
You will need the private key
• -----BEGIN OPENSSH PRIVATE KEY-----
• b3BlbnNzaC1rZXktdjEAAAAACmFlczI1Ni1jdHIAAAAGYmNyeXB0AAAAGAAAABAt7M+E4o
• BTU1x4fRkNB7UmAAAAEAAAAAEAAAGXAAAAB3NzaC1yc2EAAAADAQABAAABgQDmcsE+9lAA
• jj3hFALiAQX7Ae5Bx/4RMp2fNDwxucSWeHFgRpw+nD4RZItqqbrwBVBIyH5AwlK27W/AIG
• 0STvplFu/unHKXV+e0BJd7+jYZ0oblKjGKUXnaovMWEn1qRNan11t6GOaxZrw4cAfGxhF5
• u+tP6PL8Ou7OKaGU1NhBat/RyOQGCSgkbr7D8D+Q1z4I8CcQ0n8SAzKPi4UwVGLR1f5XaC
• R3Kis+m6C0F4ViHG1wWQ6Muio1OWW4hVdCXq5ZC3DmFGmPmOqO5VSjxxN/L7RfmXLvpPGp
• Ph7mR8dpvUlamebpfac9ZQ5G8Kv8tO+88psCdNL4pAxnf858hq+wiUfdZ/ezr/tZAqCJir
• Tp3yvmv+y/tOhkAS99aL7H4IpnFGaeDNmQiyxM97FOYzCByxIUlMrsAkAhPGepC9kmytKP
• xWvfSCqxNpEZMm/4mBe7AZudUrOejzYkTl2DRp+S+eotKtKsuiaLNvVLd67rHsPZF9kuDv
• FoUdDcVC085K0AAAWQDLP98qaueFhLjnrutwxg1abLVHE1e4O/c8i1HVGv98Sizp1Akk0u
• hoOBclOC5ayi4YyxpZhghFVueme7fKQ+oZhxe3/h/4t2YifydD7ZkFRrBlrBjSqQORg94v
• ZqtB/6pPjVWWmT5YEAtMnFjvSoRu+vdz9oom4RRvgPohH+kXIpHVKuTAvPR3MjOj39ugt2
• gGzBy4M6BPh3PuU5R1LlNh7VKvFPknvK7YuRR7Jk7NFr5lCrtvckrTbFGi6mqwxrmddeFU
• FZSfKORRjXkqt52R0YFbXozweyTLmCnHBoHboYay3rz6YL0r+nLjAZbmRySxhwgGPWG2qJ
• otkXSoAR3Sv/dQ6OXHeSrvo3q6ek3Wn/frl3e7vroWAxBByosQvNYAfdbu70dE8LdXr2Tg
• 69b2BdGreb8vfczIoSGdZhQFRsF4yFHKg0DtiAQixIPlawK2DKQOysizCCYwL/FcQiRCvO
• uDsWtb8PNqjsbGCIdmQX6AetMY1SxhCiAHCpoLYYbqse2aeVfQMWHTSJrjDiLH68Wp0UkP
• XRyhiVK65rNJHQZJKsbZg5o0TjXTd7qcaqR6kRqeNtYKpfMsLL3PHOHK3zmDbEIS9iuJwD
• z1Ht1cHHssCpl2iKstOJIh1GNwSWWqxAmL6wVUlLn83lPG9+uT/yhoj9K/9eat1NytXpUN
• +efsc7hgL9CzQFR5MT1HpWI+WAU05TEr2LD++D6T1GfeoC28PoYQJtd7FnVJQrcvEGv41J
• vOWARNDidfuG+ShGZ679vqQc1W26LqQAPRghLTp7QXAXwV2tW80GRyRjIBVeFnn9hiB+Mh
• uUfCwRZ5bz5OCC3XKfEAtfDABz7FTOelXFHb7fHO+2TWKuH7F1+Ot+gLdVa7wFWyhA6Kyx
• 1z5oGNV6yijoGOFEIvPhZ9h5+E4ejEdXnDhdsjxqqcUbFZxVu4QpvdhOLr6zUAnvyx+cc2
• FDWtJK7mbmgyP8k4uBPOv7+On+XRPJY/DeyL/xMccSLw9Vw7xorV92zllB4ZX+DfRagwZK
• 8Jo5XIjuvUt29rHYQp+L2PjxPP1f1hL2GeOOmQQMddksYlkFqxbJDj5raq5J2QFoKuTOld
• Y60lKW9t4HyxG+EnrzyRlgsHw4AqlH4vKWRkzuwyODhPAYeboNltJegHA440LM0ZvUPgKl
Getting into our nodes: half the battle
ssh
-i /Users/username/.ssh/id_biods     path to the local private key
ubuntu@                              username on the remote host
18.205.151.213                       remote hostname
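Assembled on one line (the same shape of command appears again on the .ssh/config slide at the end):
ssh -i /Users/username/.ssh/id_biods ubuntu@18.205.151.213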
Install jupyter on our (blank) node
sudo apt-get update
sudo apt-get install -y jupyter jupyter-core
Change a few bits to make jupyter behave:
cd ~
mkdir ssl
cd ssl
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
Start server…
jupyter notebook --certfile=~/ssl/mycert.pem --keyfile=~/ssl/mykey.key
navigate to
https://18.205.151.xxx:8888
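To reach the notebook from your laptop rather than only from the node itself, you will usually also want to bind to all interfaces; the extra flags below are standard jupyter options but go beyond what the slide shows:
jupyter notebook --certfile=~/ssl/mycert.pem --keyfile=~/ssl/mykey.key --ip=0.0.0.0 --port=8888 --no-browser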
Customize $HOME/.ssh/config
Host P2
  Hostname 18.205.151.213
  User ubuntu
  IdentityFile /Users/wltrimbl/.ssh/id_biods
# Replaces
ssh -i /Users/wltrimbl/.ssh/id_biods ubuntu@18.205.151.213
# with
ssh P2
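Everything else that rides on ssh picks up the alias for free; for example, pulling a file back from the node (a hypothetical usage sketch):
scp P2:~/ssl/mycert.pem .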