Useful Bash Commands
In the Introduction to Unix (which I'll borrow heavily from in this article) we covered my subjective list of the 20 most important unix commands. In case you forgot, it was:
pwd
ls
cd
mkdir
echo
cat
cp
mv
rm
man
head
tail
less
more
sort
grep
which
chmod
history
clear
These are certainly indispensable but, for anything more than the basics, a vocabulary of about 100 commands is where you want to be. Here are my picks for the top 100(ish) commands. I'll briefly introduce each one below along with a quick use case or hint.
A few of the commands, mostly clustered at the end of the article, are not standard shell commands and you have to install them yourself (see apt-get, brew, yum). When this is the case, I make a note of it.
pwd
To see where we are, we can print working directory, which returns the path of the directory in which we currently reside:
$ pwd
Sometimes in a script we change directories, but we want to save the directory in which we started. We can save the current working directory in a variable with command substitution:
d=`pwd`              # save cwd
cd /somewhere/else   # go somewhere else
# do something else
cd $d                # go back to where the script originally started
If you want to print your working directory resolving symbolic links:
$ readlink -m .
cd
To move around, we can change directory:
$ cd /some/path
By convention, if we leave off the argument and just type cd we go HOME:
$ cd # go $HOME
Go to the root directory:
$ cd / # go to root
Move one directory back:
$ cd .. # move one back toward root
To return to the directory we were last in:
$ cd - # cd into the previous directory
As mentioned in An Introduction to Unix, you can put cd into an if-statement to catch a failed directory change:
if ! cd /some/path; then echo "cd failed"; exit 1; fi
mkdir, rmdir
mkdir creates a new directory, while rmdir removes a directory only if it's empty. About the only occasion to use rmdir is if you want to be careful you're not deleting a directory with stuff in it. (Perhaps rmdir isn't one of the 100 most useful commands, after all.)
echo
echo prints the string passed to it as an argument. For example:
$ echo joe
joe
$ echo "joe"
joe
$ echo "joe joe"
joe joe
If you leave off an argument, echo will produce a newline:
$ echo
Suppress the newline:
$ echo -n "joe" # suppress newline
Interpret special characters:
$ echo -e "joe\tjoe\njoe" # interpret special chars ( \t is tab, \n newline )
joe	joe
joe
You should also be aware of how bash treats double vs. single quotes. We've seen above that if you want to use a string with spaces, you use double quotes. If you use double quotes, any variable inside them will be expanded, the same as in Perl. If you use single quotes, everything is taken literally and variables are not expanded. Here's an example:
$ var=5
$ joe=hello $var
-bash: 5: command not found
That didn't work because we forgot the quotes. Let's fix it:
$ joe="hello $var"
$ echo $joe
hello 5
$ joe='hello $var'
$ echo $joe
hello $var
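Quoting matters beyond echo. A variable holding a filename with a space must be double-quoted when used, or the shell splits it into separate words. A minimal sketch (the filename is made up for illustration):

```shell
#!/bin/bash
f="my file.txt"   # a filename containing a space
touch "$f"        # quoted: one argument, creates "my file.txt"
ls -l "$f"        # quoted: lists the one file
# ls -l $f        # unquoted: the shell would look for "my" and "file.txt"
rm "$f"
```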
Sometimes, when you run a script verbosely, you want to echo commands before you execute them. A stdout log file of this type is invaluable if you want to retrace your steps later. One way of doing this is to save a command in a variable, cmd, echo it, and then pipe it into bash or sh. Your script might look like this:
cmd="ls -hl";
echo $cmd;
echo $cmd | bash
You can even save files this way:
$ cmd="echo joe | wc -c > testout.txt"
$ echo $cmd
echo joe | wc -c > testout.txt
$ echo $cmd | bash
$ cat testout.txt
4
If you're using SGE's qsub, you can use similar constructions:
$ cmd="echo joe | wc -c > testout.txt"
$ echo $cmd
echo joe | wc -c > testout.txt
$ echo $cmd | qsub -N myjob -cwd
Very cool! If you observe this parallelism:
$ echo $cmd | bash
$ echo $cmd | qsub
you can see how this would be useful in a script with a toggle SGE switch. You could pass the shell command to a function and use an if-statement to choose if you want to pipe the command to bash or qsub. One thing to keep in mind is that any quotation marks ( " ) within your command must be escaped with a backslash ( \ ).
qsub is not a standard shell command. For more information about it, see the Sun Grid Engine Wiki post.
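The toggle described above might look something like the following sketch. This is an illustration, not a prescribed implementation; USE_SGE and run_cmd are made-up names, and the qsub flags simply mirror the example above:

```shell
#!/bin/bash
USE_SGE=0   # set to 1 to submit commands to SGE instead of running them locally

# echo a command, then pipe it into bash or qsub depending on the toggle
run_cmd () {
    local cmd="$1"
    echo "$cmd"
    if [ "$USE_SGE" -eq 1 ]; then
        echo "$cmd" | qsub -N myjob -cwd
    else
        echo "$cmd" | bash
    fi
}

run_cmd "echo joe | wc -c > testout.txt"
```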
cat, zcat, tac
cat prints the contents of files passed to it as arguments. For example:
$ cat file.txt
prints the contents of file.txt. Entering:
$ cat file.txt file2.txt
would print out the contents of both file.txt and file2.txt concatenated together, which is where this command gets its slightly confusing name. Print a file with line numbers:
$ cat -n file.txt # print file with line numbers
cat is frequently seen in unix pipelines. For instance:
$ cat file.txt | wc -l # count the number of lines in a file
$ cat file.txt | cut -f1 # cut the first column
Some people deride this as unnecessarily verbose, but I'm so used to piping anything and everything that I embrace it. Another common construction is:
cat file.txt | awk ...
We'll discuss awk below, but the key point about it is that it works line by line. So awk will process what cat pipes out in a linewise fashion.
If you route something to cat via a pipe, it just passes through:
$ echo "hello kitty" | cat
hello kitty
The -vet flag allows us to "see" special characters, like tab, newline, and carriage return:
$ echo -e "\t" | cat -vet
^I$
$ echo -e "\n" | cat -vet
$
$
$ echo -e "\r" | cat -vet
^M$
This can come into play if you're looking at a file produced on a PC, which uses the horrid \r at the end of a line as opposed to the nice unix newline, \n. You can do a similar thing with the command od, as we'll see below.
There are two variations on cat, which occasionally come in handy. zcat allows you to cat zipped files:
$ zcat file.txt.gz
You can also see a file in reverse order, bottom first, with tac (tac is cat spelled backwards).
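For example, a quick sketch contrasting the two on a throwaway three-line file (note that tac ships with GNU coreutils, so on a Mac you may need to install it):

```shell
#!/bin/bash
printf "a\nb\nc\n" > tmp.txt
cat tmp.txt   # prints a, b, c
tac tmp.txt   # prints c, b, a
rm tmp.txt
```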
cp
The command to make a copy of a file is cp:
$ cp file1 file2
Use the recursive flag, -R, to copy directories plus files:
$ cp -R dir1 dir2 # copy directories
The directory and everything inside it are copied.
Question: what would the following do?
$ cp -R dir1 ../../
Answer: it would make a copy of dir1 up two levels from our current working directory.
Tip: If you're moving a large directory structure with lots of files in it, use rsync instead of cp. If the command fails midway through, rsync can start from where it left off but cp can't.
mv
To rename a file or directory we use mv:
$ mv file1 file2
In a sense, this command also moves files, because we can rename a file into a different path. For example:
$ mv file1 dir1/dir2/file2
would move file1 into dir1/dir2/ and change its name to file2, while:
$ mv file1 dir1/dir2/
would simply move file1 into dir1/dir2/ (or, if you like, rename ./file1 as ./dir1/dir2/file1).
Swap the names of two files, a and b:
$ mv a a.1
$ mv b a
$ mv a.1 b
Change the extension of a file, test.txt, from .txt to .html:
$ mv test.txt test.html
Shortcut:
$ mv test.{txt,html}
mv can be dangerous because, if you move a file into a directory where a file of the same name exists, the latter will be overwritten. To prevent this, use the -n flag:
$ mv -n myfile mydir/ # move, unless "myfile" exists in "mydir"
rm
The command rm removes the files you pass to it as arguments:
$ rm file # removes a file
Use the recursive flag, -r, to remove a file or a directory:
$ rm -r dir # removes a file or directory
If there's a permission issue or the file doesn't exist, rm will throw an error.
You can override this with the force flag, -f:
$ rm -rf dir # force removal of a file or directory (i.e., ignore warnings)
You may be aware that when you delete files, they can still be recovered with effort if your computer hasn't overwritten their contents. To securely delete your files (meaning, overwrite them before deleting) use:
$ rm -P file # overwrite your file then delete
or use shred.
shred
Securely remove your file by overwriting, then removing:
$ shred -zuv file # removes a file securely (flags are: zero, remove, verbose)
For example:
$ touch crapfile # create a file
$ shred -zuv crapfile
shred: crapfile: pass 1/4 (random)...
shred: crapfile: pass 2/4 (random)...
shred: crapfile: pass 3/4 (random)...
shred: crapfile: pass 4/4 (000000)...
shred: crapfile: removing
shred: crapfile: renamed to 00000000
shred: 00000000: renamed to 0000000
shred: 0000000: renamed to 000000
shred: 000000: renamed to 00000
shred: 00000: renamed to 0000
head, tail
head prints the first 10 lines of a file, and tail prints the last 10; change how many lines are shown with the -n flag, which is useful if you're previewing many files. To preview all files in the current directory:
$ head *
See the last 10 lines of your bash history:
$ history | tail # show the last 10 lines of history
See the first 10 elements in the cwd:
$ ls | head
less, zless, more
Based off of An Introduction to Unix - less and more: less is, as the man pages say, "a filter for paging through text one screenful at a time...which allows backward movement in the file as well as forward movement." This makes it one of the odd unix commands whose name seems to bear no relation to its function. If you have a big file, vanilla cat is not suitable because printing thousands of lines of text to stdout will flood your screen. Instead, use less, the go-to command for viewing files in the terminal:
$ less myfile.txt # view the file page by page
Another nice thing about less is that it has many Vim-like features, which you can read about on its man page (and this is not a coincidence). For example, if you want to search for the word apple in your file, you just type slash ( / ) followed by apple.
If you have a file with many columns, it's hard to view in the terminal. A neat less flag to solve this problem is -S:
$ less -S myfile.txt # allow horizontal scrolling
This enables horizontal scrolling instead of having rows messily wrap onto the next line. As we'll discuss below, this flag works particularly well in combination with the column command, which forces the columns of your file to line up nicely:
cat myfile.txt | column -t | less -S
Use zless to less a zipped file:
$ zless myfile.txt.gz # view a zipped file
I included less and more together because I think about them as a pair, but I told a little white lie in calling more indispensable: you really only need less, which is an improved version of more. Less is more :-)
grep, egrep
Based off of An Introduction to Unix - grep: grep is the terminal's analog of find from ordinary computing (not to be confused with unix find). If you've ever used Safari or TextEdit or Microsoft Word, you can find a word with Command-f on Macintosh. Similarly, grep searches for text in a file and returns the line(s) where it finds a match. For example, if you were searching for the word apple in your file, you'd do:
$ grep apple myfile.txt # return lines of file with the text apple
grep has many nice flags, such as:
$ grep -n apple myfile.txt # include the line number
$ grep -i apple myfile.txt # case-insensitive matching
$ grep --color apple myfile.txt # color the matching text
Also useful are what I call the ABCs of Grep: that's After, Before, Context. Here's what they do:
$ grep -A1 apple myfile.txt # return lines with the match, as well as 1 after
$ grep -B2 apple myfile.txt # return lines with the match, as well as 2 before
$ grep -C3 apple myfile.txt # return lines with the match, as well as 3 before and after
You can do an inverse grep with the -v flag. Find lines that don't contain apple:
$ grep -v apple myfile.txt # return lines without apple
sudo
If you want to run a command as the superuser, you can use sudo. For example, you'll have trouble if you try to make a directory called junk in the root directory:
$ mkdir /junk
mkdir: cannot create directory /junk: Permission denied
However, if you invoke this command as the root user, you can do it:
$ sudo mkdir /junk
provided you type in the password. Because root, not your user, owns this directory, you also need sudo to remove it:
$ sudo rmdir /junk
If you want to experience life as the root user, try:
$ sudo -i
Here's what this looks like on my computer:
$ whoami # check my user name
oliver
$ sudo -i # simulate login as root
Password:
$ whoami # check user name now
root
$ exit # logout of root account
logout
$ whoami # check user name again
oliver
Obviously, you should be cautious with sudo. When might using it be appropriate? The most common use case is when you have to install a program, which might want to write into a directory root owns, like /usr/bin, or access system files. I discuss installing new software below. You can also do terrible things with sudo, such as gut your whole computer with a command so unspeakable I cannot utter it in syntactically viable form. That's sudo, r-m, dash-r-f, forward slash; it lives in infamy in an Urban Dictionary entry.
You can grant root permissions to various users by tinkering with the configuration file:
/etc/sudoers
which says: "Sudoers allows particular users to run various commands as the root user, without needing the root password." Needless to say, only do this if you know what you're doing.
su
su switches which user you are in the terminal. For example, to change to the user jon:
$ su jon
where you have to know jon's password.
wc
wc counts the number of words, lines, or characters in a file.
Count lines:
$ cat myfile.txt
aaa
bbb
ccc
ddd
$ cat myfile.txt | wc -l
4
Count words:
$ echo -n joe | wc -w
1
Count characters:
$ echo -n joe | wc -c
3
sort
From An Introduction to Unix - sort: As you guessed, the command sort sorts files. It has a large man page, but we can learn its basic features by example. Let's suppose we have a file, testsort.txt, such that:
$ cat testsort.txt
vfw   34   awfjo
a     4    2
f     10   10
beb   43   c
f     2    33
f     1    ?
Then:
$ sort testsort.txt
a     4    2
beb   43   c
f     1    ?
f     10   10
f     2    33
vfw   34   awfjo
What happened? The default behavior of sort is to dictionary sort the rows of a file according to what's in the first column, then second column, and so on. Where the first column has the same value (f in this example), the values of the second column determine the order of the rows. Dictionary sort means that things are sorted as they would be in a dictionary: 1,2,10 gets sorted as 1,10,2. If you want to do a numerical sort, use the -n flag; if you want to sort in reverse order, use the -r flag. You can also sort according to a specific column. The notation for this is:
sort -kn,m
where n and m are numbers which refer to the range column n to column m. In practice, it may be easier to use a single column rather than a range so, for example:
sort -k2,2
means sort by the second column (technically from column 2 to column 2).
To sort numerically by the second column:
$ sort -k2,2n testsort.txt
f     1    ?
f     2    33
a     4    2
f     10   10
vfw   34   awfjo
beb   43   c
As is often the case in unix, we can combine flags as much as we like.
Question: what does this do?
$ sort -k1,1r -k2,2n testsort.txt
vfw   34   awfjo
f     1    ?
f     2    33
f     10   10
beb   43   c
a     4    2
Answer: the file has been sorted first by the first column, in reverse dictionary order, and then, where the first column is the same, by the second column in numerical order. You get the point!
Sort uniquely:
$ sort -u testsort.txt # sort uniquely
ssh
The file:
~/.ssh/config
determines ssh's behavior and you can create it if it doesn't exist. In the next section, you'll see how to modify this config file to use an IdentityFile, so that you're spared the annoyance of typing in your password every time you ssh (see also Nerderati: Simplify Your Life With an SSH Config File).
If you're frequently getting booted from a remote server after periods of inactivity, try putting something like this into your config file:
ServerAliveInterval = 300
If this is your first encounter with ssh, you'd be surprised how much of the work of the world is done by ssh. It's worth reading the extensive man page, which gets into matters of computer security and cryptography.
ssh-keygen
On your own private computer, you can ssh into a particular server without having to type in your password. To set this up, first generate rsa ssh keys:
$ mkdir -p ~/.ssh && cd ~/.ssh
$ ssh-keygen -t rsa -f localkey
This will create two files on your computer, a public key:
~/.ssh/localkey.pub
and a private key:
~/.ssh/localkey
You can share your public key, but do not give anyone your private key! Suppose you want to ssh into myserver.com. Normally, that's:
$ ssh myusername@myserver.com
Add these lines to your ~/.ssh/config file:
Host Myserver
HostName myserver.com
User myusername
IdentityFile ~/.ssh/localkey
Now cat your public key and paste it into:
~/.ssh/authorized_keys
on the remote machine (i.e., on myserver.com). Now on your local computer, you can ssh without a password:
$ ssh Myserver
You can also use this technique to push to github.com, without having to punch your password in each time, as described here.
scp
If you have ssh access to a remote computer and want to copy its files to your local computer, you can use scp according to the syntax:
scp username@host:/some/path/on/remote/machine /some/path/on/my/machine
However, I would advise using rsync instead of scp: it does the same thing, only better.
rsync
rsync is usually for remotely copying files to or from a computer to which you have ssh access. The basic syntax is:
rsync source destination
For example, copy files from a remote machine:
$ rsync username@host:/some/path/on/remote/machine /some/path/on/my/machine/
Copy files to a remote machine:
$ rsync /some/path/on/my/machine username@host:/some/path/on/remote/machine/
You can also use it to sync two directories on your local machine:
$ rsync directory1 directory2
The great thing about rsync is that, if it fails or you stop it in the middle, you can re-start it and it will pick up where it left off. It does not blindly copy directories (or files) but rather synchronizes them, which is to say, makes them the same. If, for example, you try to copy a directory that you already have, rsync will detect this and transfer no files.
I like to use the flags:
$ rsync -azv --progress myusername@myserver.com:/my/source /my/destination/
The -a flag is for archive, which "ensures that symbolic links, devices, attributes, permissions, ownerships, etc. are preserved in the transfer"; the -z flag compresses files during the transfer; -v is for verbose; and --progress shows you your progress. I've enshrined this in an alias:
alias yy="rsync -azv --progress"
Preview rsync's behavior without actually transferring any files:
$ rsync --dry-run myusername@myserver.com:/my/source /my/destination/
Do not copy certain directories from the source:
$ rsync --exclude mydir myusername@myserver.com:/my/source /my/destination/
In this example, the directory /my/source/mydir will not be copied, and you can omit more directories by repeating the --exclude flag.
Copy the files pointed to by the symbolic links ("transform symlink into referent file/dir") with the -L flag:
$ rsync -azv -L --progress myusername@myserver.com:/my/source /my/destination/
source, export
From An Introduction to Unix - Source and Export: Question: if we create some variables in a script and exit, what happens to those variables? Do they disappear? The answer is, yes, they do. Let's make a script called test_src.sh such that:
$ cat ./test_src.sh
#!/bin/bash
myvariable=54
echo $myvariable
If we run it and then check what happened to the variable on our command line, we get:
$ ./test_src.sh
54
$ echo $myvariable
The variable is undefined. The command source is for solving this problem. If we want the variable to persist, we run:
$ source ./test_src.sh
54
$ echo $myvariable
54
and, voilà, our variable exists in the shell. An equivalent syntax for sourcing uses a dot:
$ . ./test_src.sh # this is the same as "source ./test_src.sh"
54
But now observe the following. We'll make a new script, test_src_2.sh, such that:
$ cat ./test_src_2.sh
#!/bin/bash
echo $myvariable
This script is also looking for $myvariable. Running it, we get:
$ ./test_src_2.sh
Nothing! The variable was set in our shell, but it isn't passed to the child process that runs the script. To make a variable visible to child processes, use export:
$ export myvariable=54
$ ./test_src_2.sh
54
ln
A good practice is putting links in your home directory to folders you often use. This way, navigating to those folders is easy when you log in. If you make the link:
~/MYLINK --> /some/long/and/complicated/path/to/an/often/used/directory
then you need only type:
$ cd MYLINK
rather than:
$ cd /some/long/and/complicated/path/to/an/often/used/directory
Links are everywhere, so be glad you've made their acquaintance!
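The link above is made with ln -s, which creates a symbolic link. A minimal sketch, using a throwaway path in place of the long hypothetical one:

```shell
#!/bin/bash
mkdir -p /tmp/often/used/directory          # stand-in for the long path
ln -s /tmp/often/used/directory ~/MYLINK    # create the shortcut
ls -l ~/MYLINK                              # shows MYLINK -> /tmp/often/used/directory
rm ~/MYLINK                                 # removing the link leaves the target intact
```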
readlink
readlink provides a way to get the absolute path of a directory.
Get the absolute path of the directory mydir:
$ readlink -m mydir
Get the absolute path of the cwd:
$ readlink -m .
Note this is different than:
$ pwd
because I'm using the term absolute path to express not only that the path is not a relative path, but also to denote that it's free of symbolic links. So if we have the link discussed above:
~/MYLINK --> /some/long/and/complicated/path/to/an/often/used/directory
then:
$ cd ~/MYLINK
$ readlink -m .
/some/long/and/complicated/path/to/an/often/used/directory
One profound annoyance: readlink doesn't work the same way on Mac unix (Darwin). To get the proper one, you need to download the GNU coreutils.
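If installing coreutils isn't an option, the standard pwd -P flag ("physical") also resolves symbolic links, which covers the common case of readlink -m .:

```shell
#!/bin/bash
mkdir -p /tmp/realdir
ln -s /tmp/realdir /tmp/linkdir
cd /tmp/linkdir
pwd       # the logical path, which may go through the link
pwd -P    # the physical path, with symbolic links resolved
cd /; rm /tmp/linkdir; rmdir /tmp/realdir
```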
sleep
The command sleep pauses for an amount of time specified in seconds. For example, sleep for 5 seconds:
$ sleep 5
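In scripts, sleep mostly shows up in polling loops: wait a bit, check again. A minimal sketch that blocks until a (hypothetical) results.txt appears:

```shell
#!/bin/bash
# check for results.txt every 2 seconds; proceed once it exists
while [ ! -e results.txt ]; do
    sleep 2
done
echo "results.txt has arrived"
```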
ps, pstree, jobs, bg, fg, kill, top
Borrowing from An Introduction to Unix - Processes and Running Processes in the Background: Processes are a deep subject intertwined with topics like memory usage, threads, scheduling, and parallelization. My understanding of this stuff, which encroaches on the domain of operating systems, is narrow, but here are the basics from a unix-eye-view. The command ps displays information about your processes:
$ ps # show process info
$ ps -f # show verbose process info
What if you have a program that takes some time to run? You run the program on the command line but, while it's running, your hands are tied. You can't do any more computing until it's finished. Or can you? You can, by running your program in the background. Let's look at the command sleep, which pauses for an amount of time specified in seconds. To run things in the background, use an ampersand:
$ sleep 60 &
[1] 6846
$ sleep 30 &
[2] 6847
The numbers printed are the PIDs or process IDs, which are unique identifiers for every process running on our computer. Look at our processes:
$ ps -f
UID    PID   PPID  C  STIME    TTY         TIME  CMD
501    5166  5165  0  Wed12PM  ttys008  0:00.20  -bash
501    6846  5166  0  10:57PM  ttys008  0:00.01  sleep 60
We can kill a job running in the foreground with Cntrl-c. More generally, if we didn't submit the command ourselves, we can kill any process on our computer using the PID. Suppose we want to kill the terminal in which we are working. Let's grep for it:
$ ps -Af | grep Terminal
501    252   132   0  11:02PM  ??       0:01.66  /Applications/Utilities/Terminal.app/Contents/MacOS/Terminal
501    1653  1532  0  12:09AM  ttys000  0:00.00  grep Terminal
We see that this particular process happens to have a PID of 252. grep returned any process with the word Terminal, including the grep itself. We can be more precise with awk and see the header, too:
$ ps -Af | awk 'NR==1 || $2==252'
UID    PID   PPID  C  STIME    TTY         TIME  CMD
501    252   132   0  11:02PM  ??       0:01.89  /Applications/Utilities/Terminal.app/Contents/MacOS/Terminal
(That's: print the first row OR anything with the 2nd field equal to 252.) Now let's kill it:
$ kill 252
and, poof, our terminal is gone.
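The heading also promises jobs, bg, and fg, which manage the current shell's jobs. Here's a sketch of a typical session; the job number and PID shown are illustrative:

```shell
$ sleep 100 &    # start a job in the background
[1] 7234
$ jobs           # list this shell's jobs
[1]+  Running                 sleep 100 &
$ fg %1          # bring job 1 into the foreground
# press Cntrl-z to suspend it, then:
$ bg %1          # resume the suspended job in the background
$ kill %1        # kill it by job number
```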
Running stuff in the background is useful, especially if you have a time-consuming program. If you're scripting a lot, you'll often find yourself running something like this:
$ script.py > out.o 2> out.e &
i.e., running something in the background and saving both the output and error.
Two other commands that come to mind are time, which times how long your script takes to run, and nohup ("no hang up"), which allows your script to run even if you quit the terminal:
$ time script.py > out.o 2> out.e &
$ nohup script.py > out.o 2> out.e &
As we mentioned above, you can also set multiple jobs to run in the background, in parallel, from a loop:
$ for i in 1 2 3; do { echo "***"$i; sleep 60 & } done
(Of course, you'd want to run something more useful than sleep!) If you're lucky enough to work on a large computer cluster shared by many users, some of whom may be running memory- and time-intensive programs, then scheduling different users' jobs is a daily fact of life. At work we use a queuing system called the Sun Grid Engine to solve this problem. I wrote a short SGE wiki here.
To get a dynamic view of your processes, loads of other information, and sort them in different ways, use top:
$ top
Here's a screenshot of htop, a souped-up version of top, running on Ubuntu:
[screenshot: htop on Ubuntu]
htop can show us a traditional top output split-screened with a process tree. We see various users (root, ubuntu, and www-data) and we see various processes, including init, which has a PID of 1. htop also shows us the percent usage of each of our cores; here there's only one and we're using just 0.7%. Can you guess what this computer might be doing? I'm using it to host a website, as nginx, a popular web server, gives away.
nohup
As we mentioned above, nohup ("no hang up") allows your script to run even if you quit the terminal, as in:
$ nohup script.py > out.o 2> out.e &
If you start such a process, you can squelch it by using kill on its process ID or, less practically, by shutting down your computer. You can track down your process with ps, as described above.
seq
seq prints a sequence of numbers. For example:
$ seq 2 5
2
3
4
5
If you add a number in the middle of your seq range, this will be the "step":
$ seq 1 2 10
1
3
5
7
9
cut
cut cuts one or more columns from a file, and delimits on tab by default. Suppose a file, sample.blast.txt, is:
TCONS_00007936|m.162    gi|27151736|ref|NP_006727.2|    100.00  324
TCONS_00007944|m.1236   gi|55749932|ref|NP_001918.3|    99.36   470
TCONS_00007947|m.1326   gi|157785645|ref|NP_005867.3|   91.12   833
TCONS_00007948|m.1358   gi|157785645|ref|NP_005867.3|   91.12   833
Then:
$ cat sample.blast.txt | cut -f2
gi|27151736|ref|NP_006727.2|
gi|55749932|ref|NP_001918.3|
gi|157785645|ref|NP_005867.3|
gi|157785645|ref|NP_005867.3|
You can specify the delimiter with the -d flag, so:
$ cat sample.blast.txt | cut -f2 | cut -f4 -d"|"
NP_006727.2
NP_001918.3
NP_005867.3
NP_005867.3
although this is long-winded and, in this case, we can achieve the same result simply with:
$ cat sample.blast.txt | cut -f5 -d"|"
NP_006727.2
NP_001918.3
NP_005867.3
NP_005867.3
Don't confuse cut with its non-unix namesake on Macintosh, which deletes text while copying it to the clipboard.
Tip: If you're a Vim user, running cut as a system command within Vim is a neat way to filter text. Read more: Wiki Vim - System Commands in Vim.
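Two cut features the examples above don't show: field ranges with -f and character positions with -c. A quick sketch:

```shell
#!/bin/bash
printf "a\tb\tc\td\n" > tmp.txt
cut -f2-3 tmp.txt           # fields 2 through 3: b and c
cut -f2- tmp.txt            # field 2 through the end: b, c, d
echo "abcdef" | cut -c1-3   # characters 1 through 3: abc
rm tmp.txt
```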
paste
paste joins files together in a column-wise fashion. Another way to think about this is in contrast to cat, which joins files vertically. For example:
$ cat file1.txt
a
b
c
$ cat file2.txt
1
2
3
$ paste file1.txt file2.txt
a	1
b	2
c	3
Paste with a delimiter:
$ paste -d";" file1.txt file2.txt
a;1
b;2
c;3
awk
awk is a full-fledged scripting language that executes its code once per line of input. The examples below use a file, test.txt, such that:
$ cat test.txt
1	c
3	c
2	t
1	c
There are two exceptions to the execute-code-per-line rule: anything in a BEGIN block gets executed before the file is read and anything in an END block gets executed after it's read. If you define variables in awk, they're global and persist rather than being cleared every line. For example, we can concatenate the elements of the first column with an @ delimiter using the variable x:
$ cat test.txt | awk 'BEGIN{x=""}{x=x"@"$1; print x}'
@1
@1@3
@1@3@2
@1@3@2@1
$ cat test.txt | awk 'BEGIN{x=""}{x=x"@"$1}END{print x}'
@1@3@2@1
Or we can sum up all values in the first column:
$ cat test.txt | awk '{x+=$1}END{print x}' # x+=$1 is the same as x=x+$1
7
Awk has a bunch of built-in variables which are handy: NR is the row number; NF is the total number of fields; and OFS is the output delimiter. There are many more you can read about here. Continuing with our very contrived examples, let's see how these can help us:
$ cat test.txt | awk '{print $1"\t"$2}' # write tab explicitly
1	c
3	c
2	t
1	c
$ cat test.txt | awk '{OFS="\t"; print $1,$2}' # set output field separator to tab
1	c
3	c
2	t
1	c
Setting OFS spares us having to type a "\t" every time we want to print a tab. We can just use a comma instead. Look at the following three examples:
$ cat test.txt | awk '{OFS="\t"; print $1,$2}' # print file as is
1	c
3	c
2	t
1	c
$ cat test.txt | awk '{OFS="\t"; print NR,$1,$2}' # print row num
1	1	c
2	3	c
3	2	t
4	1	c
$ cat test.txt | awk '{OFS="\t"; print NR,NF,$1,$2}' # print row & field num
1	2	1	c
2	2	3	c
3	2	2	t
4	2	1	c
So the first command prints the file as it is. The second command prints the file with the row number added in front. And the third prints the file with the row number in the first column and the number of fields in the second, in our case always two. Although these are purely pedagogical examples, these variables can do a lot for you. For example, if you wanted to print the 3rd row of your file, you could use:
$ cat test.txt | awk '{if (NR==3) {print $0}}' # print the 3rd row of your file
2	t
$ cat test.txt | awk '{if (NR==3) {print}}' # same thing, more compact syntax
2	t
$ cat test.txt | awk 'NR==3' # same thing, most compact syntax
2	t
Sometimes you have a file and you want to check if every row has the same number of columns. Then use:
$ cat test.txt | awk '{print NF}' | sort -u
2
In awk, $NF refers to the contents of the last field:
$ cat test.txt | awk '{print $NF}'
c
c
t
c
An important point is that, by default, awk delimits on whitespace, not tabs (unlike, say, cut). Whitespace means any combination of spaces and tabs. You can tell awk to delimit on anything you like by using the -F flag. For instance, let's look at the following situation:
$ echo "a b" | awk '{print $1}'
a
$ echo "a b" | awk -F"\t" '{print $1}'
a b
When we feed a-space-b into awk, $1 refers to the first field, a. However, if we explicitly tell awk to delimit on tabs, then $1 refers to a b because it occurs before a tab.
You can also use shell variables inside your awk by importing them with the -v flag:
$ x=hello
$ cat test.txt | awk -v var=$x '{ print var"\t"$0 }'
hello	1	c
hello	3	c
hello	2	t
hello	1	c
And you can write to multiple files from inside awk:
$ cat test.txt | awk '{if ($1==1) {print > "file1.txt"} else {print > "file2.txt"}}'
$ cat file1.txt
1	c
1	c
$ cat file2.txt
3	c
2	t
For loops in awk:
$ echo joe | awk '{for (i = 1; i <= 5; i++) {print i}}'
1
2
3
4
5
Question: In the following case, how would you print the row numbers such that the first field equals the second field?
$ echo -e "a\ta\na\tc\na\tz\na\ta"
a	a
a	c
a	z
a	a
Here's the answer:
$ echo -e "a\ta\na\tc\na\tz\na\ta" | awk '$1==$2{print NR}'
1
4
Question: How would you print the average of the first column in a text file?
$ cat file.txt | awk 'BEGIN{x=0}{x=x+$1;}END{print x/NR}'
NR is a special variable representing the row number; in the END block it equals the total number of rows, so x/NR is the average.
The take-home lesson is, you can do tons with awk, but you don't want to do too much. Anything that you can do crisply on one, or a few, lines is awk-able. For more involved scripting examples, see An Introduction to Unix - More awk examples.
sed
From An Introduction to Unix - sed: Sed, like awk, is a full-fledged language that is convenient to use in a very limited sphere (GNU Sed Guide). I mainly use it for two things: (1) replacing text, and (2) deleting lines. Sed is often mentioned in the same breath as regular expressions although, like the rest of the world, I'd use Perl and Python when it comes to that. Nevertheless, let's see what sed can do.
Sometimes the first line of a text file is a header and you want to remove it. Then:
$ cat test_header.txt
This is a header
1	asdf
2	asdf
2	asdf
$ cat test_header.txt | sed '1d' # delete the first line
1	asdf
2	asdf
2	asdf
To remove the first 3 lines:
$ cat test_header.txt | sed '1,3d' # delete lines 1-3
2	asdf
1,3 is sed's notation for the range 1 to 3. We can't do much more without entering regular expression territory. One sed construction is:
/pattern/d
where d stands for delete if the pattern is matched. So to remove lines beginning with #:
$ cat test_comment.txt
1
asdf
# This is a comment
2
asdf
# This is a comment
2
asdf
$ cat test_comment.txt | sed '/^#/d'
1
asdf
2
asdf
2
asdf
Another construction is:
s/A/B/
where s stands for substitute. So this means replace A with B. By default, this only works for the first occurrence of A but, if you put a g at the end, for global, all As are replaced:
s/A/B/g
For example:
$ # replace 1st occurrence of kitty with X
$ echo "hello kitty. goodbye kitty" | sed 's/kitty/X/'
hello X. goodbye kitty
$ # same thing; using | as a separator is ok
$ echo "hello kitty. goodbye kitty" | sed 's|kitty|X|'
hello X. goodbye kitty
$ # replace all occurrences of kitty
$ echo "hello kitty. goodbye kitty" | sed 's/kitty/X/g'
hello X. goodbye X
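sed can also edit a file in place with the -i flag, sparing you the temp-file dance. A small sketch; note this is GNU sed's syntax, and BSD/Mac sed wants a backup-suffix argument, as in sed -i '' 's/kitty/X/g' pets.txt:

```shell
# write a test file, then substitute in place with GNU sed's -i flag
echo "hello kitty. goodbye kitty" > pets.txt
sed -i 's/kitty/X/g' pets.txt
cat pets.txt
# hello X. goodbye X
```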
gzip, gunzip, bzip2, bunzip2 TOP VIEW_AS_PAGE
These commands compress and uncompress files. Zip:
$ gzip file
Unzip:
$ gunzip file.gz
Bzip:
$ bzip2 file
Bunzip:
$ bunzip2 file.bz2
You can pipe into these commands!
$ cat file.gz | gunzip | head
$ cat file | awk '{print $1"\t"100}' | gzip > file2.gz
The first preserves the zipped file while allowing you to look at it, while the second illustrates how one may save space by creating a zipped file right off the bat (in this case, with some random awk).
cat a file without unzipping it:
$ zcat file.gz
less a file without unzipping it:
$ zless file.gz
I should emphasize again that if you're dealing with large data files, you should always, always, always compress them. Failing to do so is downright irresponsible!
tar TOP VIEW_AS_PAGE
tar rolls, or glues, an entire directory structure into a single file.
Tar a directory named dir into a tarball called dir.tar:
$ tar -cvf dir.tar dir
(The original dir remains.) The options I'm using are -c for "create a new archive containing the specified items"; -f for "write the archive to the specified file"; and -v for verbose.
To untar, use the -x flag, which stands for extract:
$ tar -xvf dir.tar
Tar and zip a directory dir into a zipped tarball dir.tar.gz:
$ tar -zcvf dir.tar.gz dir
Extract plus unzip:
$ tar -zxvf dir.tar.gz
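You can also peek inside a tarball without extracting anything: the -t flag lists the archive's contents. A quick sketch:

```shell
# build a toy directory, archive it, then list the archive's contents
mkdir -p dir
echo hello > dir/file.txt
tar -cf dir.tar dir
tar -tf dir.tar   # lists dir/ and dir/file.txt
```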
uniq TOP VIEW_AS_PAGE
uniq filters a file leaving only the unique lines, provided the file is sorted.
Suppose:
$ cat test.txt
aaaa
bbbb
aaaa
aaaa
cccc
cccc
Then:
$ cat test.txt | uniq
aaaa
bbbb
aaaa
cccc
This can be thought of as a local uniquing: adjacent duplicate rows are collapsed, but a repeated row can still appear elsewhere in the file. If you want the global unique, sort first:
$ cat test.txt | sort | uniq
aaaa
bbbb
cccc
This is identical to:
$ cat test.txt | sort -u
aaaa
bbbb
cccc
uniq also has the ability to show you only the lines that are not unique with the duplicate flag:
$ cat test.txt | sort | uniq -d
aaaa
cccc
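uniq can also tally occurrences with the -c flag, one of the most common idioms in unix one-liners (sort first, as always):

```shell
# count how many times each line occurs
printf 'aaaa\nbbbb\naaaa\naaaa\ncccc\ncccc\n' | sort | uniq -c
#   3 aaaa
#   1 bbbb
#   2 cccc
```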
dirname, basename TOP VIEW_AS_PAGE
dirname and basename grab parts of a file path:
$ basename /some/path/to/file.txt
file.txt
$ dirname /some/path/to/file.txt
/some/path/to
The first gets the file name, the second the directory in which the file resides. To say the same thing a different way, dirname gets the directory minus the file, while basename gets the file minus the directory.
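basename also takes an optional second argument, a suffix to strip, which is handy when you want a file's name without its extension:

```shell
# strip the directory and the .txt suffix in one shot
basename /some/path/to/file.txt .txt
# file
```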
You can play with these:
$ ls $( dirname $( which my_program ) )
This would list the files wherever my_program lives.
In a bash script, it's sometimes useful to grab the directory where the script itself resides and store this path in a variable:
# get the directory in which your script itself resides
d=$( dirname $( readlink -m $0 ) )
set, unset TOP VIEW_AS_PAGE
Use set to set various properties of your shell, somewhat analogous to a Preferences section in a GUI.
E.g., use vi style key bindings in the terminal:
$ set -o vi
Use emacs style key bindings in the terminal:
$ set -o emacs
You can use set with the -x flag to debug:
set -x # activate debugging from here
.
.
.
set +x # de-activate debugging from here
This causes all commands to be echoed to stderr before they are run. For example, consider the following script:
#!/bin/bash
set -eux
# the other flags are:
# -e Exit immediately if a simple command exits with a non-zero status
# -u Treat unset variables as an error when performing parameter expansion
echo hello
sleep 5
echo joe
This will echo every command before running it. The output is:
+ echo hello
hello
+ sleep 5
+ echo joe
joe
Changing gears, you can use unset to clear variables, as in:
$ TEST=asdf
$ echo $TEST
asdf
$ unset TEST
$ echo $TEST
env TOP VIEW_AS_PAGE
There are 3 notable things about env. First, if you run it as standalone, it will print out all the variables and functions set in your environment:
$ env
Second, as discussed in An Introduction to Unix - The Shebang, you can use env to avoid hard-wired paths in your shebang. Compare this:
#!/usr/bin/env python
to this:
#!/some/path/python
The script with the former shebang will conveniently be interpreted by whichever python is first in your PATH; the latter will be interpreted by /some/path/python.
And, third, as Wikipedia says, env can run a utility "in an altered environment without having to modify the currently existing environment." I never have occasion to use it this way but, since it was in the news recently, look at this example (stolen from here):
$ env COLOR=RED bash -c 'echo "My favorite color is $COLOR"'
My favorite color is RED
$ echo $COLOR
The COLOR variable is only defined temporarily for the purposes of the env statement (when we echo it afterwards, it's empty). This construction sets us up to understand the bash shellshock bug, which Stack Exchange illustrates using env:
$ env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
I unpack this statement in a blog post.
uname TOP VIEW_AS_PAGE
As its man page says, uname prints out various system information. In the simplest form:
$ uname
Linux
If you use the -a flag for all, you get all sorts of information:
$ uname -a
Linux my.system 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 \
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
df, du TOP VIEW_AS_PAGE
df reports "file system disk space usage". This is the command you use to see ho
w many much space you have left on your hard disk. The -h flag means "human read
able," as usual. To see the space consumption on my mac, for instance:
$ df -h .
Filesystem
Size Used Avail Use% Mounted on
/dev/disk0s3
596G 440G 157G 74% /
df shows you all of your mounted file systems. For example, if you're familiar with Amazon Web Services (AWS) jargon, df is the command you use to examine your mounted EBS volumes. They refer to it in the "Add a Volume to Your Instance" tutorial.
du is similar to df but it's for checking the sizes of individual directories. E.g.:
$ du -sh myfolder
284M	myfolder
If you wanted to check how much space each folder is using in your HOME directory, you could do:
$ cd
$ du -sh *
This will probably take a while. Also note that there's some rounding in the calculation of how much space folders and files occupy, so the numbers df and du return may not be exact.
Find all the files in the directory /my/dir in the gigabyte range:
$ du -sh /my/dir/* | awk '$1 ~ /G/'
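Another useful du idiom is ranking a directory's contents by size. GNU sort's -h flag understands the K/M/G suffixes that du -h emits (the flag may be missing on older systems):

```shell
# make two folders of different sizes, then rank them smallest to largest
mkdir -p sizes/a sizes/b
head -c 1024 /dev/zero > sizes/a/small      # ~1K file
head -c 1048576 /dev/zero > sizes/b/big     # ~1M file
du -sh sizes/* | sort -h   # the bigger folder, sizes/b, prints last
```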
bind TOP VIEW_AS_PAGE
As discussed in detail in An Introduction to Unix - Working Faster with Readline Functions and Key Bindings, Readline Functions (GNU Documentation) allow you to take all sorts of shortcuts in the terminal. You can see all the Readline Functions by entering:
$ bind -P # show all Readline Functions and their key bindings
$ bind -l # show all Readline Functions
Four of the most excellent Readline Functions are:
forward-word - jump cursor forward a word
backward-word - jump cursor backward a word
history-search-backward - scroll through your bash history backward
history-search-forward - scroll through your bash history forward
For the first two, you can use the default Emacs way:
Meta-f - jump forward one word
Meta-b - jump backward one word
However, reaching for the Esc key is a royal pain in the ass; you have to re-position your hands on the keyboard. This is where key binding comes into play. Using the command bind, you can map a Readline Function to any key combination you like. Of course, you should be careful not to overwrite pre-existing key bindings that you want to use. I like to map the following keys to these Readline Functions:
Cntrl-forward-arrow - forward-word
Cntrl-backward-arrow - backward-word
up-arrow - history-search-backward
down-arrow - history-search-forward
In my .bash_profile (or, more accurately, in my global bash settings file) I use:
# make cursor jump over words
bind '"\e[5C": forward-word'   # control+arrow_right
bind '"\e[5D": backward-word'  # control+arrow_left
# make history searchable by entering the beginning of command
# and using up and down keys
bind '"\e[A": history-search-backward' # arrow_up
bind '"\e[B": history-search-forward'  # arrow_down
(Although these may not work universally.) How does this cryptic symbology translate into these particular keybindings? Again, refer to An Introduction to Unix.
alias, unalias TOP VIEW_AS_PAGE
As mentioned in An Introduction to Unix, alias is a way of mapping one word to another. It can simplify your life by making a shorter expression stand for a longer one:
alias c="cat"
This means every time you type a c, bash interprets it as cat. Instead of writing:
$ cat file.txt
you just write:
$ c file.txt
Another use of alias is to weld particular flags onto a command, so every time the command is called, the flags go with it automatically, as in:
alias cp="cp -R"
or
alias mkdir="mkdir -p"
Recall the former allows you to copy directories as well as files, and the latter allows you to make nested directories. Perhaps you always want to use these options, but this is a tradeoff between convenience and freedom. In general, I prefer to use new words for aliases and not overwrite preexisting bash commands. Here are some aliases I use in my setup file (see the complete list here):
# coreutils
alias c="cat"
alias e="clear"
alias s="less -S"
alias l="ls -hl"
alias lt="ls -hlt"
alias ll="ls -al"
alias rr="rm -r"
alias r="readlink -m"
alias ct="column -t"
alias ch="chmod -R 755"
alias chh="chmod -R 644"
alias grep="grep --color"
alias yy="rsync -azv --progress"
# remove empty lines with white space
alias noempty="perl -ne 'print if not m/^(\s*)$/'"
# awk
alias awkf="awk -F'\t'"
alias length="awk '{print length}'"
# notesappend, my favorite alias
alias n="history | tail -2 | head -1 | tr -s ' ' | cut -d ' ' -f3- | awk '{print \"# \"\$0}' >> notes"
# HTML-related
alias htmlsed="sed 's|\&|\&amp\;|g; s|>|\&gt\;|g; s|<|\&lt\;|g'"
# git
alias ga="git add"
alias gc="git commit -m"
alias gac="git commit -a -m"
alias gp="git pull"
alias gpush="git push"
alias gs="git status"
alias gl="git log"
alias gg="git log --oneline --decorate --graph --all"
# alias gg="git log --pretty=format:'%h %s %an' --graph"
alias gb="git branch"
alias gch="git checkout"
alias gls="git ls-files"
alias gd="git diff"
To display all of your current aliases:
$ alias
To get rid of an alias, use unalias, as in:
$ unalias myalias
column TOP VIEW_AS_PAGE
Suppose you have a file with fields of variable length. Viewing it in the terminal can be messy because, if a field in one row is longer than one in another, it will upset the spacing of the columns. You can remedy this as follows:
cat myfile.txt | column -t
This puts your file into a nice table, which is what the -t flag stands for. Here's an illustration:
image
The file tmp.txt is tab-delimited but, because the length of the fields is so different, it looks ugly.
With column -t:
image
If your file has many columns, the column command works particularly well in combination with:
less -S
which allows horizontal scrolling and prevents lines from wrapping onto the next row:
cat myfile.txt | column -t | less -S
column delimits on whitespace rather than tab by default, so if your fields themselves have spaces in them, amend it to:
cat myfile.txt | column -s "	" -t | less -S
(where you make a tab in the terminal by typing Cntrl-v tab). This makes the terminal feel almost like an Excel spreadsheet. Observe how it changes the viewing experience for this file of fake financial data:
image
find TOP VIEW_AS_PAGE
find is for finding files or folders in your directory tree. You can also use it to list the paths of all subdirectories and files of a directory. To find files, you can use the simple syntax:
find /some/directory -name "myfile"
For instance, to find all files with the extension .html in the current directory or any of its sub-directories:
$ find ./ -name "*.html"
The flag -iname makes the command case insensitive:
$ find /my/dir -iname "*alex*"
This would find any file or directory containing ALEX, Alex, alex, and so on in /my/dir and its children.
To see the path of every child file and directory from the cwd on down, simply type:
$ find
find is one of the best tools to print out file paths, irrespective of whether you're looking for a file. The command also works particularly well in combination with xargs - see the example below.
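find can also run a command on each match directly via -exec, where {} stands for the found path and \; terminates the command. A sketch:

```shell
# count the lines of every .txt file under proj/
mkdir -p proj
printf 'a\nb\n' > proj/notes.txt
printf 'c\n' > proj/todo.txt
find proj -name "*.txt" -exec wc -l {} \;
```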
touch TOP VIEW_AS_PAGE
touch makes an empty file. E.g., to make an empty file called test:
$ touch test
Sometimes you find yourself running this command to see if you have write permission in a particular directory.
diff, comm TOP VIEW_AS_PAGE
diff prints out the differences between two files. It's the best way to find discrepancies between files you think are the same. If:
$ cat tmp1
a
a
b
c
$ cat tmp2
a
x
b
c
then diff catches that line 2 is different:
$ diff tmp1 tmp2
2c2
< a
---
> x
comm, for common, is similar to diff but I rarely have occasion to use it.
Note: If you're familiar with git, you know that the diff operation plays a central role in its version control system. Git has its own flavor of diff: git diff. For example, to see the difference between commit c295z17 and commit e5d6565:
$ git diff c295z17 e5d6565
join TOP VIEW_AS_PAGE
join joins two sorted files on a common key (the first column by default). If:
$ cat tmp1.txt
1	a
2	b
3	c
$ cat tmp2.txt
2	aa
3	bb
Then:
$ join tmp1.txt tmp2.txt
2 b aa
3 c bb
My lab uses a short Perl script called tableconcatlines (written by my co-worker, Vladimir) to do a join without requiring the files to be pre-sorted:
#!/usr/bin/env perl
# About:
# join two text files using the first column of the first file as the "key"
# Usage:
# tableconcatlines example/fileA.txt example/fileB.txt
%h = map {/(\S*)\s(.*)/; $1 => $2} split(/\n/, `cat $ARGV[1]`);
open($ifile, '<', $ARGV[0]);
while (<$ifile>)
{
    /^(\S*)/;
    chop;
    print $_ . "\t" . $h{$1} . "\n";
}
md5, md5sum TOP VIEW_AS_PAGE
Imagine the following scenario. You've just downloaded a large file from the internet. How do you know no data was lost during the transfer and you've made an exact copy of the one that was online?
To solve this problem, let's review the concept of hashing. If you're familiar with a dict in Python or a hash in Perl, you know that a hash, as a data structure, is simply a way to map a set of unique keys to a set of values. In ordinary life, an English dictionary is a good representation of this data structure. If you know your key is "cat" you can find your value is "a small domesticated carnivorous mammal with soft fur, a short snout, and retractile claws", as Google defines it. In the English dictionary, the authors assigned values to keys, but suppose we only have keys and we want to assign values to them. A hash function describes a method for how to boil down keys into values. Without getting deep into the theory of hashing, it's remarkable that you can hash, say, text files of arbitrary length into a determined range of numbers. For example, a very stupid hash would be to assign every letter to a number:
A -> 1
B -> 2
C -> 3
.
.
.
and then to go through the file and sum up all the numbers; and finally to take, say, modulo 1000. With this, we could assign the novels Moby Dick, Great Expectations, and Middlemarch all to numbers between 1 and 1000! This isn't a good hash function because two novels might well get the same number but, never mind, enough of a digression already.
md5 is a hash function that hashes a whole file into a long string. The commands md5 and md5sum do about the same thing. For example, to compute the md5 hash of a file tmp.txt:
$ md5 tmp.txt
84fac4682b93268061e4adb49cee9788 tmp.txt
$ md5sum tmp.txt
84fac4682b93268061e4adb49cee9788 tmp.txt
This is a great way to check that you've made a faithful copy of a file. If you're downloading an important file, ask the file's owner to provide the md5 sum. After you've downloaded the file, compute the md5 on your end and check that it's the same as the provided one. In fact, if you're being careful about the fidelity of transferred files, it's irresponsible not to use the md5!
md5 is one of many hashing functions. Another one, for example, is sha1, which will be familiar to users of git.
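GNU md5sum can even automate the comparison for you: save the checksum to a file, then verify with the -c flag (the mac md5 command lacks this flag):

```shell
# save a checksum, then verify the file against it
echo "important data" > data.txt
md5sum data.txt > data.md5
md5sum -c data.md5
# data.txt: OK
```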
tr TOP VIEW_AS_PAGE
tr stands for translate and it's a utility for replacing characters in text. For example, to replace a period with a newline:
$ echo "joe.joe" | tr "." "\n"
joe
joe
Change lowercase letters to uppercase letters:
$ echo joe | tr "[:lower:]" "[:upper:]"
JOE
Find the numberings of columns in a header (produced by the bioinformatics program BLAST):
$ cat blast_header
qid	sid	pid	alignmentlength	mismatches	numbergap	query_start	query_end	subject_start	subject_end	evalue	bitscore
$ cat blast_header | tr "\t" "\n" | nl -b a
1 qid
2 sid
3 pid
4 alignmentlength
5 mismatches
6 numbergap
7 query_start
8 query_end
9 subject_start
10 subject_end
11 evalue
12 bitscore
tr, with the -d flag, is also useful for deleting characters. If we have a file tmp.txt:
$ cat tmp.txt
a a a a
a b b b
a v b b
1 b 2 3
then:
$ cat tmp.txt | tr -d "b"
a a a a
a
a v
1 2 3
This is one of the easiest ways to delete newlines from a file:
$ cat tmp.txt | tr -d "\n"
a a a aa b b ba v b b1 b 2 3
Tip: To destroy carriage return characters ("\r"), often seen when you open a Windows file in linux, use:
$ cat file.txt | tr -d "\r"
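tr's -s flag squeezes runs of a repeated character down to a single copy, a nice way to tame ragged whitespace before cutting on it:

```shell
# collapse runs of spaces so a downstream cut -d' ' behaves predictably
echo "too    many     spaces" | tr -s " "
# too many spaces
```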
od TOP VIEW_AS_PAGE
One of the most ninja moves in unix is the od command which, with the -tc flag, explicitly prints every character in a string or a file. For example:
$ echo joe | od -tc
0000000 j o e \n
0000004
We see everything: the j, the o, the e, and the newline. This is incredibly useful for debugging, especially because some programs, notably those that bear the Microsoft stamp, will silently insert evil "landmine" characters into your files. The most common issue is that Windows/Microsoft sometimes uses a carriage-return (\r), whereas Mac/unix uses a much more sensible newline (\n). If you're transferring files from a Windows machine to unix or Mac and let your guard down, this can cause unexpected bugs. Consider a Microsoft Excel file:
image
If we save this spreadsheet as a text file and try to cat it, it screws up. Why? Our od command reveals the answer:
$ cat Workbook1.txt | od -tc
0000000 1 \t 2 \r 1 \t 2 \r 1 \t 2
0000013
Horrible carriage-returns! Let's fix it and check that it worked:
$ cat ~/Desktop/Workbook1.txt | tr "\r" "\n" | od -tc
0000000 1 \t 2 \n 1 \t 2 \n 1 \t 2
0000013
Score!
split TOP VIEW_AS_PAGE
split splits up a file. For example, suppose that:
$ cat test.txt
1
2
3
4
5
6
7
8
9
10
If we want to split this file into sub-files with 3 lines each, we can use:
$ split -l 3 -d test.txt test_split_
where -l 3 specifies that we want 3 lines per file; -d specifies we want numeric suffixes; and test_split_ is the prefix of each of our sub-files. Let's examine the result of this command:
$ head test_split_*
==> test_split_00 <==
1
2
3
==> test_split_01 <==
4
5
6
==> test_split_02 <==
7
8
9
==> test_split_03 <==
10
Note that the last file doesn't have 3 lines because 10 is not divisible by 3; its line count equals the remainder.
Splitting files is an important technique for parallelizing computations.
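Here's a miniature sketch of that idea: split an input into chunks, then launch one background job per chunk and wait for all of them to finish (gzip stands in for a real per-chunk computation):

```shell
# split 10 lines into 3-line chunks (chunk_00 .. chunk_03), then
# compress each chunk as a background job and wait for all of them
seq 1 10 > big_input.txt
split -l 3 -d big_input.txt chunk_
for f in chunk_0*; do
    gzip "$f" &
done
wait
ls chunk_0*.gz   # four compressed chunks
```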
nano, emacs, vim TOP VIEW_AS_PAGE
nano is a basic text editor that should come pre-packaged with your linux distribution. There are two fundamental commands you need to know to use nano:
Cntrl-O - Save
Cntrl-X - Quit
nano is not a great text editor and you shouldn't use it for anything fancy. But for simple tasks, say, pasting swaths of text into a file, it's a good choice.
For more serious text editing, use Vim or Emacs. These programs have too many features to discuss in this post but, as a Vim partisan, I have an introduction to Vim here.
tree TOP VIEW_AS_PAGE
Note: tree is not a default shell program. You may have to download and install it.
tree prints out a tree, in ASCII characters, of your directory structure. For example:
$ mkdir -p tmp/{a,b}/c/{d,e}
Let's run tree on this directory and illustrate its output:
image
If you make the mistake of running this command in your home directory, you'll get a monstrously large output. For that reason, you can restrict the depth with the -L flag:
$ tree -L 2 tmp
This prints the tree only two directory levels down.
screen TOP VIEW_AS_PAGE
Here are the most important screen commands, each preceded by the Leader sequence (Cntrl-a by default):
<Leader> c          Create a new window
<Leader> k          Kill the current window
<Leader> n          Go to the next window
<Leader> p          Go to the previous window
<Leader> <Leader>   Toggle between the current and previous window
<Leader> d          Detach the screen session
<Leader> A          Set window's title
<Leader> w          List all windows
<Leader> 0-9        Go to a window numbered 0-9
<Leader> [          Start copy mode
<Leader> Esc        Start copy mode
<Leader> ]          Paste copied text
<Leader> ?          Help (display a list of commands)
A note: to scroll inside screen (you'll find you can't use the window scrollbar anymore), enter into copy mode and then navigate up and down with Cntrl-u and Cntrl-d or j and k, as in Vim.
You can configure screen with the file .screenrc, placed in your home directory. Here's mine, which my co-worker, Albert, "lent" me (get the most up-to-date version of my dotfiles on GitHub):
# change Leader key to Cntrl-f rather than Cntrl-a
escape ^Ff
defscrollback 5000
shelltitle $ |bash
autodetach on
# disable the startup splash message that screen displays on startup
startup_message off
# create a status line at the bottom of the screen. this will show the titles and locations of
# all screen windows you have open at any given time
hardstatus alwayslastline '%{= kG}[ %{G}%H %{g}][%= %{=kw}%?%-Lw%?%{r}(%{W}%n*%f %t%?(%u)%?%{r})%{w}%?%+Lw%?%?%= %{g}][%{B}%Y-%m-%d %{W}%c %{g}]'
#use F8 to turn the status bar off at the bottom of the screen
bindkey -k k5 hardstatus alwayslastline
# use F9 to turn the status bar on the bottom back on
bindkey -k k6 hardstatus alwaysignore
# the next 2 lines are set up to use F1 and F2 to move
# one screen forward or backward (respectively) through your screen session.
bindkey -k k1 prev
bindkey -k k2 next
This demonstrates how to use Cntrl-f as the Leader key, rather than Cntrl-a.
tmux TOP VIEW_AS_PAGE
Note: tmux is not a default shell program. You may have to download and install
it.
As discussed in An Introduction to Unix - tmux, tmux is a more modern version of
screen, which all the cool kids are using. In tmux there are:
sessions
windows
panes
Sessions are groupings of tmux windows. For most purposes, you only need one session. Within each session, we can have multiple "windows" which you see as tabs on the bottom of your screen (i.e., they're virtual windows, not the kind you'd create on a Mac by typing Cmd-N). You can further subdivide windows into panes, although this can get hairy fast. Here's an example of a tmux session containing four windows (the tabs on the bottom), where the active window contains 3 panes:
image
I'm showing off by logging into three different computers: home, work, and Amazon.
Start a new session:
$ tmux
Detach from your session: <Leader> d, where the Leader sequence is Cntrl-b by default.
List your sessions:
$ tmux ls
0: 4 windows (created Fri Sep 12 16:52:30 2014) [177x37]
Attach to a session:
$ tmux attach      # attach to the first session
$ tmux attach -t 0 # attach to a session by id
Kill a session:
$ tmux kill-session -t 0 # kill a session by id
Here's a more systematic list of commands:
<Leader> c          Create a new window
<Leader> x          Kill the current window
<Leader> n          Go to the next window
<Leader> p          Go to the previous window
<Leader> l          Go to the last-seen window
<Leader> 0-9        Go to a window numbered 0-9
<Leader> w          List windows
<Leader> s          Toggle between sessions
<Leader> d          Detach the session
<Leader> ,          Set a window's title
<Leader> [          Start copy mode (which will obey vim conventions per order of the config file)
<Leader> ]          Paste what you grabbed in copy mode
<Leader> %          Spawn new horizontal pane (within the window)
<Leader> "          Spawn new vertical pane (within the window)
<Leader> o          Go to the next pane (within the window)
<Leader> o (pressed simultaneously*)  Swap position of panes (within the window)
<Leader> q          Number the panes (within the window) - whereupon you can jump to a specific pane by pressing its numerical index
<Leader> z          Toggle the expansion of one of the panes (within the window)
<Leader> t          Display the time
<Leader> m          Turn mouse mode on (allows you to resize panes with the mouse)
<Leader> M          Turn mouse mode off
<Leader> :          Enter command mode
* With tmux, in contrast to screen, you give a slightly longer pause after pressing the Leader sequence. In this particular case, however, they are pressed simultaneously.
As before, I like to use Cntrl-f as my Leader sequence. Here's a sample .tmux.conf file, which accomplishes this, among other useful things:
### Default Behavior
# Use screen key binding
set-option -g prefix C-f
# 1-based numbering
set -g base-index 1
# Fast command sequence - cause some error
# set -s escape-time 0
# Set scrollback to 10000 lines
set -g history-limit 10000
# Aggressive window resizing
setw -g aggressive-resize on
### Views
# Highlight Active Window
set-window-option -g window-status-current-bg red
# Status Bar
set -g status-bg black
set -g status-fg white
set -g status-left ""
set -g status-right "#[fg=green]#H"
### Shortcut
# Last Active Window
bind-key C-a last-window
# set -g mode-mouse on
# vi bindings in copy mode
set-window-option -g mode-keys vi
# Tmux window size ( http://superuser.com/questions/238702/maximizing-a-pane-in-tmux )
unbind +
bind + new-window -d -n tmux-zoom 'clear && echo TMUX ZOOM && read' \; swap-pane -s tmux-zoom.0 \; select-window -t tmux-zoom
unbind -
bind - last-window \; swap-pane -s tmux-zoom.0 \; kill-window -t tmux-zoom
(Get my up-to-date dotfile on GitHub.)
To reload .tmux.conf after making changes: <Leader> :source-file ~/.tmux.conf
To copy and paste text, assuming vi bindings:
first enter copy mode with <Leader> [
copy text with space to enter visual mode, then select text, then use y to yank,
then hit Enter
paste text with <Leader> ]
tmux is amazing; I've gone from newbie to cheerleader in about 30 seconds!
Hat tip: Albert
yes TOP VIEW_AS_PAGE
yes prints out the character y in an infinite loop (so be careful: you'll have to stop it with Cntrl-c). If you give yes an argument, it will print out the argument rather than y. For instance, to print the word oliver 30 times, the command would be:
$ yes oliver | head -30
nl TOP VIEW_AS_PAGE
nl numbers each row in a file. E.g.:
$ cat tmp.txt
aaa
bbb
ccc
ddd
eee
$ nl -b a tmp.txt
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
You can accomplish a similar thing with awk:
$ cat tmp.txt | awk '{print NR"\t"$0}'
1	aaa
2	bbb
3	ccc
4	ddd
5	eee
whoami TOP VIEW_AS_PAGE
To see your unix username:
$ whoami
groups TOP VIEW_AS_PAGE
To see what groups you belong to:
$ groups
who, w TOP VIEW_AS_PAGE
who and w are very similar. Both show a list of users currently logged on to the system, but w gives slightly more information.
See users currently logged in:
$ who
See users currently logged in as well as what command they're running:
$ w
hostname TOP VIEW_AS_PAGE
hostname prints the system's host name, which is like the name of the computer. If you've ssh-ed into multiple computers in a bunch of different terminal sessions, this command will remind you where you are:
$ hostname
finger TOP VIEW_AS_PAGE
finger is a neat command that prints out information about a user on the system. For example, to finger yourself:
$ finger $( whoami )
Login: my_username
Name: My Name
Directory: /path/to/home Shell: /bin/bash
Never logged in.
No mail.
Plan:
Hello, fools and lovers!
What's the plan thing? When you finger a user it will output the contents of the file
~/.plan
(Was this the genesis of Amherst College's PlanWorld? I think so! :-)
read TOP VIEW_AS_PAGE
read is a shell builtin that I like to use in loops. You can read up a file with a while loop using read; e.g., spit out file.txt exactly as is:
$ while read x; do echo "$x"; done < file.txt
Another example:
$ echo -e "1\t2\n3\t4\n5\t6"
1	2
3	4
5	6
$ echo -e "1\t2\n3\t4\n5\t6" | while read x y; do echo $x; done
1
3
5
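read splits its input on the characters in $IFS, so you can walk tab-delimited fields by setting IFS to a tab; the -r flag keeps backslashes literal. A sketch in bash:

```shell
# walk two tab-separated columns, one row at a time
printf 'a\tb\nc\td\n' | while IFS=$'\t' read -r x y; do
    echo "first=$x second=$y"
done
```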
tee TOP VIEW_AS_PAGE
As discussed in An Introduction to Unix - Piping in Unix, tee is a command sometimes seen in unix pipelines. Suppose we have a file test.txt such that:
$ cat test.txt
1	c
3	c
2	t
1	c
Then:
$ cat test.txt | sort -u | tee tmp.txt | wc -l
3
$ cat tmp.txt
1	c
2	t
3	c
tee, in rough analogy with a plumber's tee fitting, allows us to save a file in the middle of the pipeline and keep going. In this case, the output of sort is both saved as tmp.txt and passed through the pipe to wc -l, which counts the lines of the file.
Another similar example:
$ echo joe | tee test.txt
joe
$ cat test.txt
joe
The same idea: joe is echoed to stdout as well as saved in the file test.txt.
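By default tee clobbers its target, like >; the -a flag appends instead, like >>:

```shell
# the first write overwrites log.txt, the second appends to it
echo first | tee log.txt > /dev/null
echo second | tee -a log.txt > /dev/null
cat log.txt
```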
shopt TOP VIEW_AS_PAGE
shopt, for shell options, controls various togglable shell options, as its name
suggests. You can see the options it controls by simply entering the command. Fo
r example, on my system:
$ shopt
cdable_vars
off
cdspell
off
checkhash
off
checkwinsize
off
cmdhist
on
compat31
off
dotglob
off
.
.
.
There's a discussion about shopt's histappend option in An Introduction to Unix - history.
pushd, popd TOP VIEW_AS_PAGE
pushd and popd keep track of the directories you move through in a stack. When you use:
$ pushd some_directory
it acts as a:
$ cd some_directory
except that some_directory is also added to the stack. Let's see an example of how to use this:
$ pushd ~/TMP   # we're in ~
~/TMP ~
$ pushd ~/DATA  # now we're in ~/TMP
~/DATA ~/TMP ~
$ pushd ~       # now we're in ~/DATA
~ ~/DATA ~/TMP ~
$ popd          # now we're in ~
~/DATA ~/TMP ~
$ popd          # now we're in ~/DATA
~/TMP ~
$ popd          # now we're in ~/TMP
~
$               # now we're in ~
This is useful, but I never use it. I like to use a function I stole off the internet:
cd_func ()
{
local x2 the_new_dir adir index;
local -i cnt;
if [[ $1 == "--" ]]; then
dirs -v;
return 0;
fi;
the_new_dir=$1;
[[ -z $1 ]] && the_new_dir=$HOME;
if [[ ${the_new_dir:0:1} == - ]]; then
index=${the_new_dir:1};
[[ -z $index ]] && index=1;
adir=$(dirs +$index);
[[ -z $adir ]] && return 1;
the_new_dir=$adir;
fi;
[[ ${the_new_dir:0:1} == ~ ]] && the_new_dir="${HOME}${the_new_dir:1}";
pushd "${the_new_dir}" > /dev/null;
[[ $? -ne 0 ]] && return 1;
the_new_dir=$(pwd);
popd -n +11 2> /dev/null > /dev/null;
for ((cnt=1; cnt <= 10; cnt++))
do
x2=$(dirs +${cnt} 2>/dev/null);
[[ $? -ne 0 ]] && return 0;
[[ ${x2:0:1} == ~ ]] && x2="${HOME}${x2:1}";
if [[ "${x2}" == "${the_new_dir}" ]]; then
popd -n +$cnt 2> /dev/null > /dev/null;
cnt=cnt-1;
fi;
done;
return 0
}
In your setup dotfiles, alias cd to this function:
alias cd=cd_func
and then you can type:
$ cd -0 ~
1 ~/DATA
2 ~/TMP
to see all the directories you've visited. To go to ~/TMP, for example, enter:
$ cd -2
true, false TOP VIEW_AS_PAGE
true and false are useful to make multi-line comments in a bash script:
# multi-line comment
if false; then
echo hello
echo hello
echo hello
fi
To comment these echo statements back in:
if true; then
echo hello
echo hello
echo hello
fi
The formal man page definitions of these commands are amusing: true: "do nothing, successfully (exit with a status code indicating success)"; false: "do nothing, unsuccessfully (exit with a status code indicating failure)".
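true also earns its keep as an infinite-loop condition, with break supplying the exit:

```shell
# loop "forever," breaking out once the counter reaches 3
i=0
while true; do
    i=$((i+1))
    if [ "$i" -ge 3 ]; then break; fi
done
echo "$i"
# 3
```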
shift TOP VIEW_AS_PAGE
As in Perl and many other languages, shift pops elements off the array of input
arguments in a script. Suppose tmpscript is:
#!/bin/bash
echo $1
shift
echo $1
Then:
$ ./tmpscript x y
x
y
Refer to An Introduction to Unix - Arguments to a Script to see how to use shift to make an input flag parsing section for your script.
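As a taste, here's a minimal sketch of such a flag-parsing loop; the flag names (-v, -o) are just illustrative examples, not a fixed convention:

```shell
# pretend the script was called with: -v -o results.txt
set -- -v -o results.txt
verbose=0
outfile=out.txt
while [ $# -gt 0 ]; do
    case "$1" in
        -v) verbose=1;;              # boolean flag
        -o) outfile="$2"; shift;;    # flag that consumes a value
        *)  echo "unknown flag: $1";;
    esac
    shift                            # discard the flag just handled
done
echo "verbose=$verbose outfile=$outfile"
# verbose=1 outfile=results.txt
```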
g++ TOP VIEW_AS_PAGE
Compile a C++ program, myprogram.cpp:
$ g++ myprogram.cpp -o myprogram
This produces the executable myprogram. Stackoverflow has a nice post about the g++ optimization flags, which includes the bit:
The rule of thumb:
When you need to debug, use -O0 (and -g to generate debugging symbols.)
When you are preparing to ship it, use -O2.
When you use gentoo, use -O3...!
When you need to put it on an embedded system, use -Os (optimize for size, not for efficiency.)
I usually use -O2:
$ g++ -O2 myprogram.cpp -o myprogram
Read about GCC, the GNU Compiler Collection, here.
xargs TOP VIEW_AS_PAGE
xargs is a nice shortcut to avoid using for loops. For example, let's say we have a bunch of .txt files in a folder and they're symbolic links we want to read. The following two commands are equivalent:
$ for i in *.txt; do readlink -m $i; done
$ ls *.txt | xargs -i readlink -m {}
In xargs world, the {} represents "the bucket", i.e., what was passed through the pipe.
Here are some more examples. Make symbolic links en masse:
$ ls /some/path/*.txt | xargs -i ln -s {}
grep a bunch of stuff en masse:
$ # search for anything in myfile.txt in anotherfile.txt
$ cat myfile.txt | xargs -i grep {} anotherfile.txt
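One portability note: -i is an older GNU spelling and may not exist on a Mac; the portable form is -I with an explicitly named placeholder. A quick sketch:

```shell
# -I{} names the placeholder explicitly (portable where -i is not)
printf 'a\nb\n' | xargs -I{} echo "item: {}"
# prints "item: a", then "item: b"
```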
info TOP VIEW_AS_PAGE
The info command has a wealth of information about, well, nearly everything:
$ info
fold TOP VIEW_AS_PAGE
fold restricts the number of characters printed per line. With fold -w 1, that's one character per line, which lets us count the nucleotides in a fasta file, say myfasta.fa, whose sequence lines look like:
TTATGCGATAAACCCGGGTGTAATTTTATTTTTTT
Here's the answer:
$ cat myfasta.fa | grep -v ">" | fold -w 1 | sort | uniq -c
17 A
13 C
18 G
22 T
rev TOP VIEW_AS_PAGE
rev reverses a string:
$ echo hello | rev
olleh
As noted in the bioinformatics wiki, we can use this trick to find the reverse complement of a string representing DNA nucleotides:
$ echo CACTTGCGGG | rev | tr ATGC TACG
CCCGCAAGTG
mount TOP VIEW_AS_PAGE
mount, as the docs say, mounts a filesystem. Imagine two directory trees, one on your computer and one on an external hard drive. How do you graft these trees together? This is where mount comes in. On your Macintosh, external hard drives get mounted by default in the
/Volumes
directory of your computer. Put differently, the directory tree of your computer and the directory tree of your HD are fused there. I've only ever used the actual mount command in the context of mounting an EBS volume on an EC2 node on Amazon Web Services.
mktemp TOP VIEW_AS_PAGE
mktemp will make a temporary file with a unique name in your designated temporary directory. Recall that, usually, this is the directory:
/tmp
However, you can set it to be anything you like using the variable TMPDIR. Let's see mktemp in action:
$ echo $TMPDIR
/path/tempdir
$ mktemp
/path/tempdir/tmp.LaOYagQwq3
$ mktemp
/path/tempdir/tmp.h5FapGH4LS
mktemp spits out the name of the file it creates. To make a temporary directory rather than a file, use mktemp -d.
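A common pattern in scripts, sketched here, pairs mktemp with trap so the temporary file or directory is cleaned up even if the script dies partway through:

```shell
#!/bin/bash
tmpfile=$(mktemp)       # a temporary file
tmpdir=$(mktemp -d)     # a temporary directory (note the -d)
# remove both whenever the script exits, even on error
trap 'rm -rf "$tmpfile" "$tmpdir"' EXIT
echo "intermediate data" > "$tmpfile"
cat "$tmpfile"
```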
watch TOP VIEW_AS_PAGE
If you prefix a command with watch, it will repeat every 2.0 seconds and you can monitor it. For instance, if a file in /some/directory is growing, and you want to monitor the directory's size:
$ watch ls -hl /some/directory
If a program is churning away and slowly writing to a log file, you can watch the tail:
$ watch tail log.txt
Another example is using watch in combination with the Oracle Grid Engine's qstat to monitor your job status:
$ watch qstat
perl, python TOP VIEW_AS_PAGE
Perl and Python aren't really unix commands, but whole massive programming languages in themselves. Still, there are some neat things you can do with them on the command line. See:
An Introduction to Unix - Command Line Perl and Regex
apt-get, brew, yum TOP VIEW_AS_PAGE
The old-school way to install a program is to download its source code and build it on your computer (google the mantra ./configure; make; make install). However, you'll probably discover that your system has some crucial deficiency. Something's out of date, plus the program you're trying to install requires all these other programs which you don't have. You've entered what I call dependency hell, which has a way of sending you to obscure corners of the internet to scour chat rooms for mysterious lore.
When possible, avoid this approach and instead use a package manager, which takes care of installing a program, and all of its dependencies, for you. Which package manager you use depends on which operating system you're running. For Macintosh, it's impossible to live without brew, whose homepage calls it, "The missing package manager for OS X." For Linux, it depends on your distribution's lineage: Ubuntu has apt-get; Fedora has yum; and so on. All of these package managers can search for packages (i.e., see what's out there) as well as install them.
Let's use the program gpg2 (The GNU Privacy Guard), a famous data encryption tool which implements the OpenPGP standard, to see what this process looks like. First, let's search:
$ brew search gpg2       # Mac
$ yum search gpg2        # Fedora
$ apt-cache search gpg2  # Ubuntu
Installing it might look like this:
$ brew install gpg2               # Mac
$ sudo yum install gnupg2.x86_64  # Fedora
$ sudo apt-get install gpgv2      # Ubuntu
The exact details may vary on your computer but, in any case, now you're ready to wax dangerous and do some Snowden-style hacking! (He reportedly used this tool.)
display, convert, identify TOP VIEW_AS_PAGE
Note: display, convert, and identify are not default shell programs. You have to download and install them from ImageMagick, an awesome suite of command-line tools for manipulating images.
display is a neat command to display images right from your terminal via the X Window System:
$ display my_pic.png
identify gets properties about an image file:
$ identify my_pic.png
my_pic.png PNG 700x319 700x319+0+0 8-bit DirectClass 84.7kb
So, this is a 700 by 319 pixel image in PNG format and so on.
convert is a versatile command that can do about 8 gzillion things for you. For
example, resize an image:
$ convert my_pic.png -resize 550x550\> my_pic2.png
Add whitespace around the image file my_pic.png until it's 600px by 600px and convert it into gif format:
$ convert my_pic.png -background white -gravity center -extent 600x600 my_pic.gif
Do the same thing, but turn it into jpg format and put it at the top of the new image file (-gravity north):
$ convert my_pic.png -background white -gravity north -extent 600x600 my_pic.jpg
Change an image's width (in this case from 638 to 500 pixels) while preserving its aspect ratio:
$ convert my_pic_638_345.png -resize 500x my_pic_500.png
As a common use case, the ImageMagick commands are invaluable for optimizing images you're using on your websites. See How to Make a Website - Working with and Optimizing Images for Your Site for details.
gpg2 TOP VIEW_AS_PAGE
Note: gpg2 is not a default shell program. You have to download and install it.
gpg2, "a complete and free implementation of the OpenPGP standard," according to the official docs, is a program for encrypting files. Here are the bare-bones basics.
The first thing to do is to generate a public/private key pair for yourself:
$ gpg2 --gen-key
The process is interactive: gpg2 will prompt you to choose a key type and size, an expiration time, and your name and email.
Since you're already going through the trouble, choose a 4096 bit key and a secure passphrase (you're going to need this passphrase in the future to decrypt stuff, so remember it). Provide your real name and email, which will be associated with the key pair.
To encrypt a message (or file, or tarball) for somebody, you need to have his or her public key. You encrypt with this recipient's public key, and only the recipient can decrypt with his or her private key. The first step in this process is importing the recipient's public key into your keyring. If you want to practice this, you can save my public key in a file called oli.pub and import it:
$ gpg2 --import oli.pub
List the public keys in your keyring:
$ gpg2 --list-keys
This will return something like:
pub   4096R/9R092F51 2014-03-20
uid        [ unknown] Jon Smith <[email protected]>
sub   4096R/320G5188 2014-03-20
You can also list your non-public keys:
$ gpg2 --list-secret-keys
Your keys are stored in the folder:
~/.gnupg
To encrypt the file tmp.txt for the person with uid (user id) Jon Smith (where Jon Smith's public key has already been imported into your keyring):
$ gpg2 --recipient "Jon Smith" --encrypt tmp.txt
This will create the file tmp.txt.gpg, which is in non-human-readable binary OpenPGP format. I recommend using the -a flag, which creates "ASCII armored output", i.e., a file you can cat or paste into the body of an email:
$ gpg2 --recipient "Jon Smith" --encrypt -a tmp.txt
This will create the file tmp.txt.asc, which will look something like this:
-----BEGIN PGP MESSAGE-----
Version: GnuPG v2
hQIMA4P86BWIDBBVAQ/+KIfnVGBF0Ei/9G/a8btxHu1wXrOpxp77ofPLzJ47e3pm
Y4uO17sbR9JJ12gPHoc5wQT60De9JNtSsPbyD7HVUF0kD+mfnyM8UlyWd7P4BjE5
vRZLhlMt4R88ZvjsU5yTRmLFSkMTIl5NUXTiGuBfjkRZUL+1FooUKgrpu5osACy/
6/7FZ75jReR8g/HEYHody4t8mA3bB5uLoZ8ZEHluj6hf6HjI8nFivNO487IHMhz3
UnDeoStL4lrx/zU0Depv/QNb4FRvOWUQMR7Mf61RcFtcHqfDyjg3Sh5/zg5icAc6
/GEx/6fIuQtmVlXtDCuS17XjaTNBdBXgkBqxsDwk3G1Ilgl9Jo83kBkQLgo3FdB6
3qs9AafNxTzloba8nF38fp0LuvvtSyhfHpYIc4URD5Mpt5ojIHNBVxevY0OeRDX8
x6Bmqu1tKSsKJz7fb0z3K/4/dYTMspIMUIw64y6W3De5jJv9pkbaQ/T1+y5oqOeD
rNvkMYsOpMvHBrnf0liMgn+sLuKE+G26lQfGm4MdnFQmb7AWBC5eqK8p1MnSojGm
klTlTpRSKfx4vOhB4K64L2hlH0rBHE3CvOOsJivzZ7r7MKsBoX6ZHwxVR0YOh/5J
m0Fzk0iprP9vzv5bWlADpCWYVUp6I6WHDfaFnwCDxH2O6y+krxHGjHei7u7GV9fS
SgEVvZZDErHr/ZTwwa7Xld37cJ9dOpesBECrk5ncLr25mNzDNGxgDXqM2yEuzhNa
HDmO0dVloPnVuQ/2SYL/4JP4Z6Uitm13nKQK
=55in
-----END PGP MESSAGE-----
Conversely, if somebody wants to send you a secret message or file, they're going to need your public key. You can export it and send it to them:
$ gpg2 --export -a "myusername" > mypublickey.txt
where myusername was the user name you chose during key generation. (There are also keyservers, such as keyserver.ubuntu.com, which host public keys.)
If somebody sends you the encrypted file secret.txt.asc, you can decrypt it with:
$ gpg2 --decrypt secret.txt.asc
Want to know more? Here are some good references:
Encrypting Files with gpg2
Zachary Voase: Openpgp for Complete Beginners
Alan Eliasen: GPG Tutorial
Ian Atkinson: Introduction to GnuPG
datamash TOP VIEW_AS_PAGE
Note: datamash is not a default shell program. You have to download and install
it.
GNU datamash is a great program for crunching through text files and collapsing
rows on a common ID or computing basic statistics. Here are some simple examples
of what it can do.
Collapse rows in one column based on a common ID in another column:
$ cat file.txt
3 d
2 w
3 c
4 x
1 a
$ cat file.txt | datamash -g 1 collapse 2 -s -W
1  a
2  w
3  d,c
4  x
The -g flag picks the ID column; collapse 2 picks the second column as the one to collapse; the -s flag pre-sorts the file; and the -W flag allows us to delimit on whitespace.
Average rows in one column on a common ID:
$ cat file.txt
A 1 3  SOME_OTHER_INFO
A 1 4  SOME_OTHER_INFO2
B 2 30 SOME_OTHER_INFO4
A 2 5  SOME_OTHER_INFO3
B 1 1  SOME_OTHER_INFO4
B 2 3  SOME_OTHER_INFO4
B 2 1  SOME_OTHER_INFO4
$ cat file.txt | datamash -s -g 1,2 mean 3 -f
A 1 3  SOME_OTHER_INFO   3.5
A 2 5  SOME_OTHER_INFO3  5
B 1 1  SOME_OTHER_INFO4  1
B 2 30 SOME_OTHER_INFO4  11.333333333333
In this case, the ID is the combination of columns one and two, and the mean of column 3 is added as an additional column.
Simply sum a file of numbers:
$ cat file.txt | datamash sum 1
Hat tip: Albert
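If you find yourself on a machine without datamash, the same sum can be had from a plain awk one-liner (shown as a fallback, not a datamash feature):

```shell
# Accumulate column 1 in s and print the total at end of input
printf '1\n2\n3\n' | awk '{s += $1} END {print s}'   # prints 6
```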
virtualenv TOP VIEW_AS_PAGE
Note: virtualenv is not a default shell program. You have to download and instal
l it.
Do you use Python? virtualenv is a special command line tool for Python users. We learned about package managers in the Introduction to Unix - Installing Programs on the Command Line, and Python's is called pip. Suppose you're working on a number of Python projects. One project has a number of dependencies and you've used pip to install them. Another project has a different set of dependencies, and so on. You could install all of your Python modules in your global copy of Python, but that could get messy. It would be nice if you could associate your dependencies with your particular project. This would also ensure that if two projects have conflicting dependencies (say they depend on different versions of the same module) you can get away with it. Moreover, it would allow you to freely install or update modules to your global Python worry-free, since this won't interfere with your projects. This is what virtualenv does and why it's a boon to Python users.
Following the docs, first install it:
$ sudo pip install virtualenv
To make a new Python installation in a folder called venv, run:
$ virtualenv venv
To emphasize the point, this is a whole new copy of Python. To use this Python,
type:
$ source venv/bin/activate
As a sanity check, examine which Python you're using:
(venv) $ which python
/some/path/venv/bin/python
It's virtualenv's copy! Now if you, say, install Django:
(venv) $ pip install Django
You can see that you only have the Django module (and wheel):
(venv) $ pip freeze
Django==1.8.7
wheel==0.24.0
Django's source code is going to be installed in a path such as:
venv/lib/python2.7/site-packages
In practice, if you were doing a Django project, every time you wanted to start coding, the first order of business would be to turn on virtualenv and the last would be to turn it off. To exit virtualenv, type:
(venv) $ deactivate
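A side note: if you're on Python 3 (3.3 or later), a similar tool called venv ships in the standard library, so the same workflow is available without installing virtualenv:

```shell
# Python 3's built-in venv module works much like virtualenv
python3 -m venv venv          # create the environment
source venv/bin/activate      # turn it on
which python                  # now points into venv/bin
deactivate                    # turn it off
```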
Bonus: Global Variables in Unix TOP VIEW_AS_PAGE
To see all global variables, type:
$ set
Where bash looks for commands and scripts:
$PATH
Add a directory path to the back of the path:
$ PATH=$PATH:/my/new/path
Add a directory path to the front of the path:
$ PATH=/my/new/path:$PATH
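Because bash searches the path left to right, a program in the front directory shadows any same-named program further back. A small sketch demonstrating this with a throwaway stub command (mytool is an invented name):

```shell
# Put a stub "mytool" in a temp dir and prepend that dir to PATH
d=$(mktemp -d)
printf '#!/bin/sh\necho stub\n' > "$d/mytool"
chmod +x "$d/mytool"
PATH="$d:$PATH"
command -v mytool   # resolves to $d/mytool, the front of the path
rm -rf "$d"
```

This is why prepending is the usual way to make a locally built tool win over the system's copy.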
Now we know where to put bash commands, but what about other programs? What if you've installed a package or a module in a local directory and you want the program to have access to it? In that case, the following global variables come into play.
Where matlab looks for commands and scripts:
$MATLABPATH
Where R looks for packages:
$R_LIBS
Where awk looks for commands and scripts:
$AWKPATH
Where Python looks for modules:
$PYTHONPATH
Where the dynamic linker looks for shared libraries (e.g., for C/C++ programs):
$LD_LIBRARY_PATH
Where Perl looks for modules:
$PERL5LIB
This will come into play when you've locally installed a module and have to make sure Perl sees it. Then you'll probably have to export it. E.g.:
$ export PERL5LIB=/some/path/lib64/perl5
Text editor:
$EDITOR
You can set this by typing, e.g.:
$ export EDITOR=/usr/bin/nano
Then Control-x Control-e will open the current command line in the text editor.
The specially designated temporary directory:
$TMPDIR
A shell variable that stores a random number for you:
$RANDOM
For example:
$ echo $RANDOM
1284
$ echo $RANDOM
27837
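RANDOM is an integer between 0 and 32767, so to land in a smaller range you can take it modulo the range size. A quick sketch, simulating a die roll:

```shell
# RANDOM % 6 is between 0 and 5; add 1 to get 1 through 6
roll=$((RANDOM % 6 + 1))
echo "$roll"
```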