LINUX Essentials
Index
1. INTRODUCTION
2. BASICS
3. UNIX HELP
4. FINDING THINGS
5. PERMISSIONS & OWNERSHIP
6. USEFUL COMMANDS
7. JOB/PROCESS MANAGEMENT
8. TEXT VIEWING
9. TEXT EDITORS
10. THE UNIX SHELL
11. SIMPLE SHELL ONE-LINER SCRIPTS
12. SIMPLE PERL ONE-LINER SCRIPTS
13. REMOTE COPY
14. ARCHIVING AND COMPRESSING
15. SIMPLE INSTALLS
16. DEVICES
17. ENVIRONMENT VARIABLES
18. EXERCISES
1. INTRODUCTION
Why UNIX?
Multitasking
Remote tasking ("real networking")
Multiuser
Access to shell, programming languages, databases, open-source projects
Better performance, less expensive (free), more up-to-date
Many more reasons
UNIX variants
UNIX: Solaris, IRIX, HP-UX, Tru64-UNIX, free variants (e.g. FreeBSD), LINUX, ...
LINUX distributions
RedHat, Debian, Mandrake, Caldera, Slackware, SuSE, ...
2. BASICS
Changing password:
$ passwd # follow instructions
Orientation
$ pwd # present working directory
$ ls # content of pwd
$ ll # similar to ls, but provides additional info on files and directories (on many systems an alias for 'ls -l')
$ ll -a # includes hidden files (.name) as well
$ ll -R # lists subdirectories recursively
$ ll -t # lists files sorted by modification time, newest first
$ stat <file_name> # provides all attributes of a file
$ whoami # shows which user you are logged in as
$ hostname # shows on which machine you are
Handy shortcuts
$ . # refers to pwd
$ ~/ # refers to user's home directory
$ history # shows all commands you have used recently
$ !<number> # starts an old command by providing its ID number
$ up(down)_key # scrolls through command history
$ <incomplete path/file_name> TAB # completes path/file_name
$ <incomplete command> SHIFT&TAB # completes command
$ Ctrl a # cursor to beginning of command line
$ Ctrl e # cursor to end of command line
$ Ctrl d # delete character under cursor
$ Ctrl k # delete line from cursor, content goes into kill buffer
$ Ctrl y # paste content from Ctrl k
3. UNIX HELP
4. FINDING THINGS
5. PERMISSIONS & OWNERSHIP
Change ownership
$ chown <user> <file or dir> # changes user ownership
$ chgrp <group> <file or dir> # changes group ownership
$ chown <user>:<group> <file or dir> # changes user & group ownership
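Changing the owner of a file to another user normally requires root, but assigning a file to a group you belong to does not. A minimal sketch of this, assuming a scratch directory and a made-up file name 'my_file'; it only uses your own primary group as reported by 'id -gn':

```shell
# Work in a scratch directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch my_file

# Owner and group are columns 3 and 4 of the long listing.
ls -l my_file

# id -gn prints the name of your primary group; chgrp assigns it to the file.
my_group=$(id -gn)
chgrp "$my_group" my_file

# find prints the file name only if the file belongs to that group.
owned=$(find my_file -group "$my_group")
echo "files owned by group $my_group: $owned"
```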
6. USEFUL COMMANDS
$ df # disk space
$ free # memory info
$ uname -a # shows tech info about machine
$ bc # command-line calculator (to exit type 'quit')
$ wget ftp://ftp.ncbi.nih.... # file download from web
$ /sbin/ifconfig # gives IP and other network info
$ ln -s original_filename new_filename # creates symbolic link to file or directory
$ du -sh # displays disk space usage of current directory
$ du -sh * # displays disk space usage of individual files/directories
$ du -s * | sort -nr # shows disk space used by different directories/files sorted by size
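The du and sort commands above can be combined to find the largest entry in a directory. A minimal sketch, assuming a scratch directory and two made-up files 'big.dat' and 'small.dat'; the '-k' option (sizes in kilobytes) is added here so that the numbers sort consistently:

```shell
# Scratch directory with one large and one small file (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/zero of=big.dat bs=1024 count=2048 2>/dev/null   # about 2 MB
printf 'hello\n' > small.dat
sync   # flush to disk so du sees the allocated blocks on all file systems

# du -sk reports sizes in kilobytes; sort -nr orders them numerically,
# largest first; head keeps only the top entries.
du -sk * | sort -nr | head -n 5

# Name of the largest entry: second (tab-separated) column of the first line.
largest=$(du -sk * | sort -nr | head -n 1 | cut -f2)
echo "largest entry: $largest"
```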
7. JOB/PROCESS MANAGEMENT
8. TEXT VIEWING
$ less <my_file> # more versatile text viewer than 'more'; 'G' moves to end of text, 'g' to beginning, '/' searches forward, '?' searches backward, 'q' exits
$ more <my_file> # views text, use space bar to browse, hit 'q' to exit
$ cat <my_file> # concatenates files and prints content to standard output
9. TEXT EDITORS
VI and VIM
Non-graphical (terminal-based) editor. Vi is available on virtually every Unix system. Vim is the improved
version of vi.
EMACS
Window-based editor. You still need to know keystroke commands to use it. Installed on all Linux
distributions and on most other Unix systems.
XEMACS
More sophisticated version of emacs, but usually not installed by default. All common commands are
available from menus. Very powerful editor, with built-in syntax checking, Web-browsing, news-reading,
manual-page browsing, etc.
PICO
Simple terminal-based editor available on most versions of Unix. Uses keystroke commands, but they are
listed in logical fashion at bottom of screen.
BASICS
$ vim my_file_name # open/create file with vim
$ i # INSERT MODE
$ ESC # NORMAL (NON-EDITING) MODE
$ : # commands start with ':'
$ :w # save command; if you are in insert mode you have to hit ESC first
$ :q # quit; refuses if there are unsaved changes
$ :q! # exits WITHOUT saving any changes you have made
$ :wq # save and quit
$ R # replace MODE
$ r # replace only one character under cursor
$ q: # history of commands (from NORMAL MODE!), to reexecute one of them, select and hit enter!
$ :w new_filename # saves into new file
$ :#,#w new_filename # saves specific lines (#,#) to new file
$ :# # go to specified line number (#)
HELP
Useful lists of vim commands: Vim Commands Cheat Sheet, VimCard, Vim Basics
$ vimtutor # open vim tutorial from shell
$ :help # opens help within vim, hit :q to get back to your file
$ :help <topic> # opens help on specified topic
$ |help_topic| CTRL-] # when you are in help this command opens help topic specified between |...|,
CTRL-t brings you back to last topic
$ :help <topic> CTRL-D # gives list of help topics that contain key word
$ : <up-down keys> # as in the shell, scrolls through recent commands
DISPLAY
WRAPPING AND LINE NUMBERS
$ :set nowrap # no word wrapping, :set wrap # back to wrapping
$ :set number # shows line numbers, :set nonumber # back to no-number mode
PRINTING FILE
$ :ha # prints entire file
$ :#,#ha # prints specified lines: #,#
MERGING/INSERTING FILES
$ :r <filename> # inserts content of specified file below the cursor line
UNDO/REDO
$ u # undo last command
$ U # undo all changes on current line
$ CTRL-R # redo one change which was undone
PUT (PASTE)
$ p # pastes what was last deleted/cut after the cursor
COPY & PASTE
$ yy # copies line, for copying several lines do 2yy, 3yy and so on
$ p # pastes the copied text after the cursor (whole lines go below the current line)
HTML EDITING
-Convert text file to html format:
$ :runtime! syntax/2html.vim # run this command with open file in Vim
10. THE UNIX SHELL
When you log into UNIX/LINUX the system starts a program called SHELL. It provides you with a working
environment and interface to the operating system. Usually there are many different shell programs installed.
$ finger <user_name> # shows which shell you are using
$ chsh -l # gives list of shell programs available on your system (does not work on all UNIX variants)
$ <shell_name> # switches to different shell
STDIN, STDOUT, STDERR, REDIRECTORS, OPERATORS & WILDCARDS (more on this @ LINUX HOWTOs)
By default, many UNIX commands read from standard input (STDIN) and send their output to standard output
(STDOUT). You can redirect both by using the following operators:
$ file* # * is a wildcard that matches many files (here: all names starting with 'file')
$ ls > file # prints ls output into specified file
$ command < my_file # uses file after '<' as STDIN
$ command >> my_file # appends output of one command to file
$ command | tee my_file # writes STDOUT to file and prints it to screen; alternative way to do this:
$ command > my_file; cat my_file
$ command > /dev/null # turns off progress info of applications by redirecting their output to /dev/null
$ grep my_pattern my_file | wc # Pipes (|) output of 'grep' into 'wc'
$ grep my_pattern my_non_existing_file 2> my_stderr # prints STDERR to file (no space between '2' and '>')
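The redirection operators above can be tried out safely in a scratch directory. A minimal sketch, assuming made-up file names ('my_file', 'my_stdout', 'log.txt', 'both.txt'); the '2>&1' form, which merges STDERR into STDOUT, is an addition not listed above:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\nbeta\nalpha beta\n' > my_file

# STDOUT goes to one file, STDERR to another; note there is no space in '2>'.
grep alpha my_file my_non_existing_file > my_stdout 2> my_stderr

# '>' overwrites the file, '>>' appends to it.
echo 'first'  >  log.txt
echo 'second' >> log.txt

# '2>&1' sends STDERR to wherever STDOUT currently points (here: both.txt).
grep alpha my_file my_non_existing_file > both.txt 2>&1

# A pipe feeds the STDOUT of one command into the STDIN of the next.
matches=$(grep alpha my_file | wc -l)
echo "lines matching alpha: $matches"
```

The order matters for '2>&1': it has to come after the '> both.txt' redirection, otherwise STDERR still goes to the screen.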
12. SIMPLE PERL ONE-LINER SCRIPTS
Useful One-Liners
$ perl -p -i -w -e 's/pattern1/pattern2/g' input_file # replaces something (e.g. return
signs) in file using regular expressions; use $1 to backreference a pattern placed in parentheses
'-p' makes perl loop over the input lines and print each one after running the code; '-i' edits the file in place,
'-i.bak' additionally keeps a backup file *.bak; '-w' turns on warnings; '-e' means executable code follows
$ perl -ne 'print if (/my_pattern1/ ? ($c=1) : (--$c > 0)) ; print if
(/my_pattern2/ ? ($d = 1) : (--$d > 0))' my_infile > my_outfile # prints lines that
contain pattern1 or pattern2
The number of lines printed from each match onward is set in '$c=1' and '$d=1' (e.g. '$c=3' prints the matching
line plus the two lines that follow it); for a simple OR match use this syntax:
'/(pattern1|pattern2)/'
13. REMOTE COPY
Examples
Copy file from Server to Local Machine (type from local machine prompt):
$ scp user@remote_host:file.name . # '.' copies to pwd, you can specify here any directory, use
wildcards to copy many files at once.
Copy file from Local Machine to Server:
$ scp file.name user@remote_host:~/dir/newfile.name
Copy entire directory from Server to Local Machine (type from local machine prompt):
$ scp -r user@remote_host:directory/ ~/dir
Copy entire directory from Local Machine to Server (type from local machine prompt):
$ scp -r directory/ user@remote_host:directory/
Copy between two remote hosts (e.g. from bioinfo to cache):
same as above, but run the command while logged in to one of the remote hosts:
$ scp -r directory/ user@remote_host:directory/
NICE FTP
$ ncftp # starts the ncftp client
$ ncftp> open ftp.ncbi.nih.gov
$ ncftp> cd /blast/executables
$ ncftp> get blast.linux.tar.Z (skip extension: @)
$ ncftp> bye
14. ARCHIVING AND COMPRESSING
Viewing Archives
$ tar -tvf my_file.tar
$ tar -tzvf my_file.tgz
Extracting
$ tar -xvf my_file.tar
$ tar -xzvf my_file.tgz
$ gunzip my_file.tar.gz # or unzip my_file.zip, uncompress my_file.Z, or bunzip2 for file.tar.bz2
$ find . -name '*.zip' | xargs -n 1 unzip # this command usually works for unzipping many files
that were compressed under Windows
try also:
$ tar zxf blast.linux.tar.Z
$ tar xvzf file.tgz
options:
f: use archive file
p: preserve permissions
t: list contents of archive
v: list files processed
x: extract files from archive
X: exclude files listed in FILE
z: filter the archive through gzip
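The viewing and extracting commands above can be checked with a full round trip: create an archive, list it, extract it elsewhere and compare. A minimal sketch, assuming made-up directory and file names; the 'c' (create) option and '-C' (target directory) are not listed above:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project
echo 'some data' > project/data.txt
echo 'notes'     > project/notes.txt

# c: create archive, z: compress with gzip, f: archive file name
tar -czf project.tgz project

# t: list the archive content without extracting anything
tar -tzf project.tgz

# x: extract; -C selects the directory to extract into
mkdir restored
tar -xzf project.tgz -C restored

# The extracted copy should be identical to the original directory.
diff -r project restored/project && echo "round trip OK"
```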
15. SIMPLE INSTALLS
System-wide installations
Installations for system-wide usage are the responsibility of the system administrator.
To find out if an application is installed, type:
$ which <application_name>
$ whereis <application_name> # searches for executables in set of directories, doesn't depend on
your path
Most applications are installed in /usr/local/bin or /usr/bin. You need root permissions to write to these
directories.
Perl scripts go into /usr/local/bin, Perl modules (*.pm) into /usr/local/share/perl/5.8.0/. To copy executables
in one batch, use command: cp `find -perm -111 -type f` /usr/local/bin
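The batch copy above can be rehearsed without root permissions by using a private directory instead of /usr/local/bin. A minimal sketch, assuming made-up names ('src', 'my_bin', 'hello.sh'); it uses 'find -exec' rather than backticks so that file names with spaces are handled, and the shell built-in 'command -v' as a portable stand-in for 'which':

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir src my_bin
printf '#!/bin/sh\necho hello\n' > src/hello.sh
echo 'plain text' > src/readme.txt
chmod 755 src/hello.sh
chmod 644 src/readme.txt

# -perm -111 matches files that are executable by user, group and others;
# only those are copied into the private bin directory.
find src -type f -perm -111 -exec cp {} my_bin/ \;

# Put the private bin directory first in PATH and look the command up.
PATH="$workdir/my_bin:$PATH"
found=$(command -v hello.sh)
echo "hello.sh found at: $found"
```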
Installation of RPMs
$ rpm -i application_name.rpm
To check which version of RPM package is installed, type:
$ rpm --query <package_name>
Help and upgrade files for RPMs can be found at http://rpmfind.net/.
16. DEVICES
Mount/unmount usb/floppy/cdrom
$ mount /media/usb
$ umount /media/usb
$ mount /media/cdrom
$ eject /media/cdrom
$ mount /media/floppy
18. EXERCISES
Exercise 1
a. Download proteome of Halobacterium spec. from
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Halobacterium_sp/AE004437.faa (use wget or web
browser for download)
b. How many predicted proteins are there?
$ grep '>' AE004437.faa | wc
c. How many proteins contain the pattern "WxHxxH[1-2]"?
$ egrep 'W.H..H{1,2}' AE004437.faa | wc
d. Use the find function (/) in 'less' to fish out the proteins containing this pattern or, more elegantly, do it with
awk:
$ awk --posix -v RS='>' '/W.H..(H){1,2}/ { print ">" $0;}' AE004437.faa | less
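Steps b to d can be tried first on a tiny test file where the expected counts are obvious. A minimal sketch, assuming three made-up records in 'sample.faa' (not real sequences) with each sequence on a single line; 'grep -E' is the equivalent of egrep, and the awk pattern drops the '{1,2}' interval, since any record matched by 'W.H..H{1,2}' is also matched by 'W.H..H':

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three made-up protein records for illustration only.
cat > sample.faa <<'EOF'
>seq1 hypothetical protein
MKWAHGGHLL
>seq2 hypothetical protein
MKLLAAGG
>seq3 hypothetical protein
AAWCHTTHHK
EOF

# b. One header line per protein, so counting '>' counts the proteins.
n_proteins=$(grep -c '>' sample.faa)

# c. Number of lines that contain the pattern.
n_hits=$(grep -E -c 'W.H..H{1,2}' sample.faa)

# d. With RS='>' every FASTA record becomes one awk record;
#    $1 is then the sequence ID of a matching record.
hit_ids=$(awk -v RS='>' '/W.H..H/ { print $1 }' sample.faa)

echo "proteins: $n_proteins, matching lines: $n_hits"
echo "$hit_ids"
```

In real FASTA files a sequence usually spans several lines, so a pattern that straddles a line break is missed by both grep and this awk call.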
e. Create a BLASTable database with formatdb
$ formatdb -i AE004437.faa -p T -o T
'-p F' for nucleotide and '-p T' for protein databases
f. Generate list of sequence IDs for above pattern match result and retrieve its sequences with fastacmd from
formatted database
$ fastacmd -d AE004437.faa -i my_IDs > seq
g. Generate several lists of sequence IDs from various pattern match results and retrieve their sequences in one
step using the fastacmd in for loop
$ for i in *.my_ids; do fastacmd -d AE004437.faa -i "$i" > "$i.out"; done
h. Run blastall with a few proteins against newly created database or against Halobacterium or UniProt
database (/data/UNIPROT/blast/uniprot)
$ blastall -p blastp -i input.file -d AE004437.faa -o blastp.out -e 1e-6 -v 10 -b 10 &
i. Parse blastall output into Excel spreadsheet:
a) using biocore parser
$ blastParse -c <hits> -i <blast.out> -o <blast.parse>
b) using BioPerl parser
$ bioblastParse.pl blast.out
j. Run HMMPFAM search with above proteins against Pfam database
$ hmmpfam -E 0.1 --acc -A0 /data/PFAM/Pfam_ls input.file > output.pfam
Parse result with BioPerl parser
$ hmmSummary output.pfam > hmm.summary
Exercise 2
a. Split sample fasta batch file with csplit (use sequence file from exercise 1).
b. Concatenate single fasta files from (a) to one batch file.
c. BLAST two related sequences, retrieve the result in table format and use join to identify common hit IDs in
the two tables.
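One possible approach to exercise 2 can be sketched on made-up data. It assumes GNU csplit (for the '-z' option and the '{*}' repeat count); the file names and hit IDs (P1, P3, ...) are invented stand-ins for real BLAST tables:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>a\nMKV\n>b\nGHL\n>c\nPQR\n' > batch.faa

# a. Split at every header line: -f sets the output prefix, -z drops the
#    empty first piece, -s suppresses the byte counts, '{*}' repeats the
#    pattern as often as possible.
csplit -s -z -f seq_ batch.faa '/^>/' '{*}'
n_pieces=$(ls seq_* | wc -l)

# b. Concatenate the single files back into one batch file.
cat seq_* > rejoined.faa

# c. join works on sorted input and prints the lines whose first field
#    occurs in both files, here the common hit IDs.
printf 'P5\nP1\nP3\n' | sort > hits_1.txt
printf 'P9\nP3\nP2\nP5\n' | sort > hits_2.txt
common=$(join hits_1.txt hits_2.txt)

echo "pieces: $n_pieces"
echo "$common"
```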
Exercise 3
a. Write a shell script that executes several BLAST searches at once (give each search its own input and output file):
#!/bin/sh
blastall -p blastp -d /.../my_database -i /.../my_input1 -o my_out1 -e 1e-6 -v 10 -b 10 &
blastall -p blastp -d /.../my_database -i /.../my_input2 -o my_out2 -e 1e-6 -v 10 -b 10 &
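The same pattern of background jobs can be rehearsed without BLAST installed. A minimal sketch in which the function 'run_search' is a hypothetical stand-in for a long-running blastall call; the 'wait' at the end, which is not part of the script above, makes the script pause until all background jobs have finished:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for a long-running search such as blastall:
# it sleeps briefly and writes one line to the output file given as $2.
run_search() {
    sleep 1
    echo "searched $1" > "$2"
}

# '&' starts each job in the background; every job gets its own output file.
for n in 1 2 3; do
    run_search "my_input$n" "my_out$n" &
done

# wait blocks until all background jobs of this shell have finished.
wait
n_done=$(ls my_out* | wc -l)
echo "all searches finished: $n_done output files"
```

Because the jobs run in parallel, the whole script takes about as long as the slowest single job rather than the sum of all of them.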
Exercise 4
a. Create multiple alignment with ClustalW (e.g. use sequences with 'W.H..HH' pattern)
$ clustalw my_fasta_batch
Exercise 5
a. Reformat alignment into PHYLIP format using 'seqret' from EMBOSS
$ seqret clustal::my_align.aln phylip::my_align.phylip
Exercise 6
a. Create neighbor-joining tree with PHYLIP
$ cp my_align.phylip infile
$ phylip protdist # creates distance matrix
$ cp outfile infile
$ phylip neighbor # use default settings
$ cp outtree intree
$ phylip retree # displays tree and can use midpoint method for defining root of tree, my typical
command sequence is: 'N' 'Y' 'M' 'W' 'R' 'R' 'X'
$ cp outtree my_tree.dnd
View your tree in TreeBrowse or open it in TreeView