Introduction to Linux, R, & PLINK
Kridsadakorn Chaichoompu
Montefiore, University of Liege
[email protected]
10/03/15
GBIO0015
1
Outline
• Basic Linux
• R language
• PLINK
• Awareness
• Assignment
Basic Linux: Why Linux?
• Linux is a Unix-based system, like Mac OS, but it is FREE!
• The Linux terminal is not friendly for end users
• Linux is very powerful and useful, especially in scientific research, for dealing with large text-based data files
• Linux, Unix, and Mac OS provide similar command lines, while MS Windows doesn't
• Today is a good opportunity to have fun with Linux!
Basic Linux: Let's get started
• If you have Linux installed on your computer, you can start right away
• If you use Mac OS, you can start right away too
• If you use MS Windows, you need to use PuTTY to connect to a computing server
– Download: www.putty.org
– Get: PuTTY
– Get: PSCP or PSFTP
Basic Linux: How to start
• Locally: open the terminal in Linux or Mac OS
• Remotely:
– For Linux and Mac: use "ssh" to connect to a computing server
– For Windows: use PuTTY to connect to a computing server
• Computing servers: ms801 – ms825 at montefiore.ulg.ac.be
– For Linux and Mac → ssh [email protected]fiore.ulg.ac.be
– For PuTTY → set up a connection as
• Server: ms801.montefiore.ulg.ac.be
• Protocol: ssh (secure shell protocol)
• Port: 22
• If you don't have a user name and password to access the computing servers, please send me an email with your ULg email address.
• Note: contact person → Marc Frederic ([email protected])
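A minimal sketch of how the server names above fit together, assuming the hosts are numbered consecutively from ms801 to ms825 as the slide suggests. The variable `user` and its value are hypothetical placeholders, and the loop only prints the ssh commands instead of connecting.

```shell
#!/bin/sh
# Hypothetical placeholder: replace with the user name you received.
user="your_username"
domain="montefiore.ulg.ac.be"

# Build the list of host names ms801 ... ms825 (assumed consecutive).
hosts=""
i=801
while [ "$i" -le 825 ]; do
    hosts="$hosts ms${i}.${domain}"
    i=$((i + 1))
done

# Print (do not run) the ssh command for each server; 22 is the ssh port.
for h in $hosts; do
    echo "ssh -p 22 ${user}@${h}"
done

# Count the generated host names.
n_hosts=$(echo $hosts | wc -w | tr -d ' ')
echo "Number of servers: $n_hosts"
```

Copying one of the printed lines into a terminal starts the connection; if a server is busy, another host from the list can be tried.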
Basic Linux: basic commands
Command   Purpose
ls        List the contents of a directory. Check: -l, -lt, -a
pwd       Print the name of the current directory
cd        Change directory
mkdir     Make a directory
rmdir     Remove empty directories
mv        Rename or move files
cp        Copy files. Check: -r
ln        Create links between files. Linking to a large file instead of copying it saves disk space. Check: -s (symbolic link)
date      Show the system date and time. Useful for timestamping log files so you can trace back your work
nohup     Run a command that keeps running after you log out, with its output written to a file (nohup.out by default) instead of the screen; add "&" to run it in the background. *Very useful for long-running processes
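The commands in the table above can be combined as in the following sketch. It works in a temporary directory so nothing in your home directory is touched; all file and directory names (`project`, `backup`, `data.txt`, and so on) are made up for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory (mktemp -d creates a fresh one).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
pwd                                      # print the current directory

mkdir project backup                     # make two directories
echo "sample data" > project/data.txt    # create a small file

cp project/data.txt backup/              # copy the file
mv backup/data.txt backup/data_old.txt   # rename the copy

# Symbolic link: points to the original instead of duplicating it.
ln -s "$workdir/project/data.txt" data_link.txt

ls -l                                    # long listing shows the link target
link_content=$(cat data_link.txt)        # reading the link reads the original
echo "$link_content"

# Timestamped log entry, as suggested for the date command.
echo "Finished at: $(date)" >> work.log

# rmdir only removes empty directories, so this one is refused.
rmdir backup 2>/dev/null || echo "backup is not empty, so rmdir refuses"
```

The link is created with an absolute path so that it keeps working if it is later moved to another directory.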
Basic Linux: basic commands (2)
Command   Purpose
man       Show the manual pages of Linux commands
find      Search for files. Check: -name
echo      Print text
> and >>  ">" saves the output to a file (creating it, or overwriting it if it exists) instead of displaying it on the screen. ">>" appends the output to an existing file
cat       Concatenate files and display them on the screen
cut       Cut out parts of each line of a file and display them on the screen. Check: -f, -d
|         Pipe symbol for joining commands: the output of the command before "|" becomes the input of the command after it
wc        Count lines and words. Check: -l *very useful option
grep      Print the lines that match a search pattern
top       Check the running processes in the system. Note: press "q" to exit
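As a sketch of how redirection and pipes from the table above work together, the following creates a small space-separated file and queries it. The file name `samples.txt` and its contents are invented for this example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ">" creates (or overwrites) the file; ">>" appends to it.
echo "id1 case 23" >  samples.txt
echo "id2 control 31" >> samples.txt
echo "id3 case 45" >> samples.txt

# Pipe: grep keeps the lines containing "case", wc -l counts them.
n_cases=$(grep "case" samples.txt | wc -l | tr -d ' ')
echo "Number of cases: $n_cases"

# cut extracts the first space-separated field of the matching lines.
case_ids=$(grep "case" samples.txt | cut -d' ' -f1)
echo "$case_ids"

# find searches the current directory tree for files by name.
found=$(find . -name "samples.txt")
echo "$found"
```

Each stage of a pipeline does one small job, which makes it easy to test the stages one at a time before chaining them.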
Basic Linux: basic commands (3)
Command   Purpose
more      Print the file content one screen at a time. Note: use the space bar (or, in some versions, the arrow keys) to scroll
head      Print the first part of a file. Check: -n
tail      Print the last part of a file. Check: -n
paste     Combine lines of files side by side
Basic Linux: Try!
• echo "1 2 3 4 5 6 7" > test1.txt
• echo "a b c d e f g" > test2.txt
• cat test1.txt test2.txt
• cat test1.txt test2.txt > test3.txt
• echo "q w e r t y u" >> test3.txt
• more test3.txt
• head -n 1 test3.txt
• tail -n 1 test3.txt
• wc -l test3.txt
• date >> test3.txt
• more test3.txt
• cut -d' ' -f1 test3.txt > test4.txt
• cut -d' ' -f2 test3.txt > test5.txt
Basic Linux: Try! (2)
• more test4.txt test5.txt
• paste test5.txt test4.txt
• paste -d',' test5.txt test4.txt > test6.txt
• cat test6.txt
• cat test3.txt test6.txt > test7.txt
• grep Mon test7.txt
• cat test7.txt|cut -d' ' -f1
• cat test7.txt|cut -d' ' -f1|grep Mon
• ls
• ln -s test3.txt test3_link.txt
• ls -l
• nohup ls -lt > log.txt
• cat log.txt
R language: Why R?
• R is FREE! Available for all Unix platforms, Mac OS, and Windows.
– Download: http://www.r-project.org
• R is a powerful language for mathematical and statistical computation, especially for matrix calculation
• R has a big community that helps develop R packages in many scientific areas
• R can produce nice plots, even 3D plots
R language: Let's get started
• Locally: we recommend RStudio, whose open source edition is free
– Download: http://www.rstudio.com
– Available for Linux, Mac OS, and Windows
– It is easier to install R packages via RStudio
– It is easier to monitor variables and the command history, and to view plots
• Remotely: R is already installed on the computing servers → ms801 – ms825
– Start the R console: R --vanilla
• Say hello from R → print("hello")
PLINK: Let's get started
• It is better to use PLINK on a Unix-based platform to avoid problems with incompatible files
• To install PLINK on the computing server, use wget to download the zipped file, then use unzip to decompress it
– wget http://pngu.mgh.harvard.edu/~purcell/plink/dist/plink-1.07-i686.zip
– unzip plink-1.07-i686.zip
• plink-1.07-xxx.zip contains an example set of input files, which is a good starting point to explore
– test.map contains the marker information
– test.ped contains genotype data and sample information
• Check what is inside the example files!
– ./plink --file test
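To get a feel for what PED and MAP files contain, here is a sketch that writes a tiny hand-made data set and checks it with standard shell tools. It assumes the usual PLINK text layout: a MAP file with four columns per marker, and a PED file with six leading columns followed by two alleles per marker. The file names `toy.map` and `toy.ped` and all values in them are made up, and no PLINK call is involved.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# MAP: chromosome, marker ID, genetic distance, base-pair position.
cat > toy.map <<'EOF'
1 snp1 0 1000
1 snp2 0 2000
EOF

# PED: family ID, individual ID, father, mother, sex, phenotype,
# then two alleles per marker.
cat > toy.ped <<'EOF'
F1 I1 0 0 1 2 A A G T
F2 I2 0 0 2 1 A C G G
F3 I3 0 0 1 1 C C T T
EOF

# One line per marker in the MAP file, one line per sample in the PED file.
n_markers=$(wc -l < toy.map | tr -d ' ')
n_samples=$(wc -l < toy.ped | tr -d ' ')
echo "markers: $n_markers, samples: $n_samples"

# Every PED line should have 6 + 2 * (number of markers) fields.
expected=$((6 + 2 * n_markers))
bad_lines=$(awk -v n="$expected" 'NF != n' toy.ped | wc -l | tr -d ' ')
echo "lines with unexpected field count: $bad_lines"
```

The same counts can be taken on the real test.map and test.ped to see how many markers and samples the example data set holds.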
PLINK: File Formats
• PLINK mainly supports 3 types of formats
1. Standard text format (PED and MAP)
To read a PED file, use --file when the PED file and MAP file have the same name; otherwise, name each file explicitly with --ped and --map
2. Binary format (BED, BIM, and FAM)
To convert a PED file to a BED file, use --make-bed. Don't forget to use --out to set the prefix of the output files
• ./plink --file test --make-bed --out test_bin
3. Transposed text format (TPED and TFAM)
To convert a PED file to a TPED file, use --transpose --recode
• ./plink --file test --transpose --recode --out test_tp
Important note! We need to indicate which output format we want from an analysis; otherwise PLINK will not create any output file.
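The three formats also differ in the option used to read them back in. As a sketch, the function below looks at which files exist for a given prefix and prints the matching input option: `--file` for PED/MAP, `--bfile` for BED/BIM/FAM, and `--tfile` for TPED/TFAM. The helper name `plink_input_flag` is hypothetical and not part of PLINK, and the empty files are created only to demonstrate the check.

```shell
#!/bin/sh
# Print the PLINK input option that matches the files found for a prefix.
plink_input_flag() {
    prefix=$1
    if [ -f "$prefix.bed" ] && [ -f "$prefix.bim" ] && [ -f "$prefix.fam" ]; then
        printf '%s\n' "--bfile $prefix"
    elif [ -f "$prefix.ped" ] && [ -f "$prefix.map" ]; then
        printf '%s\n' "--file $prefix"
    elif [ -f "$prefix.tped" ] && [ -f "$prefix.tfam" ]; then
        printf '%s\n' "--tfile $prefix"
    else
        echo "no complete PLINK file set for $prefix" >&2
        return 1
    fi
}

# Demonstration with empty placeholder files in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch test.ped test.map
touch test_bin.bed test_bin.bim test_bin.fam
touch test_tp.tped test_tp.tfam

flag_text=$(plink_input_flag test)
flag_bin=$(plink_input_flag test_bin)
flag_tp=$(plink_input_flag test_tp)
echo "$flag_text"
echo "$flag_bin"
echo "$flag_tp"
```

The printed option can then be placed after ./plink in a command line, in the same position where the slides use --file test.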
Awareness
• When working across platforms, between Unix-based OS and Windows, be aware that text files differ (the line endings are not the same)
– In Linux, use dos2unix to convert text files from Windows to Unix-based OS
– In Linux, use unix2dos to convert text files from Unix-based OS to Windows
• Running a PLINK analysis on whole-genome data may take many hours. We recommend using the computing servers and running the analysis as a background process with the nohup command in Linux
• It is good practice to keep a note of all commands used in an analysis, because command lines are complicated and easy to forget. A well-documented note helps to trace back the steps if something is wrong with the results.
• Always use the absolute paths of files or directories as parameters of command lines or functions, at least to avoid problems when files have the same name but sit in different directories. For example:
– /analysis1/inputfile.ped
– /analysis2/inputfile.ped
We might run an analysis with the wrong input file if we forget to change the working directory
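The text-file difference between Windows and Unix-based OS can be seen directly. Windows text files end each line with a carriage return plus a line feed, while Unix-based systems use a line feed only. The sketch below builds a Windows-style file with printf, makes the extra byte visible, and strips it with tr as a stand-in for dos2unix where that tool is not installed; the file names are made up.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write two lines with Windows (CR LF) line endings.
printf 'F1 I1 0 0 1 2\r\nF2 I2 0 0 2 1\r\n' > windows.txt

# od -c makes the otherwise invisible \r characters visible.
od -c windows.txt | head -n 2

# Remove every carriage return; the result has Unix (LF) endings.
tr -d '\r' < windows.txt > unix.txt

# Compare file sizes: one byte per line should have disappeared.
size_windows=$(wc -c < windows.txt | tr -d ' ')
size_unix=$(wc -c < unix.txt | tr -d ' ')
echo "bytes before: $size_windows, after: $size_unix"
```

A stray carriage return at the end of a line becomes part of the last field, which is one way a Windows-edited input file can cause confusing errors on a Linux server.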
Awareness (2)
• It is good to have log files with timestamps. We can use them to trace back the whole process and to estimate the run time of further analyses. Try the example below, which combines several command lines. Note that you can use text editors such as vi, vim, and nano to create a script
– cmdpath=~/plink-1.07-i686 #Path of command
– datapath=~/plink-1.07-i686 #Path of data files
– myscript=~/run_plink.sh
– echo 'echo "Started at: `date` " ' > ${myscript}
– echo "${cmdpath}/plink --file ${datapath}/test --make-bed --out ${datapath}/test_bed" >> ${myscript}
– echo 'echo "Ended at: `date` " ' >> ${myscript}
– echo "Created: ${myscript}"

– cat ~/run_plink.sh #To see what is in run_plink.sh
– nohup sh ~/run_plink.sh > ~/run_plink.log
– cat ~/run_plink.log #To see what is inside
– head -n 1 ~/run_plink.log #To see the first line
– tail -n 1 ~/run_plink.log #To see the last line
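An alternative way to produce the same kind of timestamped script is a here-document, which avoids the nested quoting of the echo approach. This is only a sketch: `sleep 1` stands in for the PLINK command so that it can run anywhere, and the names `demo_run.sh` and `demo_run.log` are invented. The elapsed time is computed from `date +%s`, which prints the time in seconds.

```shell
#!/bin/sh
workdir=$(mktemp -d)
myscript="$workdir/demo_run.sh"   # absolute path, as recommended
mylog="$workdir/demo_run.log"

# The quoted 'EOF' keeps $(...) unexpanded until the script itself runs.
cat > "$myscript" <<'EOF'
start=$(date +%s)
echo "Started at: $(date)"
sleep 1    # placeholder for the real analysis command
end=$(date +%s)
echo "Ended at: $(date)"
echo "Elapsed seconds: $((end - start))"
EOF

# Run with nohup; "&" puts the job in the background, "wait" blocks until
# it has finished. Redirecting all three streams keeps nohup silent.
nohup sh "$myscript" > "$mylog" 2>/dev/null < /dev/null &
wait

first_line=$(head -n 1 "$mylog")
last_line=$(tail -n 1 "$mylog")
echo "$first_line"
echo "$last_line"
```

For a real analysis you would leave out `wait`, log out, and check the log file later; the last line then gives a rough run time to plan further analyses.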
Assignment 1
• Summarize software tools: what do they focus on? Can you classify them? What are the criteria to classify them?
– Check: http://www.jurgott.org/linkage/ListSoftware.html
• Check out these tools:
– PLINK http://pngu.mgh.harvard.edu/~purcell/plink/
– FBAT http://www.hsph.harvard.edu/fbat/fbat.htm
– GenABEL http://www.genabel.org/
What is the philosophy behind them? What are the main technical differences? Which study designs can they accommodate? Use the information available on their websites.
• The summary of this assignment will be discussed in the next class (31 March 2015). It also needs to be incorporated into a final report, which will later be marked.
• Due date: Slide presentation (21 April 2015)