Introduction to Linux, R, & PLINK

Kridsadakorn Chaichoompu
Montefiore, University of Liege
[email protected]!

10/03/15   GBIO0015   1  
Outline
• Basic Linux
• R language
• PLINK
• Awareness
• Assignment

Basic Linux: Why Linux?
• Linux is a Unix-like system, like Mac OS, but it is FREE!
• The Linux terminal is not friendly to end users
• Linux is very powerful and useful, especially in scientific research, for dealing with large text-based data files
• Linux, Unix, and Mac OS provide similar command lines, while MS Windows doesn't
• It would be a good opportunity to have fun with Linux today!
Basic Linux: Let's get started
• If you have Linux installed on your computer, you can start right away
• If you use Mac OS, you can also start right away
• If you use MS Windows, you need to use PuTTY to connect to a computing server
– Download: www.putty.org
– Get: PuTTY
– Get: PSCP or PSFTP

Basic Linux: How to start
• Locally: open the terminal in Linux or Mac OS
• Remotely:
– For Linux and Mac: use "ssh" to connect to a computing server
– For Windows: use PuTTY to connect to a computing server
• Computing servers: ms801 – ms825 at montefiore.ulg.ac.be
– For Linux and Mac → ssh [email protected]fiore.ulg.ac.be
– For PuTTY → set up a connection as
• Server: ms801.montefiore.ulg.ac.be
• Protocol: ssh (secure shell protocol)
• Port: 22
• If you don't have a user name and password to access the computing servers, please send me an email with your ULg email address.
• Note: contact person → Marc Frederic ([email protected])

Basic Linux: basic commands
Command   Purpose
ls        List the contents of a directory. Check: -l, -lt, -a
pwd       Print the name of the current directory
cd        Change directory
mkdir     Make a directory
rmdir     Remove empty directories
mv        Rename or move files
cp        Copy files. Check: -r
ln        Create links between files. It is useful to link to large files
          instead of copying them, to save disk space. Check: -s (symbolic link)
date      Show the system date and time. It is useful for putting timestamps in
          log files so that you can trace back your work.
nohup     Run a command that keeps running after you log out; output goes to
          nohup.out unless it is redirected. Append "&" to run the command in
          the background.
          *very useful command for long-running processes
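
The directory and file commands in the table above can be tried safely in a scratch directory. The following is only a minimal sketch, assuming a POSIX shell with mktemp available; the names project, backup, notes.txt, and notes_link.txt are hypothetical and chosen for illustration.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"
pwd                                      # print the current directory

mkdir project                            # make a directory
echo "sample data" > project/notes.txt
cp -r project project_copy               # -r copies a directory and its contents
mv project_copy backup                   # rename (move) the copy
ln -s project/notes.txt notes_link.txt   # symbolic link instead of a second copy
ls -l                                    # long listing; the link points to project/notes.txt

mkdir empty_dir
rmdir empty_dir                          # rmdir only removes empty directories
date                                     # current date and time, handy as a timestamp
```

Reading notes_link.txt shows the content of project/notes.txt, because the link only points to the original file and does not duplicate it.
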
Basic Linux: basic commands (2)
Command   Purpose
man       Show the manual pages of Linux commands
find      Search for files. Check: -name
echo      Print text
> and >>  ">" saves all lines of output to a new file (overwriting an existing
          one) instead of displaying them on the screen. ">>" appends the
          output to an existing file.
cat       Concatenate files and display them on the screen.
cut       Cut out parts of each line of a file and display them on the screen.
          Check: -f, -d
|         Pipe: joins commands. The output of the command before "|" becomes
          the input of the command after it
wc        Count lines, words, and bytes. Check: -l *very useful option
grep      Print the lines that match a search pattern
top       Check the running processes in the system. Note: press "q" to exit
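
To see how find, the pipe, grep, wc, and cut from the table above fit together, here is a small sketch under stated assumptions: the directories run1 and run2 and the file contents are invented for illustration, and everything happens in a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p run1 run2
echo "sample1 case" > run1/samples.txt
echo "sample2 control" >> run1/samples.txt
echo "sample3 case" > run2/samples.txt
echo "not relevant" > run2/readme.txt

# find: search the current directory tree for files by name
find . -name "samples.txt"

# cat | grep | wc -l: count the lines that mention "case" across both files
n_case=$(cat run1/samples.txt run2/samples.txt | grep "case" | wc -l)
echo "case samples: $n_case"

# cut: keep only the first space-separated field, then save it with ">"
cat run1/samples.txt run2/samples.txt | cut -d' ' -f1 > ids.txt
cat ids.txt
```

Each "|" hands the output of the command on its left to the command on its right, so no intermediate files are needed until the final ">".
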

Basic Linux: basic commands (3)
Command   Purpose
more      Print the file content one screen at a time. Note: press the space
          bar to page forward and "q" to quit
head      Print the first part of a file. Check: -n
tail      Print the last part of a file. Check: -n
paste     Combine the lines of files side by side

"Linux commands are easy"

Basic Linux: Try!
• echo "1 2 3 4 5 6 7" > test1.txt
• echo "a b c d e f g" > test2.txt
• cat test1.txt test2.txt
• cat test1.txt test2.txt > test3.txt
• echo "q w e r t y u" >> test3.txt
• more test3.txt
• head -n 1 test3.txt
• tail -n 1 test3.txt
• wc -l test3.txt
• date >> test3.txt
• more test3.txt
• cut -d' ' -f1 test3.txt > test4.txt
• cut -d' ' -f2 test3.txt > test5.txt
Basic Linux: Try! (2)
• more test4.txt test5.txt
• paste test5.txt test4.txt
• paste -d',' test5.txt test4.txt > test6.txt
• cat test6.txt
• cat test3.txt test6.txt > test7.txt
• grep Mon test7.txt
• cat test7.txt|cut -d' ' -f1
• cat test7.txt|cut -d' ' -f1|grep Mon
• ls
• ln -s test3.txt test3_link.txt
• ls -l
• nohup ls -lt > log.txt
• cat log.txt
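
Note that nohup on its own keeps a command alive after logout but still occupies the terminal until the command finishes; adding "&" sends it to the background. The sketch below illustrates this, assuming a POSIX shell; "sleep 2" is only a stand-in for a long analysis and long_job.log is a hypothetical file name.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# "&" sends the job to the background; "2>&1" puts error messages in the same log.
nohup sh -c 'echo "Started at: $(date)"; sleep 2; echo "Ended at: $(date)"' > long_job.log 2>&1 &
job_pid=$!                       # process ID of the background job

echo "Job $job_pid is running; the prompt is free for other work"
wait "$job_pid"                  # only needed in this demo, to block until the job ends
cat long_job.log
```

In real use you would skip the wait line, log out if you like, and check the log file later.
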
R language: Why R?
• R is FREE! Available for all Unix platforms, Mac OS, and Windows.
– Download: http://www.r-project.org
• R is a powerful language for mathematical and statistical computation, especially for matrix calculation
• R has a big community that helps develop R packages in many scientific areas
• R can produce nice plots, and even 3D plots
R language: Let's get started
• Locally: we recommend RStudio, which is free in its open source edition
– Download: http://www.rstudio.com
– Available for Linux, Mac OS, and Windows
– It is easier to install R packages via RStudio
– It is easier to monitor variables, view the command history, and view plots
• Remotely: R is already installed on the computing servers → ms801 – ms825
– Start the R console: R --vanilla
• Say hello from R → print("hello")

Congratulations! You are now friends with R!


R language: Let's try Matrix Calculation
• Create a set of random numbers
– a <- rnorm(12)
– "<-" is the assignment operator; "=" also works
• Create a matrix
– m = matrix(a, nrow=3, ncol=4, byrow=FALSE)
– To avoid ambiguity (e.g. "a<-3" can be misread as "a < -3"), we suggest using "="
– nrow is the number of rows, ncol is the number of columns
• Create another matrix
– n = matrix(rnorm(12), nrow=3, ncol=4)
– We can nest R functions as in this example → rnorm() is executed before matrix() is called
• Try arithmetic operators with matrices (they work element by element)
– m + n, or m - n, or m * n, or m / n
• Transpose of a matrix
– t(n)
– try → m %*% t(n) (matrix multiplication) and compare it to m * n
• To see the help page of a function, use "?" e.g. ?rnorm or ?matrix
R language: Plotting
• Use plot() to create a simple XY plot
– plot(rnorm(10))
• On the computing servers, we need to save plots as files and transfer them to a local computer to view them
– pdf(file="./xyplot.pdf") → create a pdf file in the current working directory; use getwd() to check the directory and setwd() to change it
– plot(rnorm(10))
– points(rnorm(2), col="red") → add 2 red dots to the plot
– dev.off() → close the graphical device; the output of all graphical functions called before dev.off() is saved to the pdf file
• R also supports other types of graphics files
– Check: jpeg(), tiff(), png(), bmp()
R language: Installing R packages
• In RStudio:
– Go to the "Packages" tab, click "Install"
– The "Install Packages" window will be shown; type a package name in "Packages"
• From the R console:
– Use install.packages("package name"), and follow the instructions
• Try to install the "kinship2" package using the methods explained above
– For more details, check http://cran.r-project.org/web/packages/kinship2/index.html
R language: Try!
• Here, we are going to explore the "kinship2" package
– library(kinship2)
– data(sample.ped)
– sample.ped[1:5,]
– allfamily = pedigree(id=sample.ped$id, dadid=sample.ped$father, momid=sample.ped$mother, sex=sample.ped$sex, famid=sample.ped$ped)
– pdf(file="./family1.pdf")
– plot(allfamily['1'])
– dev.off()
To be continued in the next class
PLINK: Why PLINK?
• PLINK is a whole genome association analysis software, and it is FREE!
– Download: http://pngu.mgh.harvard.edu/~purcell/plink/
• PLINK has a well-documented manual that explains all features
• PLINK is available for Linux, Mac OS, and MS-DOS
• PLINK has 2 versions, the stable version (1.07) and the beta version (1.9)
– PLINK 1.9 works much faster than 1.07
– PLINK 1.9 has many new features
• gPLINK is another version of PLINK that provides a graphical user interface. Please be aware that a whole genome analysis with PLINK usually takes a long time, so it is better to use the command-line version
• We recommend using PLINK 1.07

PLINK: Let's get started
• It is better to use PLINK on a Unix-based platform to avoid problems with incompatible files
• To install PLINK on the computing server, use wget to download the zipped file, then use unzip to decompress the file
– wget http://pngu.mgh.harvard.edu/~purcell/plink/dist/plink-1.07-i686.zip
– unzip plink-1.07-i686.zip
• In plink-1.07-xxx.zip, there is an example set of input files, which is a good starting point to explore
– test.map contains the marker information
– test.ped contains genotype data and sample information
• Check what is inside the example files!
– cd plink-1.07-i686
– ./plink --file test
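
Because PED and MAP files are plain text, the Linux commands from the earlier slides are enough to inspect them. The sketch below writes a tiny hand-made data set (toy.map and toy.ped are hypothetical files, not the test files shipped with PLINK) using the standard PLINK 1.07 column layout, and then looks inside it.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# toy.map: one line per marker -> chromosome, marker ID, genetic distance, bp position
cat > toy.map <<'EOF'
1 snp1 0 1000
1 snp2 0 2000
EOF

# toy.ped: one line per sample -> family, individual, father, mother, sex, phenotype,
# followed by two alleles for each marker listed in toy.map
cat > toy.ped <<'EOF'
FAM1 IND1 0 0 1 1 A A G T
FAM1 IND2 0 0 2 2 A C T T
FAM2 IND3 0 0 1 2 C C G G
EOF

wc -l toy.map toy.ped            # 2 markers, 3 samples
cut -d' ' -f1-6 toy.ped          # sample information only
cut -d' ' -f7-8 toy.ped          # genotypes at the first marker (snp1)
```

The same wc and cut commands work on test.map and test.ped from the PLINK archive.
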
PLINK: File Formats
• PLINK mainly supports 3 types of formats
1. Standard text format (PED and MAP). To read a PED file, use --file when the PED file and the MAP file have the same name; otherwise, indicate each file explicitly with --ped and --map
2. Binary format (BED, BIM, and FAM). To convert a PED file to a BED file, use --make-bed. Don't forget to use --out to set the prefix of the output files
• ./plink --file test --make-bed --out test_bin
3. Transposed text format (TPED and TFAM). To convert a PED file to a TPED file, use --transpose --recode
• ./plink --file test --transpose --recode --out test_tp
Important note! We need to indicate which output format we want from an analysis; otherwise PLINK will not create any output file.
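
Before converting between formats, it can help to confirm that a PED file and its MAP file agree: each PED row should hold 6 sample columns plus 2 allele columns per marker. The following is a rough sanity check, not part of PLINK itself; it assumes awk is available and uses hypothetical toy files.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '1 snp1 0 1000\n1 snp2 0 2000\n' > toy.map
printf 'FAM1 IND1 0 0 1 1 A A G T\nFAM1 IND2 0 0 2 2 A C T T\n' > toy.ped

# Number of markers = number of lines in the MAP file.
n_markers=$(awk 'END { print NR }' toy.map)

# A PED row must hold 6 sample columns plus 2 alleles per marker.
expected=$((6 + 2 * n_markers))

# Count the rows whose number of fields differs from the expected value.
bad_rows=$(awk -v n="$expected" 'NF != n { c++ } END { print c + 0 }' toy.ped)
echo "markers: $n_markers, expected columns: $expected, malformed rows: $bad_rows"
```

A non-zero count of malformed rows usually means the two files do not belong together or a line was damaged during transfer.
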

Awareness
• To work across platforms between Unix-based OS and Windows, we need to realize that their text files are different (the lines end differently)
– In Linux, use dos2unix to convert text files from Windows to Unix-based OS
– In Linux, use unix2dos to convert text files from Unix-based OS to Windows
• Running an analysis with PLINK on whole genome data may take many hours. We recommend using the computing servers and running the analysis as a background process using the nohup command in Linux
• It is good practice to keep a note of all commands that we use in our analysis, because command lines are complicated and easy to forget. A well-documented note can help to trace back the problem if something is wrong with the results.
• Always use the absolute paths of files or directories as parameters of command lines or functions. This at least avoids problems when we have files with the same name in different directories. For example:
– /analysis1/inputfile.ped
– /analysis2/inputfile.ped
We might run an analysis with the wrong input file if we forget to change the working directory
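
The difference between Windows and Unix text files is the extra carriage return at the end of every line. The sketch below makes that visible and removes it with tr, a swapped-in alternative that has the same effect as dos2unix on ordinary text files and is handy where dos2unix is not installed; the file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Simulate a file saved on Windows: each line ends with carriage return + newline.
printf 'snp1 A\r\nsnp2 G\r\n' > windows.txt

# Count the lines that carry a carriage return (2 here).
cr=$(printf '\r')
grep -c "$cr" windows.txt

# Strip the carriage returns; dos2unix does this conversion where it is installed.
tr -d '\r' < windows.txt > unix.txt
grep -c "$cr" unix.txt || true   # prints 0; grep exits non-zero when nothing matches
```
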
Awareness (2)
• It is good practice to have log files with timestamps. We can use them to trace back the whole process and to estimate the runtime of further analyses. Try the example below, which combines command lines into a script. Note that you can also use text editors such as vi, vim, and nano to create a script
– cmdpath=~/plink-1.07-i686 #Path of command
– datapath=~/plink-1.07-i686 #Path of data files
– myscript=~/run_plink.sh
– echo 'echo "Started at: `date` " ' > ${myscript}
– echo "${cmdpath}/plink --file ${datapath}/test --make-bed --out ${datapath}/test_bed" >> ${myscript}
– echo 'echo "Ended at: `date` " ' >> ${myscript}
– echo "Created: ${myscript}"

– cat ~/run_plink.sh #To see what is in run_plink.sh
– nohup sh ~/run_plink.sh > ~/run_plink.log
– cat ~/run_plink.log #To see what is inside
– head -n 1 ~/run_plink.log #To see the first line
– tail -n 1 ~/run_plink.log #To see the last line
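
A script like the one above can also be written in one step with a here-document, which avoids the nested quotes. This is only a sketch under stated assumptions: "sleep 1" is a placeholder for the real PLINK command, and run_job.sh and run_job.log are hypothetical names in a temporary directory.

```shell
workdir=$(mktemp -d)
myscript="$workdir/run_job.sh"   # absolute paths, as recommended earlier
mylog="$workdir/run_job.log"

# Quoting 'EOF' keeps $(date) unexpanded, so it runs when the script runs,
# not when the script is written.
cat > "$myscript" <<'EOF'
echo "Started at: $(date)"
sleep 1    # placeholder for the real analysis command
echo "Ended at: $(date)"
EOF

sh "$myscript" > "$mylog" 2>&1
head -n 1 "$mylog"               # the "Started at" line
tail -n 1 "$mylog"               # the "Ended at" line
```

Comparing the two timestamps gives the runtime, which helps to estimate how long a larger analysis will take.
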
Assignment 1
• Summarize software tools: what do they focus on? Can you classify them? What are the criteria to classify them?
– Check: http://www.jurgott.org/linkage/ListSoftware.html
• Check out these tools:
– PLINK http://pngu.mgh.harvard.edu/~purcell/plink/
– FBAT http://www.hsph.harvard.edu/fbat/fbat.htm
– GenABEL http://www.genabel.org/
What is the philosophy behind them? What are the main technical differences? Which study designs can they accommodate? Use the information available on their websites.
• The summary of this assignment will be discussed in the next class (31 March 2015). It also needs to be incorporated into the final report, which will be marked later.
• Due date: Slide presentation (21 April 2015)
