This document provides instructions for performing data analysis tasks in Linux/Unix, including:
1. Installing the necessary tools (Bowtie2, Velvet, Samtools, and VarScan) for alignment, assembly, and variant calling.
2. Using these tools to align reads with Bowtie2, assemble reads with Velvet, convert SAM files to BAM and sort them with Samtools, and call SNPs and indels with VarScan.
3. Viewing alignments with tools like BAMView.
Linux Commands
Data Analysis in Linux/Unix
# For Windows 10 or above, open the Start menu, search for Ubuntu in the Microsoft Store, and click download/install Ubuntu (version 22.04).
# After installing Ubuntu (version 22.04), open the app and provide a user name and password (remember them, or store them in a safe place for future use).
# Then update/upgrade the system by entering the following commands in the Linux terminal.
sudo apt update
sudo apt upgrade

# Text after # is a comment; everything else is a command.
pwd                  # present working directory (the directory you're in)
ls                   # lists files and directories
ls -a                # also shows hidden files
ls -lh               # shows details, such as file sizes
cd Desktop           # changes the current directory to Desktop
cd ..                # moves one directory "higher"
cd /                 # changes to the "root" directory
cd ~                 # goes to the home directory
mkdir workshop       # makes a new directory named workshop
rmdir workshop       # removes the directory if it is empty; otherwise use rm -r workshop
nano text            # creates, views or edits the contents of file "text"
more text            # shows file "text" in screen-size chunks
less text            # shows the file, with arrow keys to move about (type q to quit)
head text            # shows the top of file "text"
tail text            # shows the bottom of file "text"
cp text text1        # copies the contents of file text and saves them as text1
cp text text         # try it: cp refuses, because source and destination are the same file
cp text test         # creates a copy of the file "text" in the directory test
                     # If there is no directory called test, creates the file test with the contents of file text.
                     # If there is already a file called test, it is overwritten automatically!
mv text test         # moves like cp copies, except that the original file is removed
rm text              # deletes the file "text". Forever.
gunzip xxx.gz        # uncompresses a .gz file
unzip xxx.zip        # uncompresses a .zip file
tar -xvf xxx.tar     # un-tars a tar archive
man tar              # opens the manual (help) for the program "tar"
top                  # shows running system processes
ssh                  # opens a remote, secure connection (ssh user@host)
ctrl-z               # suspends the running program
kill %1              # gets rid of the suspended job for good
grep this file.txt   # finds lines matching "this" in file.txt
ftp                  # gets a remote file

# How to run scripts written in other languages (perl, python, etc.) in Linux
python myscript.py   # runs a python script
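The file commands above can be tried safely in a throwaway directory. The following is a minimal practice sketch, not part of the original exercise: the directory comes from mktemp, and the variable name workdir and the two-line test file are assumptions made for illustration.

```shell
# Practice run in a temporary directory, so no real files are touched.
# "workdir" is an illustrative variable name.
workdir="$(mktemp -d)"
cd "$workdir"

mkdir workshop                                          # new, empty directory
printf 'this is line one\nthat is line two\n' > text    # small two-line test file
cp text text1                                           # text and text1 now both exist
mv text1 workshop/                                      # text1 moves into workshop
grep this text                                          # prints the line containing "this"
ls -lh workshop                                         # lists text1 with its size
rm -r workshop                                          # removes workshop and its contents
```

After the run, only the file text remains in the temporary directory; delete the directory with rm -r when finished.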
NGS Data Analysis in Linux
# We need the following Linux/Unix tools for our NGS data analysis; we can install them one by one.

# 1: bowtie2 is a short-read aligner. It aligns reads to transcripts/gene models or to a genome, which is important in RNA-Seq and variant calling.
sudo apt-get install bowtie2                      # sudo will ask for your password
# Running bowtie2
bowtie2-build AT_Transcripts.txt AT_Indexes       # mac users add ./ at the beginning of the command
# Builds indexes from the transcripts/gene models, so that reads can be aligned quickly and efficiently.
bowtie2 -x AT_Indexes mu1.txt > mu1.output        # mac users add ./ at the beginning of the command
# Aligns the reads stored in file mu1.txt to the indexes built above.
python3 getCounts_updated.py -in mu1.output       # counts the reads aligned to the various gene models

# 2: Genome/transcriptome assembly. The tools are called assemblers; we will use one known as velvet.
sudo apt install velvet
velveth 31mer 31 -fastq -short mu1.txt            # mac users (./velveth ...)
# velveth is the step of the velvet program that generates the hash/k-mer table.
# 31mer is the output folder; any folder name can be used, and velveth creates it in the present directory.
# 31 is the k-mer length. By default velvet comes with a maximum k-mer length of 31; this can be changed during installation.
# velvet must be told the file format (-fastq) and read type (-short) of the input file, so that it can distinguish between different file formats.
velvetg 31mer/                                    # mac users (./velvetg ...)
# velvetg is the step of the velvet program that builds contigs.
# If -min_contig_lgth is set, it should be greater than the k-mer length, so that contigs can be made using overlaps.

# 3: Variant calling.
# We align the reads with an aligner such as bowtie, bwa or others, and use the output file in SAM format.
sudo apt install samtools
samtools view -S -b mu1.output > mu1_output.bam   # converts the SAM file to BAM (mac users, don't forget to add ./)
samtools sort mu1_output.bam -o mu1_output.bam.sorted   # sorts the BAM file (./ for mac)
samtools mpileup -f AT_Transcripts.txt mu1_output.bam.sorted > mu1_mpileup_results   # (./ for mac)

# Call SNPs using VarScan
# For this we use chromosome 22 data, to have significant SNPs and indels.
bowtie2-build chr22.fasta chr22_indexes
bowtie2 -x chr22_indexes chr22.fastq > chr22.output
samtools view -S -b chr22.output > chr22.output.bam
samtools sort chr22.output.bam -o chr22.output.bam.sorted
samtools mpileup -f chr22.fasta chr22.output.bam.sorted > chr22.mpileup_results
# Java must be installed on the computer.
java                                              # prints help if Java is installed; otherwise an error message is displayed
sudo apt install default-jre                      # installs Java on Ubuntu
java -jar VarScan.v2.3.9.jar pileup2snp chr22.mpileup_results > chr22.snp.txt       # SNP calling
# Call indels with VarScan
java -jar VarScan.v2.3.9.jar pileup2indel chr22.mpileup_results > chr22.indels.txt  # indel calling

###############################
Alignment viewer: BAMVIEW
# Setup and installation
# BamView:
java -mx512m -jar BamVi_v1.2.11.jar
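The chromosome 22 variant-calling steps above follow one fixed sequence, so they can be collected into a single function. The following is a sketch under stated assumptions, not the workshop's own script: the names DRY_RUN, VARSCAN_JAR, run, run_to and call_variants are hypothetical. By default it only prints the commands, so it can be inspected on a machine where bowtie2, samtools and VarScan are not yet installed.

```shell
# Sketch only: DRY_RUN, VARSCAN_JAR, run, run_to and call_variants are
# illustrative names, not part of bowtie2, samtools or VarScan.
set -eu

DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands, 0 = also run them
VARSCAN_JAR="${VARSCAN_JAR:-VarScan.v2.3.9.jar}"  # jar file name as used in the steps above

run() {                                           # usage: run CMD ARGS...
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then "$@"; fi
}

run_to() {                                        # usage: run_to OUTFILE CMD ARGS...
    out="$1"; shift                               # stdout of CMD goes to OUTFILE
    echo "+ $* > $out"
    if [ "$DRY_RUN" != 1 ]; then "$@" > "$out"; fi
}

call_variants() {                                 # usage: call_variants PREFIX
    p="$1"                                        # expects PREFIX.fasta and PREFIX.fastq
    run    bowtie2-build "$p.fasta" "${p}_indexes"
    run_to "$p.output" bowtie2 -x "${p}_indexes" "$p.fastq"
    run_to "$p.output.bam" samtools view -S -b "$p.output"
    run    samtools sort "$p.output.bam" -o "$p.output.bam.sorted"
    run_to "$p.mpileup_results" samtools mpileup -f "$p.fasta" "$p.output.bam.sorted"
    run_to "$p.snp.txt" java -jar "$VARSCAN_JAR" pileup2snp "$p.mpileup_results"
    run_to "$p.indels.txt" java -jar "$VARSCAN_JAR" pileup2indel "$p.mpileup_results"
}

plan="$(call_variants chr22)"                     # dry run: collect the printed commands
printf '%s\n' "$plan"
```

Setting DRY_RUN=0 before running the script executes each step as well as printing it. Both VarScan calls read the same mpileup file, which avoids the mismatch that arises when the SNP and indel steps are typed by hand with different file names.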