diffHic preparePairs() does not identify paired-end reads
1
0
Entering edit mode
ilyco • 0
@ilyco-9716
Last seen 7.0 years ago

Hi,

I am trying to run diffHic on a single chromosome. I extracted the chromosome 1 alignment data and used the bam file as an input for diffHic. However, at the preparePairs() stage, none of the reads appear as paired.

This is the output from diffHic:

diagnostics <- preparePairs("chr1.bam", hs.param, file="chr1.h5", dedup=TRUE, minq=10)
> diagnostics
$pairs
   total   marked filtered   mapped 
       1        0        0        1 
$same.id
   dangling self.circle 
          0           0 
$singles
[1] 20248727
$chimeras
  total  mapped   multi invalid 
      0       0       0       0 

 

When I checked the bam file with samtools flagstat, I get:

samtools flagstat chr1.bam 
20248729 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
20248729 + 0 mapped (100.00% : N/A)
20248729 + 0 paired in sequencing
10125154 + 0 read1
10123575 + 0 read2
20248729 + 0 properly paired (100.00% : N/A)
20248729 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
1240765 + 0 with mate mapped to a different chr
1240765 + 0 with mate mapped to a different chr (mapQ>=5)

So the reads are paired but preparePairs() does not seem to recognise this. Any suggestions?

Thank you very much!

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     
other attached packages:
[1] diffHic_1.2.2        GenomicRanges_1.22.4 GenomeInfoDb_1.6.3  
[4] IRanges_2.4.7        S4Vectors_0.8.11     BiocGenerics_0.16.1 
loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.3       XVector_0.10.0            
 [3] edgeR_3.12.0               zlibbioc_1.16.0           
 [5] GenomicAlignments_1.6.3    BiocParallel_1.4.3        
 [7] lattice_0.20-33            BSgenome_1.38.0           
 [9] tools_3.2.3                grid_3.2.3                
[11] SummarizedExperiment_1.0.2 rhdf5_2.14.0              
[13] Biobase_2.30.0             csaw_1.4.1                
[15] DBI_0.3.1                  lambda.r_1.1.7            
[17] futile.logger_1.4.1        rtracklayer_1.30.2        
[19] futile.options_1.0.0       bitops_1.0-6              
[21] biomaRt_2.26.1             RCurl_1.95-4.7            
[23] RSQLite_1.0.0              limma_3.26.8              
[25] GenomicFeatures_1.22.13    Biostrings_2.38.4         
[27] Rsamtools_1.22.0           locfit_1.5-9.1            
[29] XML_3.98-1.3 

 
diffHic preparePairs paired BAM • 1.9k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 34 minutes ago
The city by the bay

Check that the BAM file is name sorted. Also, preparePairs identifies read pairs based on having identical alignment names. If that's not the case (e.g., there's /1 or /2 appended to the name) the function will fail.

ADD COMMENT
0
Entering edit mode

Thank you very much. 

It now recognises pairs even though there is still a large number of singles.

ADD REPLY
0
Entering edit mode

Probably because if you've only taken chromosome 1 reads. If the mate isn't present in the same file (e.g., because it lies on another chromosome), then preparePairs will mark the read as a singleton. For all practical purposes, this classification is probably correct, because you can't easily interpret those reads without a mate in a Hi-C experiment.

ADD REPLY
0
Entering edit mode

Thank you very much. This was very informative.

ADD REPLY
0
Entering edit mode

Thank you very much. This was very informative.

ADD REPLY

Login before adding your answer.

Traffic: 545 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6