Skip to content

isaclee/nanopore-methylation-utilities

 
 

Repository files navigation

nanopore-methylation-utilities

Set of utilities for analyzing nanopore methylation data

bed-style format methylation file

I convert the nanopolish methylation calling output into bed-style format, such that each line is

Contig Start End Read name Methylation call string Log-likelihood ratios Motif context

where Methylation call string is arranged such that

  • numbers are separated by methylation calls
  • each number is cumulative distance from the "start"
  • methylation call corresponds to the motif at position preceding the letter
  • "m" means methylated, "u" means unmethylated, and "x" means uncalled (not confident)

The resulting bed-style file is sorted, bgzipped, and tabix indexed for easy manipulation.

./mtsv2bedGraph.py -i [path/to/nanopolish/methylation.tsv] |\
  sort -k1,1 -k2,2n | bgzip > [methylation.bed.gz]
tabix -p bed [methylation.bed.gz]

converting bam for igv

Using the converted bed-style methylation file, the original bam file can be "bisulfite converted in silico" for easy visualization on IGV via their bisulfite mode. There are three options for specifying the region to convert:

  • -r,--regions : for multiple regions, supply the bed file
  • -w,--window : for one region, supply the coordinate (chr:start-end)
  • without either of the above options, all reads will be converted
./convert_bam_for_methylation.py -b [path/to/sorted.bam] \
  -c [path/to/cpg.methylation.bed.gz] -f [path/to/reference.fasta ] |\
  samtools sort -o [path/to/converted.bam]
samtools index [path/to/cnverted.bam]

For minimap2 alignments

Using --MD option during alignment is recommended.

The default output does not have MD tags, and MD tags are necessary for using pysam to get the reference sequence. To get around this, the fasta of reference genome must be supplied via -f,--fasta.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 72.3%
  • R 24.8%
  • Shell 2.9%