
Fast and sensitive mapping of bisulfite-treated sequencing data.

Bioinformatics (Oxford, England) (2012-05-15)
Christian Otto, Peter F Stadler, Steve Hoffmann
ABSTRACT

Cytosine DNA methylation is one of the major epigenetic modifications and influences gene expression, developmental processes, X-chromosome inactivation, and genomic imprinting. Aberrant methylation is furthermore known to be associated with several diseases, including cancer. The gold standard for determining DNA methylation on a genome-wide scale is 'bisulfite sequencing': DNA fragments are treated with sodium bisulfite, which converts unmethylated cytosines into uracils, whereas methylated cytosines remain unchanged. The resulting sequencing reads thus exhibit asymmetric bisulfite-related mismatches and suffer from an effective reduction of the alphabet size in unmethylated regions, which makes mapping bisulfite sequencing reads computationally much more demanding. As a consequence, currently available read mapping software often fails to achieve high sensitivity and in many cases requires unrealistic computational resources to cope with large real-life datasets. In this study, we present a seed-based approach that uses enhanced suffix arrays in conjunction with Myers' bit-vector algorithm to efficiently extend seeds to optimal semi-global alignments while allowing for bisulfite-related substitutions. It outperforms most current approaches in terms of sensitivity and is competitive in running time when mapping hundreds of millions of sequencing reads to vertebrate genomes. The software segemehl is freely available at http://www.bioinf.uni-leipzig.de/Software/segemehl.
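
The asymmetry described in the abstract can be illustrated with a minimal Python sketch. It is not segemehl's implementation: it replaces the enhanced suffix arrays and Myers' bit-vector algorithm with a plain dynamic-programming semi-global alignment, handles only the forward strand, and all function names (bisulfite_convert, bs_match, semi_global_edit_distance) are hypothetical names chosen for illustration. The point it shows is that a reference C may be read as C or T at no cost, while a reference T read as C still counts as a mismatch.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Simulate bisulfite treatment and sequencing of a forward-strand fragment.

    Unmethylated C is read as T; a C whose index is in `methylated` stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def bs_match(ref_base, read_base):
    """Asymmetric comparison: reference C may appear as C or T in the read,
    but reference T must not appear as C."""
    return ref_base == read_base or (ref_base == "C" and read_base == "T")


def semi_global_edit_distance(read, ref):
    """Smallest edit distance between the whole read and any substring of ref.

    Plain O(len(read) * len(ref)) dynamic programming, used here only as a
    stand-in for the bit-vector algorithm named in the abstract.
    """
    # Row 0 is all zeros: the alignment may start anywhere in the reference.
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if bs_match(ref[j - 1], read[i - 1]) else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # read base not present in reference
                cur[j - 1] + 1,      # reference base skipped by the read
            )
        prev = cur
    # The alignment may also end anywhere in the reference.
    return min(prev)


reference = "TTACGCCGTAGG"
fragment = reference[2:10]                           # "ACGCCGTA"
read = bisulfite_convert(fragment, methylated={1})   # only the first C is methylated
distance = semi_global_edit_distance(read, reference)
```

In this example the converted read aligns to the reference without penalty even though two of its cytosines now appear as thymines, which is the kind of bisulfite-related substitution a dedicated mapper has to tolerate.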

MATERIALS
Brand            Product Description

Sigma-Aldrich    Sodium bisulfite solution, purum, ~40%
Sigma-Aldrich    Sodium bisulfite, ACS reagent