ChIC and CUT&RUN Data Analysis Tool Frequently Asked Questions (FAQs)

What sequence read file formats are allowed?
The tool accepts sequencing reads from the Illumina platform in FASTQ (.fastq) or gzip-compressed FASTQ (.fastq.gz) format. Multiple FASTQ files can be bundled into a single .tar.gz or .zip archive and uploaded to the tool.
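
As an illustration of bundling several FASTQ files into one archive before upload, here is a minimal Python sketch. The function name `bundle_fastq` and the file names in the usage note are hypothetical. Any naming or layout requirements the tool may place on the archive are not covered by this FAQ, so treat this as one possible way to build a .tar.gz, not the required procedure.

```python
import tarfile
from pathlib import Path

# File name endings treated as FASTQ in this sketch.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def bundle_fastq(fastq_paths, archive_path):
    """Write the given FASTQ files into a single .tar.gz archive.

    Returns the file names stored in the archive.
    """
    paths = [Path(p) for p in fastq_paths]
    for p in paths:
        if not p.name.endswith(FASTQ_SUFFIXES):
            raise ValueError(f"not a FASTQ file: {p.name}")
    with tarfile.open(archive_path, "w:gz") as archive:
        for p in paths:
            # arcname stores only the file name, not the local directory path.
            archive.add(p, arcname=p.name)
    return [p.name for p in paths]
```

For example, `bundle_fastq(["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "reads.tar.gz")` would produce one archive containing both files of a paired-end sample.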

What sequence type options are suitable for the data analysis tool?
Both single-end and paired-end reads are suitable.

What species of reference genomes are available?
Human, mouse and rat genome references are available.

What peak calling options are available?
Narrow peak and broad peak options are available in the data analysis tool.

What is the method for sequence data QC?
Sequence data (raw read) QC is performed with the FastQC quality check tool. FastQC has been used in many publications and is widely accepted in the scientific community for checking FASTQ file quality. The quality check is based on the sequence quality of the individual bases in the reads, and the reports are generated accordingly. For more details, please refer to https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
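
To give a feel for what a per-base quality check measures, the sketch below computes the mean Phred quality score at each read position. It is not FastQC's implementation and it does not reproduce FastQC's reports. It assumes four-line FASTQ records and the Phred+33 quality encoding, and the function names are hypothetical.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def mean_quality_per_position(path, phred_offset=33):
    """Return the mean Phred score at each read position.

    Assumes four lines per record, with the quality string on the 4th line.
    """
    totals, counts = [], []
    with open_fastq(path) as handle:
        # Lines at index 3, 7, 11, ... are the quality strings.
        for quality_line in islice(handle, 3, None, 4):
            for i, ch in enumerate(quality_line.rstrip("\r\n")):
                if i == len(totals):
                    # First read long enough to reach this position.
                    totals.append(0)
                    counts.append(0)
                totals[i] += ord(ch) - phred_offset
                counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

A steady drop in the mean score toward the 3' end of the reads is the kind of pattern a per-base quality plot makes visible.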

What algorithms are used for peak calling?

  • Multiple algorithms are implemented in the data analysis tool, and the one used depends on the absence or presence of replicates. For single samples, we use the MACS2 tool for peak calling. Model-based Analysis of ChIP-Seq (MACS) is a computational algorithm that identifies genome-wide locations of transcription/chromatin factor binding or histone modifications from ChIP-seq data. MACS consists of four steps: removing redundant reads, adjusting read position, calculating peak enrichment, and estimating the empirical false discovery rate. For more details, please refer to https://www.nature.com/articles/nprot.2012.101
  • For samples with two replicates, the data analysis tool uses MACS2 to call peaks for each individual replicate and the Irreproducible Discovery Rate (IDR) to call consensus peaks across the replicates. IDR measures consistency between replicates in high-throughput experiments. It also uses the reproducibility of score rankings between peaks in each replicate to determine an optimal cutoff for significance.
  • The tool also implements Joint Analysis of NGS replicates via Mixture Model Clustering (JAMM), which is used to obtain consensus peak calls from the replicates. For more details, please refer to https://academic.oup.com/bioinformatics/article/31/1/48/2365061
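
The first two MACS steps listed above (removing redundant reads and adjusting read position) can be sketched roughly as follows. This is a simplified illustration, not MACS2 code: reads are represented as hypothetical `(chromosome, 5' position, strand)` tuples, and the fragment length is passed in by hand rather than estimated from the data as MACS does.

```python
def remove_redundant_reads(reads, max_per_position=1):
    """Keep at most max_per_position reads with the same chromosome,
    5' position and strand; extra copies are treated as redundant."""
    seen = {}
    kept = []
    for read in reads:
        seen[read] = seen.get(read, 0) + 1
        if seen[read] <= max_per_position:
            kept.append(read)
    return kept


def shift_reads(reads, fragment_length):
    """Shift each read by half the fragment length toward its 3' end,
    so that reads from both strands pile up near the fragment centre."""
    half = fragment_length // 2
    shifted = []
    for chrom, pos, strand in reads:
        # Plus-strand reads move right, minus-strand reads move left.
        new_pos = pos + half if strand == "+" else pos - half
        shifted.append((chrom, max(new_pos, 0), strand))
    return shifted
```

With a fragment length of 200, a plus-strand read at position 100 and a minus-strand read at position 300 both end up at position 200, which is where the enrichment signal would then be measured.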

When do I choose narrow peak or the broad peak method?
Narrow peak is recommended for identifying transcription factor binding locations, and broad peak for investigating histone modifications.
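
For readers who also run MACS2 on their own machines, the sketch below shows how the same choice maps onto a standalone `macs2 callpeak` command line, where broad peak calling is enabled with the `--broad` flag. How the hosted data analysis tool invokes MACS2 internally is not stated in this FAQ, so the function name, the BAM input and the default genome size shortcut (`hs`, human) are assumptions made for illustration only.

```python
def build_macs2_command(treatment_bam, sample_name,
                        peak_type="narrow", genome_size="hs"):
    """Build an argument list for a standalone `macs2 callpeak` run.

    peak_type: "narrow" (e.g. transcription factors) or
               "broad" (e.g. histone modifications).
    """
    if peak_type not in ("narrow", "broad"):
        raise ValueError(f"unknown peak type: {peak_type}")
    command = [
        "macs2", "callpeak",
        "-t", treatment_bam,   # aligned reads for the sample
        "-f", "BAM",           # input file format
        "-g", genome_size,     # effective genome size or shortcut
        "-n", sample_name,     # prefix for the output files
    ]
    if peak_type == "broad":
        # MACS2 calls narrow peaks unless --broad is given.
        command.append("--broad")
    return command
```

The returned list can be passed to `subprocess.run` on a system where MACS2 is installed.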

What are the options for barcodes?
Both un-barcoded and barcoded samples can be analyzed using the data analysis tool. Select either “Un-Barcoded samples” or “Barcoded samples” on the “Peak call & annotation” page.

What are the options for replicates?
Both single-sample and multiple-replicate workflows are available in the tool. Customers can choose either workflow to complete the data analysis.

What kinds of results will be generated in the final report?
Depending on your samples (e.g. single or multiple replicates) and choice of analyses, several reports can be generated and are available for download, including:

  1. Peaks and alignment visualization using IGV
    Integrative Genomics Viewer (IGV) enables users to examine peak locations and read alignments with respect to the reference genome. For analysis of samples without replicates, the peaks and alignments are shown as separate tracks. For analysis with replicates, the consensus peaks and the alignments of the individual replicates are shown as separate tracks. The last track contains the gene annotations for the genomic location in the current view.
  2. Peak report
    The peak report contains the chromosome number, the start and end locations of the identified peaks, and the statistical measures associated with each peak. If the analysis was done using multiple replicates, the report shows the consensus peaks obtained by considering all the replicates.
  3. Peak distribution relative to gene
    This report shows the distribution of aggregated peaks around the transcription start site and across the complete gene structure.
  4. GO analysis
    This report shows the Gene Ontology (GO) terms that are enriched for the identified peaks. Each peak identified from peak calling is annotated with the respective genes, and this gene list is used for GO enrichment analysis. The results displayed also contain columns with statistics associated with each of the identified GO terms.
  5. Pathway analysis
    This report shows the pathways that are enriched for the identified peaks. Each peak identified from peak calling is annotated with the respective genes, and this gene list is used for pathway enrichment analysis. The results displayed also contain columns with statistics associated with each of the identified pathways.
  6. Venn Diagram
    For projects with replicates, Venn diagrams are generated to visualize the number of peaks that overlap between the replicates, as well as with the consensus peaks. For each of the replicates, the peak regions are annotated with genes, and this gene list is used for plotting the Venn diagram.
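
The counts behind a Venn diagram like the one in item 6 can be worked out from the annotated gene lists with plain set operations. The sketch below is a minimal illustration for two replicates plus the consensus list. The function name, the region labels and the example gene symbols are hypothetical, and the tool's own plotting code is not described in this FAQ.

```python
def venn_counts(replicate_1, replicate_2, consensus):
    """Count genes in each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(replicate_1), set(replicate_2), set(consensus)
    return {
        "rep1_only": len(a - b - c),
        "rep2_only": len(b - a - c),
        "consensus_only": len(c - a - b),
        "rep1_rep2": len((a & b) - c),        # shared by replicates only
        "rep1_consensus": len((a & c) - b),
        "rep2_consensus": len((b & c) - a),
        "all_three": len(a & b & c),
    }


if __name__ == "__main__":
    # Made-up gene lists, for illustration only.
    rep1 = ["TP53", "MYC", "GATA1"]
    rep2 = ["MYC", "GATA1", "SOX2"]
    consensus = ["MYC", "GATA1"]
    print(venn_counts(rep1, rep2, consensus))
```

Each gene falls into exactly one region, so the seven counts add up to the number of distinct genes across the three lists.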

How do I access the data analysis tool?
Customers who purchased our ChIC/CUT&RUN kits can access the data analysis tool for free by following the steps in the instruction sheet included in the ChIC/CUT&RUN kit’s package. An overview of the data analysis tool can be viewed here.

To obtain access to the ChIC/CUT&RUN data analysis tool, please click here. Your order ID can be found on the packing slip, the order confirmation or retrieved by calling your local Customer Service.
