I'm relatively new to bioinformatics and microbiology, and I've been following the DADA2 tutorial to process my 16S gDNA & eDNA sequencing data. After merging my paired reads and constructing the sequence table, I visualized the sequence lengths and noticed a considerable amount of variability. I'm unsure whether this variability is due to biological reasons or if it might be caused by technical issues, such as incomplete merging or sequencing errors.
Up until this point, everything else in the pipeline has looked good. I'm curious whether this variability in sequence lengths is a common observation at this stage when working with the 16S marker. If anyone could offer some advice, I would greatly appreciate it :)
Here's some additional information about my data:
Illumina MiSeq, 2x300 paired-end sequencing
V3-V4 target region
Primer set: FWD: CCTACGGGNGGCWGCAG, REV: GACTACHVGGGTATCTAATCC
primers have been successfully removed
thanks in advance!
Up until this point, everything else in the pipeline has looked good. I'm curious if this variability in sequence lengths is a common observation at this stage when working with the 16S marker.
Yes. First off, the two modes (peaks) of your sequence length distribution are expected. The V3-V4 region of the 16S rRNA gene has a naturally bimodal length distribution, with the two modes differing by about 20 nt.
The various other lengths you observe are not uncommon, and typically come from a mix of off-target amplification and library artefacts. It is completely valid to "cut a band in silico" and remove the ASVs outside the expected length distribution. This is described in the "Construct sequence table" section of the DADA2 tutorial: https://benjjneb.github.io/dada2/tutorial.html
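To make the idea of "cutting a band in silico" concrete, here is a minimal, language-neutral sketch in Python (not DADA2's own R code). It tallies reads per ASV length and then keeps only ASVs inside a length window. The names `length_histogram` and `cut_band`, the toy sequences, and the 400-430 nt window are all assumptions for illustration; the right window should be read off your own length distribution.

```python
from collections import Counter


def length_histogram(asv_counts):
    """Return {sequence length: total reads}, sorted by length."""
    hist = Counter()
    for seq, count in asv_counts.items():
        hist[len(seq)] += count
    return dict(sorted(hist.items()))


def cut_band(asv_counts, min_len, max_len):
    """Keep only ASVs whose length lies in [min_len, max_len]."""
    return {
        seq: count
        for seq, count in asv_counts.items()
        if min_len <= len(seq) <= max_len
    }


# Toy ASV table (hypothetical data): sequence -> total read count.
asv_counts = {
    "A" * 402: 1200,  # first expected mode
    "C" * 427: 900,   # second expected mode, roughly 20 nt longer
    "G" * 290: 15,    # short, likely off-target
    "T" * 460: 4,     # long, likely an artefact
}

# Assumed window; choose yours from the histogram of your own data.
kept = cut_band(asv_counts, min_len=400, max_len=430)

reads_before = sum(asv_counts.values())
reads_after = sum(kept.values())
print(length_histogram(asv_counts))
print(f"Retained {reads_after} of {reads_before} reads")
```

It is worth checking how many reads the filter removes: if the out-of-window ASVs account for only a small share of reads, as in this toy example, dropping them should have little effect on downstream results.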