Sequence comparative analysis using networks: software for evaluating de novo transcript assembly from next-generation sequencing

Ian Misner, Cédric Bicep, Philippe Lopez, Sébastien Halary, Eric Bapteste, Christopher E Lane
Molecular Biology and Evolution 2013, 30 (8): 1975-86
DNA sequencing technology is becoming more accessible to a variety of researchers as costs continue to decline. As researchers begin to sequence novel transcriptomes, most of these data sets lack a reference genome and will have to rely on de novo assemblers. Making comparisons across assemblies can be difficult: each program has its strengths and weaknesses, and no tool exists to comparatively evaluate these data sets. We developed software in R, called Sequence Comparative Analysis using Networks (SCAN), to perform statistical comparisons between distinct assemblies. SCAN uses a reference data set to identify the most accurate de novo assembly and the "good" transcripts in the user's data. We tested SCAN on three publicly available transcriptomes, each assembled using three assembly programs. Moreover, we sequenced the transcriptome of the oomycete Achlya hypogyna and compared de novo assemblies from the Velvet, ABySS, and CLC Genomics Workbench assembly algorithms. One thousand one hundred twenty-eight of the CLC transcripts were statistically similar to the reference, compared with 49 of the Velvet transcripts and 937 of the ABySS transcripts. SCAN's strength is providing statistical support for transcript assemblies in a biological context. However, SCAN is designed to compare distinct node sets in networks; it can therefore also be extended easily to perform statistical comparisons on any network graph, regardless of what the nodes represent.
