Read by QxMD icon Read

Algorithms for Molecular Biology: AMB

Morten Muhlig Nielsen, Paula Tataru, Tobias Madsen, Asger Hobolth, Jakob Skou Pedersen
Background: Motif analysis methods have long been central for studying biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools suffer from limitations in searching large motif spaces, and thus more complex motifs may not be included. There is thus a need for motif analysis methods that are tailored for analyzing specific complex motifs motivated by biological questions and hypotheses rather than acting as a screen based motif finding tool...
2018: Algorithms for Molecular Biology: AMB
Fabian Gärtner, Lydia Müller, Peter F Stadler
Background: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of important for the processing of HTS data. Superbubbles can be identified within the strongly connected components of the input digraph after transforming them into directed acyclic graphs...
2018: Algorithms for Molecular Biology: AMB
Fabian Gärtner, Christian Höner Zu Siederdissen, Lydia Müller, Peter F Stadler
Background: Genome sequences and genome annotation data have become available at ever increasing rates in response to the rapid progress in sequencing technologies. As a consequence the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize-omics data simultaneously for multiple species are sorely lacking. A first and crucial step in this direction is the construction of a common coordinate system. Since genomes not only differ by rearrangements but also by large insertions, deletions, and duplications, the use of a single reference genome is insufficient, in particular when the number of species becomes large...
2018: Algorithms for Molecular Biology: AMB
Yves Frank, Tomas Hruz, Thomas Tschager, Valentin Venzin
Background: Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. De novo peptide sequencing is the problem of reconstructing the amino acid sequences of a peptide from this measurement data. Past de novo sequencing algorithms solely consider the mass spectrum of the fragments for reconstructing a sequence...
2018: Algorithms for Molecular Biology: AMB
Andre R Oliveira, Guillaume Fertin, Ulisses Dias, Zanoni Dias
Background: One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements , that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form <mml:math xmlns:mml=""> <mml:mrow> <mml:mi>π</mml:mi> <mml:mo>=</mml:mo> <mml:mo>(</mml:mo> <mml:msub> <mml:mi>π</mml:mi> <mml:mn>1</mml:mn> </mml:msub> <mml:msub> <mml:mi>π</mml:mi> <mml:mn>2</mml:mn> </mml:msub> <mml:mo>…</mml:mo> <mml:msub> <mml:mi>π</mml:mi> <mml:mi>n</mml:mi> </mml:msub> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> , and in that case we can consider without loss of generality that one of them is the identity permutation <mml:math xmlns:mml="http://www...
2018: Algorithms for Molecular Biology: AMB
Alexander Donath, Peter F Stadler
Background: Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps. Results: Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels ( splids ) that define an approximate bipartition of the taxon set...
2018: Algorithms for Molecular Biology: AMB
Michał Aleksander Ciach, Anna Muszewska, Paweł Górecki
Background: Horizontal gene transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them either rely on approximate, non-phylogenetic methods or on the tree reconciliation, which is computationally intensive and sensitive to parameter values. Results: We investigate the locus tree inference problem as a possible alternative that combines the advantages of both approaches...
2018: Algorithms for Molecular Biology: AMB
Jingli Wu, Qian Zhang
Background: Haplotype assembly, reconstructing haplotypes from sequence data, is one of the major computational problems in bioinformatics. Most of the current methodologies for haplotype assembly are designed for diploid individuals. In recent years, genomes having more than two sets of homologous chromosomes have attracted many research groups that are interested in the genomics of disease, phylogenetics, botany and evolution. However, there is still a lack of methods for reconstructing polyploid haplotypes...
2018: Algorithms for Molecular Biology: AMB
Pijus Simonaitis, Krister M Swenson
Background: The double cut and join (DCJ) model of genome rearrangement is well studied due to its mathematical simplicity and power to account for the many events that transform gene order. These studies have mostly been devoted to the understanding of minimum length scenarios transforming one genome into another. In this paper we search instead for rearrangement scenarios that minimize the number of rearrangements whose breakpoints are unlikely due to some biological criteria. One such criterion has recently become accessible due to the advent of the Hi-C experiment, facilitating the study of 3D spacial distance between breakpoint regions...
2018: Algorithms for Molecular Biology: AMB
Samuele Girotto, Matteo Comin, Cinzia Pizzi
Background: Patterns with wildcards in specified positions, namely spaced seeds , are increasingly used instead of k -mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require to compute the hashing of each position in the input sequences with respect to the given spaced seed, or to multiple spaced seeds. While the hashing of k -mers can be rapidly computed by exploiting the large overlap between consecutive k -mers, spaced seeds hashing is usually computed from scratch for each position in the input sequence, thus resulting in slower processing...
2018: Algorithms for Molecular Biology: AMB
Nidhi Shah, Stephen F Altschul, Mihai Pop
Background: An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. However, it is known that the best BLAST hit may not always correspond to the best taxonomic match. An alternative approach involves phylogenetic methods, which take into account alignments and a model of evolution in order to more accurately define the taxonomic origin of sequences...
2018: Algorithms for Molecular Biology: AMB
Sarah Christensen, Erin K Molloy, Pranjal Vachaspati, Tandy Warnow
Background: For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable. Results: We introduce the Optimal Tree Completion problem , a general optimization problem that involves completing an unrooted binary tree (i...
2018: Algorithms for Molecular Biology: AMB
Kazunori D Yamada
Background: A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. By definition, these functions cannot capture nonlinear relationships between profiles. Therefore, we attempted to discover a novel scoring function, which was more suitable for the profile-comparison method than existing functions, using neural networks...
2018: Algorithms for Molecular Biology: AMB
João A Carriço, Maxime Crochemore, Alexandre P Francisco, Solon P Pissis, Bruno Ribeiro-Gonçalves, Cátia Vaz
Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to development of several novel methods and several databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data being accumulated in these databases is huge, storing thousands of different profiles...
2018: Algorithms for Molecular Biology: AMB
Nidia Obscura Acosta, Veli Mäkinen, Alexandru I Tomescu
Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes, or edges, of G . Approach: We address this problem with the "safe and complete" framework of Tomescu and Medvedev (Research in computational Molecular biology-20th annual conference, RECOMB 9649:152-163, 2016)...
2018: Algorithms for Molecular Biology: AMB
Nikolai Nøjgaard, Manuela Geiß, Daniel Merkle, Peter F Stadler, Nicolas Wieseke, Marc Hellmuth
Background: In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species trees S , relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible...
2018: Algorithms for Molecular Biology: AMB
Md Shamsuzzoha Bayzid, Tandy Warnow
Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i...
2018: Algorithms for Molecular Biology: AMB
Burkhard Morgenstern, Svenja Schöbel, Chris-André Leimeister
Background: Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487-1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings. Results: In this paper, we study the length distribution of k -mismatch common substrings between two sequences...
2017: Algorithms for Molecular Biology: AMB
Felipe A Louza, Guilherme P Telles, Steve Hoffmann, Cristina D A Ciferri
Background: Suffix arrays, augmented by additional data structures, allow solving efficiently many string processing problems. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory. Results: In this article we present and analyze [Formula: see text] [introduced in CPM (External memory generalized suffix and [Formula: see text] arrays construction...
2017: Algorithms for Molecular Biology: AMB
Shixiang Wan, Quan Zou
BACKGROUND: Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. METHODS: Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses...
2017: Algorithms for Molecular Biology: AMB
Fetch more papers »
Fetching more papers... Fetching...
Read by QxMD. Sign in or create an account to discover new knowledge that matter to you.
Remove bar
Read by QxMD icon Read

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"