journal
Journals Algorithms for Molecular Biolo...

Algorithms for Molecular Biology : AMB

https://read.qxmd.com/read/38493130/infrared-a-declarative-tree-decomposition-powered-framework-for-bioinformatics
#1
JOURNAL ARTICLE
Hua-Ting Yao, Bertrand Marchand, Sarah J Berkemer, Yann Ponty, Sebastian Will
MOTIVATION: Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations. METHODS: We introduce the Infrared framework to overcome such hindrances for a large class of problems. Its underlying paradigm is tailored toward problems that can be declaratively formalized as sparse feature networks, a generalization of constraint networks...
March 16, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38481327/median-quartet-tree-search-algorithms-using-optimal-subtree-prune-and-regraft
#2
JOURNAL ARTICLE
Shayesteh Arasti, Siavash Mirarab
Gene trees can be different from the species tree due to biological processes and inference errors. One way to obtain a species tree is to find one that maximizes some measure of similarity to a set of gene trees. The number of shared quartets between a potential species tree and gene trees provides a statistically justifiable score; if maximized properly, it could result in a statistically consistent estimator of the species tree under several statistical models of discordance. However, finding the median quartet score tree, one that maximizes this score, is NP-Hard, motivating several existing heuristic algorithms...
March 13, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38475889/suffix-sorting-via-matching-statistics
#3
JOURNAL ARTICLE
Zsuzsanna Lipták, Francesco Masillo, Simon J Puglisi
We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods...
March 12, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38468275/finding-maximal-exact-matches-in-graphs
#4
JOURNAL ARTICLE
Nicola Rizzo, Manuel Cáceres, Veli Mäkinen
BACKGROUND: We study the problem of finding maximal exact matches (MEMs) between a query string Q and a labeled graph G. MEMs are an important class of seeds, often used in seed-chain-extend type of practical alignment methods because of their strong connections to classical metrics. A principled way to speed up chaining is to limit the number of MEMs by considering only MEMs of length at least <mml:math xmlns:mml="https://www.w3.org/1998/Math/MathML"><mml:mi>κ</mml:mi></mml:math> ( <mml:math xmlns:mml="https://www...
March 11, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38433200/sparsernafold-optimized-sparse-rna-pseudoknot-free-folding-with-dangle-consideration
#5
JOURNAL ARTICLE
Mateo Gray, Sebastian Will, Hosna Jabbari
MOTIVATION: Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures and have a restrictive time and space complexity ( <mml:math xmlns:mml="https://www.w3.org/1998/Math/MathML"><mml:mrow><mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup><mml:mi>n</mml:mi> <mml:mn>3</mml:mn></mml:msup> <mml:mo>)</mml:mo></mml:mrow> </mml:math> time and <mml:math xmlns:mml="https://www...
March 3, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38414060/recombinations-chains-and-caps-resolving-problems-with-the-dcj-indel-model
#6
JOURNAL ARTICLE
Leonard Bohnenkämper
One of the most fundamental problems in genome rearrangement studies is the (genomic) distance problem. It is typically formulated as finding the minimum number of rearrangements under a model that are needed to transform one genome into the other. A powerful multi-chromosomal model is the Double Cut and Join (DCJ) model.While the DCJ model is not able to deal with some situations that occur in practice, like duplicated or lost regions, it was extended over time to handle these cases. First, it was extended to the DCJ-indel model, solving the issue of lost markers...
February 27, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38355611/unifying-duplication-episode-clustering-and-gene-species-mapping-inference
#7
JOURNAL ARTICLE
Paweł Górecki, Natalia Rutecka, Agnieszka Mykowiecka, Jarosław Paszek
We present a novel problem, called MetaEC, which aims to infer gene-species assignments in a collection of partially leaf-labeled gene trees labels by minimizing the size of duplication episode clustering (EC). This problem is particularly relevant in metagenomics, where incomplete data often poses a challenge in the accurate reconstruction of gene histories. To solve MetaEC, we propose a polynomial time dynamic programming (DP) formulation that verifies the existence of a set of duplication episodes from a predefined set of episode candidates...
February 14, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38321522/global-exact-optimisations-for-chloroplast-structural-haplotype-scaffolding
#8
JOURNAL ARTICLE
Victor Epain, Rumen Andonov
BACKGROUND: Scaffolding is an intermediate stage of fragment assembly. It consists in orienting and ordering the contigs obtained by the assembly of the sequencing reads. In the general case, the problem has been largely studied with the use of distances data between the contigs. Here we focus on a dedicated scaffolding for the chloroplast genomes. As these genomes are small, circular and with few specific repeats, numerous approaches have been proposed to assemble them. However, their specificities have not been sufficiently exploited...
February 6, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38321476/predicting-horizontal-gene-transfers-with-perfect-transfer-networks
#9
JOURNAL ARTICLE
Alitzel López Sánchez, Manuel Lafond
BACKGROUND: Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well-known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers...
February 6, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38279113/co-linear-chaining-on-pangenome-graphs
#10
JOURNAL ARTICLE
Jyotshna Rajput, Ghanshyam Chandra, Chirag Jain
Pangenome reference graphs are useful in genomics because they compactly represent the genetic diversity within a species, a capability that linear references lack. However, efficiently aligning sequences to these graphs with complex topology and cycles can be challenging. The seed-chain-extend based alignment algorithms use co-linear chaining as a standard technique to identify a good cluster of exact seed matches that can be combined to form an alignment. Recent works show how the co-linear chaining problem can be efficiently solved for acyclic pangenome graphs by exploiting their small width and how incorporating gap cost in the scoring function improves alignment accuracy...
January 27, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38254124/fulgor-a-fast-and-compact-k-mer-index-for-large-scale-matching-and-color-queries
#11
JOURNAL ARTICLE
Jason Fan, Jamshed Khan, Noor Pratap Singh, Giulio Ermanno Pibiri, Rob Patro
The problem of sequence identification or matching-determining the subset of reference sequences from a given collection that are likely to contain a short, queried nucleotide sequence-is relevant for many important tasks in Computational Biology, such as metagenomics and pangenome analysis. Due to the complex nature of such analyses and the large scale of the reference collections a resource-efficient solution to this problem is of utmost importance. This poses the threefold challenge of representing the reference collection with a data structure that is efficient to query, has light memory usage, and scales well to large collections...
January 22, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38191515/dollo-cdp-a-polynomial-time-algorithm-for-the-clade-constrained-large-dollo-parsimony-problem
#12
JOURNAL ARTICLE
Junyan Dai, Tobias Rubel, Yunheng Han, Erin K Molloy
The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set...
January 8, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38178195/investigating-the-complexity-of-the-double-distance-problems
#13
JOURNAL ARTICLE
Marília D V Braga, Leonie R Brockmann, Katharina Klerx, Jens Stoye
BACKGROUND: Two genomes [Formula: see text] and [Formula: see text] over the same set of gene families form a canonical pair when each of them has exactly one gene from each family. Denote by [Formula: see text] the number of common families of [Formula: see text] and [Formula: see text]. Different distances of canonical genomes can be derived from a structure called breakpoint graph, which represents the relation between the two given genomes as a collection of cycles of even length and paths...
January 4, 2024: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38062452/emma-a-new-method-for-computing-multiple-sequence-alignments-given-a-constraint-subset-alignment
#14
JOURNAL ARTICLE
Chengze Shen, Baqiao Liu, Kelly P Williams, Tandy Warnow
BACKGROUND: Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity. RESULTS: We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i...
December 7, 2023: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38057863/correction-constructing-founder-sets-under-allelic-and-non-allelic-homologous-recombination
#15
Konstantinn Bonnet, Tobias Marschall, Daniel Doerr
No abstract text is available yet for this article.
December 6, 2023: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38041153/automated-design-of-dynamic-programming-schemes-for-rna-folding-with-pseudoknots
#16
JOURNAL ARTICLE
Bertrand Marchand, Sebastian Will, Sarah J Berkemer, Yann Ponty, Laurent Bulteau
Although RNA secondary structure prediction is a textbook application of dynamic programming (DP) and routine task in RNA structure analysis, it remains challenging whenever pseudoknots come into play. Since the prediction of pseudoknotted structures by minimizing (realistically modelled) energy is NP-hard, specialized algorithms have been proposed for restricted conformation classes that capture the most frequently observed configurations. To achieve good performance, these methods rely on specific and carefully hand-crafted DP schemes...
December 1, 2023: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38041123/quartets-enable-statistically-consistent-estimation-of-cell-lineage-trees-under-an-unbiased-error-and-missingness-model
#17
JOURNAL ARTICLE
Yunheng Han, Erin K Molloy
Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers...
December 1, 2023: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/38037088/new-algorithms-for-structure-informed-genome-rearrangement
#18
JOURNAL ARTICLE
Eden Ozeri, Meirav Zehavi, Michal Ziv-Ukelson
We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is utilized to guide the allowed operations and to compute their weights. In the first problem, [Formula: see text] ([Formula: see text]), we define the basic structure-informed rearrangement measure. Here, we assume that the gene order members of the gene cluster from which the PQ-tree is constructed are permutations...
December 1, 2023: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/37940998/relative-timing-information-and-orthology-in-evolutionary-scenarios
#19
JOURNAL ARTICLE
David Schaller, Tom Hartmann, Manuel Lafond, Peter F Stadler, Nicolas Wieseke, Marc Hellmuth
BACKGROUND: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestors coincides with a corresponding speciation event...
November 8, 2023: Algorithms for Molecular Biology: AMB
https://read.qxmd.com/read/37775806/constructing-founder-sets-under-allelic-and-non-allelic-homologous-recombination
#20
JOURNAL ARTICLE
Konstantinn Bonnet, Tobias Marschall, Daniel Doerr
Homologous recombination between the maternal and paternal copies of a chromosome is a key mechanism for human inheritance and shapes population genetic properties of our species. However, a similar mechanism can also act between different copies of the same sequence, then called non-allelic homologous recombination (NAHR). This process can result in genomic rearrangements-including deletion, duplication, and inversion-and is underlying many genomic disorders. Despite its importance for genome evolution and disease, there is a lack of computational models to study genomic loci prone to NAHR...
September 29, 2023: Algorithms for Molecular Biology: AMB
journal
journal
41260
1
2
Fetch more papers »
Fetching more papers... Fetching...
Remove bar
Read by QxMD icon Read
×

Save your favorite articles in one place with a free QxMD account.

×

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"

We want to hear from doctors like you!

Take a second to answer a survey question.