Xin Li, Samaneh Saadat, Haiyan Hu, Xiaoman Li
MOTIVATION: Bacterial haplotype reconstruction is critical for selecting proper treatments for diseases caused by unknown haplotypes. Existing methods and tools do not work well on this task because they were usually developed for viral rather than bacterial populations. RESULTS: In this study, we developed BHap, a novel algorithm based on fuzzy flow networks, for reconstructing bacterial haplotypes from next-generation sequencing data. Testing on simulated and experimental datasets, we showed that BHap was capable of reconstructing haplotypes of bacterial populations with an average F1 score of 0...
April 20, 2019: Bioinformatics
Bence Ágg, Andrea Császár, Máté Szalay-Bekő, Dániel V Veres, Réka Mizsei, Péter Ferdinandy, Péter Csermely, István A Kovács
MOTIVATION: Network visualizations of complex biological datasets usually result in 'hairball' images, which do not distinguish network modules. RESULTS: We present the EntOptLayout Cytoscape plug-in, based on a recently developed network representation theory. The plug-in provides an efficient visualization of network modules, which represent major protein complexes in protein-protein interaction and signalling networks. Importantly, the tool gives a quality score for the network visualization by calculating the information loss between the input data and the visual representation, showing a 3- to 25-fold improvement over conventional methods...
April 20, 2019: Bioinformatics
Youssef Darzi, Yuta Yamate, Takuji Yamada
SUMMARY: Functional annotations and their hierarchical classification are widely used in omics workflows to build novel insight upon existing biological knowledge. Currently, a plethora of tools is available to explore omics datasets at the level of functional annotations, but there is a lack of feature-rich and user-friendly tools that help scientists take advantage of their hierarchical classification for additional and often invaluable insights. Here we present FuncTree2, a user-friendly web application that turns hierarchical classifications into interactive and highly customizable radial trees and enables researchers to visualize their data simultaneously at all levels of the classification...
April 20, 2019: Bioinformatics
Shaun D Jackman, Tatyana Mozgacheva, Susie Chen, Brendan O'Huiginn, Lance Bailey, Inanc Birol, Steven J M Jones
SUMMARY: The ORCA bioinformatics environment is a Docker image that contains hundreds of bioinformatics tools and their dependencies. The ORCA image and accompanying server infrastructure provide a comprehensive bioinformatics environment for education and research. On a server, the ORCA environment is implemented using Docker containers. Users do not need to interact directly with Docker, which makes it suitable for novices who may not yet be familiar with managing containers. ORCA has been used successfully to provide a private bioinformatics environment to external collaborators at a large genome institute, for teaching an undergraduate class on bioinformatics targeted at biologists, and to provide a ready-to-go bioinformatics suite for a hackathon...
April 20, 2019: Bioinformatics
Fabio Cunial, Jarno Alanko, Djamal Belazzougui
MOTIVATION: Markov models with contexts of variable length are widely used in bioinformatics for representing sets of sequences with similar biological properties. When models contain many long contexts, existing implementations are either unable to handle genome-scale training datasets within typical memory budgets, or they are optimized for specific model variants and are thus inflexible. RESULTS: We provide practical, versatile representations of variable-order Markov models and of interpolated Markov models, that support a large number of context-selection criteria, scoring functions, probability smoothing methods, and interpolations, and that take up to four times less space than previous implementations based on the suffix array, regardless of the number and length of contexts, and up to ten times less space than previous trie-based representations, or more, while matching the size of related, state-of-the-art data structures from Natural Language Processing...
April 20, 2019: Bioinformatics
Yuk Yee Leung, Otto Valladares, Yi-Fan Chou, Han-Jen Lin, Amanda B Kuzma, Laura Cantwell, Liming Qu, Prabhakaran Gangadharan, William J Salerno, Gerard D Schellenberg, Li-San Wang
April 19, 2019: Bioinformatics
Gregory Kucherov
MOTIVATION: Although modern high-throughput biomolecular technologies produce various types of data, biosequence data remain at the core of bioinformatic analyses. However, the computational techniques for dealing with these data have evolved dramatically. RESULTS: In this bird's-eye review, we survey the evolution of the main algorithmic techniques for comparing and searching biological sequences. We highlight key algorithmic ideas that emerged in response to several interconnected factors: shifts in the biological analytical paradigm, the advent of new sequencing technologies, and a substantial increase in the size of the available data...
April 17, 2019: Bioinformatics
Peng Ni, Neng Huang, Zhi Zhang, De-Peng Wang, Fan Liang, Yu Miao, Chuan-Le Xiao, Feng Luo, Jianxin Wang
MOTIVATION: Oxford Nanopore sequencing enables direct detection of the methylation states of bases in DNA from reads, without extra laboratory techniques. Novel computational methods are required to improve the accuracy and robustness of DNA methylation state prediction using Nanopore reads. RESULTS: In this study, we develop DeepSignal, a deep learning method to detect DNA methylation states from Nanopore sequencing reads. Testing on Nanopore reads of Homo sapiens (H...
April 17, 2019: Bioinformatics
Jasmijn A Baaijens, Alexander Schönhuth
MOTIVATION: Haplotype-aware genome assembly plays an important role in genetics, medicine, and various other disciplines, yet generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have so far been greatly underexploited in terms of haplotig computation. This reflects that the methodology for reference-independent haplotig computation has not yet reached maturity...
April 17, 2019: Bioinformatics
Justin D Finkle, Neda Bagheri
MOTIVATION: To understand the regulatory pathways underlying diseases, studies often investigate the differential gene expression between genetically or chemically differing cell populations. Differential expression analysis identifies global changes in transcription and enables the inference of functional roles of applied perturbations. This approach has transformed the discovery of genetic drivers of disease and possible therapies. However, differential expression analysis does not provide quantitative predictions of gene expression in untested conditions...
April 17, 2019: Bioinformatics
Yuansheng Liu, Leo Yu Zhang, Jinyan Li
MOTIVATION: Detection of maximal exact matches (MEMs) between two long sequences is a fundamental problem in pairwise reference-query genome comparisons. To compare ever larger genomes efficiently, reducing both the number of indexed k-mers and the number of query k-mers has become a mainstream approach. It saves computational resources by avoiding a significant number of unnecessary matches. RESULTS: Under this framework, we propose a new method to detect all MEMs from a pair of genomes...
April 17, 2019: Bioinformatics
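To make the MEM definition concrete, here is a brute-force sketch. A match is kept only if it cannot be extended to the left (the preceding characters differ, or either sequence starts there) or to the right (extension stops at a mismatch or sequence end). This is an illustrative assumption, not the k-mer-sampling method described above. It runs in roughly O(n·m·L) time, and the function name `find_mems` and the `min_len` threshold are hypothetical.

```python
def find_mems(ref: str, query: str, min_len: int) -> list[tuple[int, int, int]]:
    """Hypothetical brute-force MEM finder (illustration only, not the paper's method).

    Returns (ref_pos, query_pos, length) triples for every maximal exact match
    of at least min_len characters between ref and query.
    """
    mems = []
    for i in range(len(ref)):
        for j in range(len(query)):
            if ref[i] != query[j]:
                continue
            # Left-maximality: skip if the match could be extended one step left.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Right-maximality: extend until a mismatch or the end of either sequence.
            length = 0
            while (i + length < len(ref) and j + length < len(query)
                   and ref[i + length] == query[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


print(find_mems("ACGTACGT", "TACGA", 3))  # [(0, 1, 3), (3, 0, 4)]
```

Practical tools avoid this quadratic scan by indexing only a subset of k-mers and extending seed hits. The brute-force version is useful mainly as a reference to check such methods on small inputs.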
Gabriele Orlando, Daniele Raimondi, Francesco Tabaro, Francesco Codicé, Yves Moreau, Wim Vranken
MOTIVATION: Eukaryotic cells contain different membrane-delimited compartments, which are crucial for the biochemical reactions necessary to sustain cell life. Recent studies showed that cells can also trigger the formation of membraneless organelles composed of phase-separated proteins to respond to various stimuli. These condensates provide new ways to control reactions, and phase-separation proteins (PSPs) are thus revolutionising how cellular organization is conceived. The small number of experimentally validated proteins, and the difficulty in discovering them, remain bottlenecks in PSP research...
April 17, 2019: Bioinformatics
Longendri Aguilera-Mendoza, Yovani Marrero-Ponce, Jesus A Beltran, Roberto Tellez Ibarra, Hugo A Guillen-Ramirez, Carlos A Brizuela
MOTIVATION: Bioactive peptides have gained great attention in academia and the pharmaceutical industry because they play an important role in human health. However, the increasing number of bioactive peptide databases is causing data redundancy and duplicated effort. Worse, the available data are non-standardized and often contain data-entry errors. Therefore, there is a need for a unified view that enables a more comprehensive analysis of the information on this topic residing at different sites...
April 17, 2019: Bioinformatics
Leyi Wei, Chen Zhou, Ran Su, Quan Zou
MOTIVATION: Prediction of therapeutic peptides is critical for the discovery of novel and efficient peptide-based therapeutics. Computational methods, especially machine-learning-based methods, have been developed to address this need. However, most existing methods are peptide-specific; currently, there is no generic predictor for multiple peptide types. Moreover, it is still challenging to extract informative feature representations from primary sequences. RESULTS: In this study, we have developed PEPred-Suite, a bioinformatics tool for the generic prediction of therapeutic peptides...
April 17, 2019: Bioinformatics
Ali M Yazbeck, Peter F Stadler, Kifah Tout, Jörg Fallmann
MOTIVATION: MicroRNAs form an important class of RNA regulators that has been studied extensively. The miRBase and Rfam databases provide rich, frequently updated information on both pre-miRNAs and their mature forms. These data sources, however, rely on individual data submission and thus are neither complete nor consistent in their coverage across different miRNA families. Quantitative studies of miRNA evolution are therefore difficult or impossible on this basis. RESULTS: We present here a workflow and a corresponding implementation, MIRfix, that automatically curates miRNA datasets by improving the alignments of their precursors, the consistency of the annotation of mature miR and miR* sequences, and the phylogenetic coverage...
April 16, 2019: Bioinformatics
Steven Monger, Michael Troup, Eddie Ip, Sally L Dunwoodie, Eleni Giannoulatou
MOTIVATION: In silico prediction tools are essential for identifying variants that create or disrupt cis splicing motifs. However, there are limited options for genome-scale discovery of splice-altering variants. RESULTS: We have developed Spliceogen, a highly scalable pipeline integrating predictions from some of the individually best-performing models for splice motif prediction: MaxEntScan, GeneSplicer, ESRseq and Branchpointer. AVAILABILITY: Spliceogen is available as a command line tool which accepts VCF/BED inputs and handles both single nucleotide variants (SNVs) and indels (https://github...
April 16, 2019: Bioinformatics
Taro Matsutani, Yuki Ueno, Tsukasa Fukunaga, Michiaki Hamada
BACKGROUND: A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. Each mutational process is known to produce characteristic mutations, and such a preference of a mutational process for particular mutations is called a "mutation signature." Identification of mutation signatures is an important task for elucidating carcinogenic mechanisms. In previous studies, analyses with statistical approaches (e...
April 16, 2019: Bioinformatics
Roberto Semeraro, Alberto Magi
MOTIVATION: Recent technological improvements in Oxford Nanopore sequencing have pushed the throughput of these devices to 10-20 Gb, allowing the generation of millions of reads. Consequently, fast software packages that evaluate experimental quality by generating highly informative and interactive summary plots are of fundamental importance. RESULTS: We developed PyPore, a three-module Python toolbox designed to handle raw FAST5 files, from quality checking to alignment to a reference genome, and to explore their features through the generation of browsable HTML files...
April 16, 2019: Bioinformatics
Sijie Chen, Yixin Chen, Fengzhu Sun, Michael S Waterman, Xuegong Zhang
MOTIVATION: Detecting sequences containing repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for various types of repeat detection tasks. An efficient generic method for detecting most types of repetitive sequences is still desirable. Inspired by the excellent properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic D2R that can efficiently discriminate sequences with or without repetitive regions...
April 16, 2019: Bioinformatics
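For readers unfamiliar with the D2 family, the base statistic is the number of k-word matches between two sequences. In symbols, D2 = Σ_w X_w · Y_w, summed over all words w of length k, where X_w and Y_w are the occurrence counts of w in each sequence. The sketch below computes only this basic D2; it is an illustrative assumption and is not the D2R statistic introduced above. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-word in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_x: str, seq_y: str, k: int) -> int:
    """Basic D2: sum over shared k-words w of X_w * Y_w (not D2R)."""
    cx = kmer_counts(seq_x, k)
    cy = kmer_counts(seq_y, k)
    # Counter returns 0 for absent keys, so unshared words contribute nothing.
    return sum(count * cy[w] for w, count in cx.items())


print(d2("ACGT", "ACGA", 2))  # 2  (shared 2-words: AC, CG)
print(d2("AAAA", "AA", 2))    # 3  (AA occurs 3 times in x and once in y)
```

Comparing a sequence with itself gives the sum of its squared k-word counts. That value grows when k-words recur, which is one intuition for why D2-type statistics can be sensitive to repetitive content.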
Ramesh Rajaby, Wing-Kin Sung
MOTIVATION: Structural variations (SVs) are large-scale mutations in a genome. Although they are less frequent than point mutations, their large size makes them responsible for more heritable differences between individuals. Two prominent classes of SVs are deletions and tandem duplications. They play important roles in many devastating genetic diseases, such as Smith-Magenis syndrome, Potocki-Lupski syndrome and Williams-Beuren syndrome. Since paired-end whole genome sequencing data have become widespread and affordable, reliably calling deletions and tandem duplications has been a major target in bioinformatics. Unfortunately, the problem is far from solved, since existing solutions often perform poorly when applied to real data...
April 15, 2019: Bioinformatics
