Most recent papers in the International Journal of Data Mining and Bioinformatics

#1

JOURNAL ARTICLE

DiffGRN: differential gene regulatory network analysis.

Youngsoon Kim, Jie Hao, Yadu Gautam, Tesfaye B Mersha, Mingon Kang

Identifying differential gene regulators that change significantly under disparate conditions is essential to understanding the complex biological mechanisms of a disease. Differential Network Analysis (DiNA) examines different biological processes based on gene regulatory networks, which represent regulatory interactions between genes with a graph model. Most DiNA studies have used correlation-based inference to construct gene regulatory networks from gene expression data because of its intuitive representation and simple implementation; however, this approach cannot represent causal or multivariate effects between genes...

31114627

2018: International Journal of Data Mining and Bioinformatics
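
The DiffGRN abstract contrasts its approach with the correlation-based inference used in most DiNA studies. The Python sketch below illustrates that baseline only. It is not DiffGRN. It builds one thresholded Pearson-correlation network per condition and reports the edges that appear in one network but not the other. The function names, the data layout (a dict mapping each gene to its expression values across samples) and the 0.8 threshold are assumptions chosen for the example.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (population moments)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as no edge
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)

def correlation_network(expr, threshold):
    """Undirected edge set {(gene_a, gene_b)} with |r| >= threshold."""
    genes = sorted(expr)
    edges = set()
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            r = pearson(expr[genes[i]], expr[genes[j]])
            if abs(r) >= threshold:
                edges.add((genes[i], genes[j]))
    return edges

def differential_edges(expr_a, expr_b, threshold=0.8):
    """Edges found only in condition A, and edges found only in condition B."""
    net_a = correlation_network(expr_a, threshold)
    net_b = correlation_network(expr_b, threshold)
    return net_a - net_b, net_b - net_a
```

Each edge here depends on only two genes at a time. For that reason, this baseline cannot capture the causal or multivariate effects that the abstract highlights.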

#2

JOURNAL ARTICLE

Integration of multi-omics data for integrative gene regulatory network inference.

Neda Zarayeneh, Euiseong Ko, Jung Hun Oh, Sang Suh, Chunyu Liu, Jean Gao, Donghyun Kim, Mingon Kang

Gene regulatory networks provide comprehensive insights into, and an in-depth understanding of, complex biological processes. In most research, the molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylation. Recent rapid advances in high-throughput omics technologies enable one to measure multiple types of omics data, called 'multi-omics data', that represent the various biological processes...

29354189

2017: International Journal of Data Mining and Bioinformatics

#3

JOURNAL ARTICLE

The development of non-coding RNA ontology.

Jingshan Huang, Karen Eilbeck, Barry Smith, Judith A Blake, Dejing Dou, Weili Huang, Darren A Natale, Alan Ruttenberg, Jun Huan, Michael T Zimmermann, Guoqian Jiang, Yu Lin, Bin Wu, Harrison J Strachan, Nisansa de Silva, Mohan Vamsi Kasukurthi, Vikash Kumar Jha, Yongqun He, Shaojie Zhang, Xiaowei Wang, Zixing Liu, Glen M Borchert, Ming Tan

Identification of non-coding RNAs (ncRNAs) has improved significantly over the past decade. Semantic annotation of ncRNA data, however, faces critical challenges due to the lack of a comprehensive ontology to serve as common data elements and data exchange standards in the field. We developed the Non-Coding RNA Ontology (NCRO) to address this situation. By providing a formally defined ncRNA controlled vocabulary, the NCRO aims to fill a specific and highly needed niche in semantic annotation of large amounts of ncRNA biological and clinical data...

27990175

2016: International Journal of Data Mining and Bioinformatics

#4

JOURNAL ARTICLE

Learning multiple distributed prototypes of semantic categories for named entity recognition.

Aron Henriksson

The scarcity of large labelled datasets comprising clinical text that can be exploited within the paradigm of supervised machine learning creates barriers for the secondary use of data from electronic health records. It is therefore important to develop capabilities to leverage the large amounts of unlabelled data that, indeed, tend to be readily available. One technique utilises distributional semantics to create word representations in a wholly unsupervised manner and uses existing training data to learn prototypical representations of predefined semantic categories...

26547986

2015: International Journal of Data Mining and Bioinformatics

#5

JOURNAL ARTICLE

Weighted fusion regularisation and predicting microbial interactions with vector autoregressive model.

Yan Wang, Tingting He, Xingpeng Jiang, Jie Yuan, Xianjun Shen

In this paper, we develop a novel regularisation method for MVAR via weighted fusion, which accounts for the correlation among variables. Theoretically, we discuss the grouping effect of weighted fusion regularisation for linear models. Using a probabilistic method, we show that coefficients corresponding to highly correlated predictors differ only slightly. A quantitative estimate of these small differences is given regardless of the coefficients' signs. The estimate is further improved when the empirical approximation error is taken into account, provided the model fits the data well...

26547985

2015: International Journal of Data Mining and Bioinformatics

#6

JOURNAL ARTICLE

Application of consensus string matching in the diagnosis of allelic heterogeneity involving transposition mutation.

Fatema Tuz Zohora, M Sohel Rahman

In this paper, we propose an algorithm that detects the existence of a common ancestor gene sequence under the non-overlapping transposition metric, given two input DNA sequences. We consider two cases: fixed-length transposition and all-length transposition. For the first case, the algorithm has a time complexity of O(n^3), where n is the length of the input sequences. For all-length transposition, the theoretical worst-case time complexity of the algorithm is proven to be O(n^4). In practice, however, the worst-case and average-case time complexities for all-length transposition are found to be O(n^3) and O(n^2), respectively...

26547984

2015: International Journal of Data Mining and Bioinformatics

#7

JOURNAL ARTICLE

Genome-wide discovery of miRNAs using ensembles of machine learning algorithms and logistic regression.

Benjamin Ulfenborg, Karin Klinga-Levan, Björn Olsson

In silico prediction of novel miRNAs from genomic sequences remains a challenging problem. This study presents a genome-wide miRNA discovery software package called GenoScan and evaluates two hairpin classification methods. These methods, one ensemble-based and one using logistic regression, were benchmarked along with 15 published methods. In addition, the sequence-folding step is addressed by investigating the impact of secondary structure prediction methods and the choice of input sequence length on prediction performance...

26547983

2015: International Journal of Data Mining and Bioinformatics

#8

JOURNAL ARTICLE

In silico identification and functional annotation of yeast E3 ubiquitin ligase Rsp5 substrates.

Xiaofeng Song, Lizhen Hu, Ping Han, Xuejiang Guo, Jiahao Sha

Rsp5, an E3 ubiquitin ligase conserved from yeast to mammals, plays a key role in diverse processes in yeast. However, many Rsp5 substrates remain unknown. We therefore propose an in silico method to recognise new substrates of Rsp5. To investigate the molecular determinants that affect the interaction between Rsp5 and its substrates, we systematically analysed many features potentially correlated with Rsp5 substrate recognition. We found that the PPxY motif, transmembrane regions, disordered regions and N-linked glycosylation are the most important features for substrate recognition...

26547982

2015: International Journal of Data Mining and Bioinformatics

#9

JOURNAL ARTICLE

Towards rule-based metabolic databases: a requirement analysis based on KEGG.

Stephan Richter, Ingo Fetzer, Martin Thullner, Florian Centler, Peter Dittrich

Knowledge of metabolic processes is collected in easily accessible online databases, which are growing rapidly in content and detail. Using these databases for the automatic construction of metabolic network models requires high accuracy and consistency. In this two-part study, we evaluate current accuracy and consistency problems using the KEGG database as a prominent example and propose design principles for dealing with such problems. In the first part, we present our computational approach for classifying inconsistencies and provide an overview of the classes of inconsistencies we identified...

26547981

2015: International Journal of Data Mining and Bioinformatics

#10

JOURNAL ARTICLE

A fast Boyer-Moore type pattern matching algorithm for highly similar sequences.

Nadia Ben Nsira, Thierry Lecroq, Mourad Elloumi

In the last decade, biology and medicine have undergone a fundamental change: next-generation sequencing (NGS) technologies have made it possible to obtain genomic sequences very quickly and at low cost compared with the traditional Sanger method. As a result, these technologies have made it possible to collect genomic sequences (genes, exomes or even full genomes) of individuals of the same species. These sequences are more than 99% identical. There is thus a strong need for efficient algorithms for indexing and performing fast pattern matching in such specific sets of sequences...

26547980

2015: International Journal of Data Mining and Bioinformatics
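
The sketch below illustrates the general idea behind a "Boyer-Moore type" matcher. It implements the classic Boyer-Moore-Horspool simplification, which compares each window right to left and then jumps ahead using a bad-character shift table. It does not reproduce the authors' algorithm, which additionally exploits the high similarity between the sequences. The name `horspool_search` is hypothetical.

```python
def horspool_search(text, pattern):
    """Return every start position of pattern in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: for each character in pattern[:-1], the distance from its
    # last occurrence to the end of the pattern. Later occurrences overwrite
    # earlier ones, so the smallest (safe) shift is kept.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare the current window right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift by the entry for the window's last character; characters absent
        # from pattern[:-1] allow a jump of the full pattern length.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

Overlapping occurrences are reported because every shift is at most the distance to the next possible alignment.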

#11

JOURNAL ARTICLE

Cuckoo search optimisation for feature selection in cancer classification: a new approach.

C Gunavathi, K Premalatha

The Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since gene expression data contain thousands of genes but only a small number of samples, feature selection methods can be used to select informative genes and improve classification accuracy. Initially, the genes are ranked by their T-statistic, Signal-to-Noise Ratio (SNR) and F-statistic values. CS is then used to find the informative genes among the top-m ranked genes...

26547979

2015: International Journal of Data Mining and Bioinformatics
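
The ranking step that precedes Cuckoo Search can be sketched for one of the three statistics, SNR. The sketch assumes the common two-class definition SNR = (mu1 - mu0) / (sd1 + sd0) and ranks genes by its absolute value. The paper's exact definition, the other two statistics and the Cuckoo Search stage itself are not shown. The function names are hypothetical.

```python
from statistics import mean, stdev

def snr(values, labels):
    """Signal-to-noise ratio (mu1 - mu0) / (sd1 + sd0) for one gene; labels are 0 or 1.

    Each class needs at least two samples so that the sample standard deviation is defined.
    """
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    denom = stdev(g1) + stdev(g0)
    return (mean(g1) - mean(g0)) / denom if denom else 0.0

def top_m_genes(expr, labels, m):
    """Rank genes by |SNR| (highest first) and keep the names of the top m."""
    scores = {gene: abs(snr(vals, labels)) for gene, vals in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:m]
```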

#12

JOURNAL ARTICLE

PMCR-Miner: parallel maximal confident association rules miner algorithm for microarray data set.

Wael Zakaria, Yasser Kotb, Fayed F M Ghaleb

The MCR-Miner algorithm aims to mine all maximal high-confidence association rules from the microarray up/down-expressed genes data set. This paper introduces two new algorithms: IMCR-Miner and PMCR-Miner. The IMCR-Miner algorithm extends the MCR-Miner algorithm with several improvements. These improvements introduce a novel way of storing the samples of each gene as a list of unsigned integers in order to benefit from bitwise operations. In addition, the IMCR-Miner algorithm overcomes the drawbacks of the MCR-Miner algorithm by imposing restrictions that skip repeated comparisons...

26547978

2015: International Journal of Data Mining and Bioinformatics
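
A small sketch shows why storing each gene's samples as bits pays off. Suppose bit i of a gene's mask is set when the gene is up-regulated in sample i. The support of an itemset is then the number of set bits in the bitwise AND of its masks, and rule confidence follows directly from two support counts. This is neither IMCR-Miner nor PMCR-Miner. For brevity it uses one arbitrary-precision Python int per gene rather than a list of fixed-width unsigned integers, and all names are hypothetical.

```python
from functools import reduce
from operator import and_

def to_bitmask(flags):
    """Pack a 0/1 list (one entry per sample) into an int: bit i is set if sample i is up."""
    mask = 0
    for i, flag in enumerate(flags):
        if flag:
            mask |= 1 << i
    return mask

def support(masks):
    """Count samples in which every gene of a non-empty itemset is up (AND, then count set bits)."""
    return bin(reduce(and_, masks)).count("1")

def confidence(antecedent, consequent):
    """conf(X -> Y) = support(X together with Y) / support(X); X and Y are lists of gene masks."""
    sx = support(antecedent)
    return support(antecedent + consequent) / sx if sx else 0.0
```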

#13

JOURNAL ARTICLE

Sequence based human leukocyte antigen gene prediction using informative physicochemical properties.

Watshara Shoombuatong, Panuwat Mekha, Jeerayut Chaijaruwanich

Prediction of different classes within the human leukocyte antigen (HLA) gene family can provide insight into the human immune system and its response to viral pathogens. It is therefore desirable to develop a method for predicting HLA gene class that is more efficient and more easily interpretable than existing methods. We investigated the HLA gene prediction problem as follows: (a) establishing a dataset (HLA262) such that the sequence identity of the complete HLA dataset was reduced to 30%; (b) proposing a feature set of informative physicochemical properties that works together with an SVM (named HLAPred) to achieve high accuracy and sensitivity (90...

26547977

2015: International Journal of Data Mining and Bioinformatics

#14

JOURNAL ARTICLE

Wavelet-based gene selection method for survival prediction in diffuse large B-cell lymphomas patients.

Maryam Farhadian, Hossein Mahjub, Abbas Moghimbeigi, Paulo J G Lisboa, Jalal Poorolajal, Muharram Mansoorizadeh

Microarray technology allows simultaneous measurement of the expression levels of thousands of genes. An important aspect of microarray studies is the prediction of patient survival based on gene expression profiles. This naturally calls for a dimension reduction procedure combined with the survival prediction model. In this study, a new method based on the wavelet transform for survival-relevant gene selection is presented. The Cox proportional hazards model is typically used to build a prediction model for patient survival using the selected genes...

26547976

2015: International Journal of Data Mining and Bioinformatics

#15

JOURNAL ARTICLE

Orthogonal projection correction for confounders in biological data classification.

Limin Li, Shuqin Zhang

The presence of confounders, such as population structure in genome-wide association studies, makes it difficult to apply machine learning methods directly to biological problems. How to correct for confounders effectively is still unclear. In this work, we propose an Orthogonal Projection Correction (OPC) method to correct for confounders. This is achieved by orthogonally decomposing each feature into a confounding component and a non-confounding component, such that the original data can be best reconstructed by only the non-confounding components of features...

26547975

2015: International Journal of Data Mining and Bioinformatics
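
OPC chooses its decomposition so that the data are best reconstructed from the non-confounding components. A simpler, standard relative of that idea projects each feature off a known confounder subspace; it is shown here only as an illustration and is not OPC itself. With an orthonormal basis Q for the span of the confounder vectors, the confounding part of a feature x is Q Q^T x and the non-confounding part is the residual x - Q Q^T x. The sketch assumes the confounders are supplied as vectors over the same samples as the feature (for example, a few population-structure covariates). The function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [a - c * b for a, b in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors that are (numerically) linearly dependent
            basis.append([a / norm for a in w])
    return basis

def remove_confounders(feature, confounders):
    """Split a feature into (confounding, non_confounding) parts by orthogonal projection."""
    basis = orthonormal_basis(confounders)
    confounding = [0.0] * len(feature)
    for q in basis:
        c = dot(feature, q)
        confounding = [a + c * b for a, b in zip(confounding, q)]
    non_confounding = [x - p for x, p in zip(feature, confounding)]
    return confounding, non_confounding
```

By construction, the non-confounding part is orthogonal to every confounder vector.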

#16

JOURNAL ARTICLE

miRNA target recognition using features of suboptimal alignments.

Ali Katanforoush, Ehsan Mahdavi

MicroRNAs (miRNAs) are a class of short RNA molecules that regulate gene expression by binding directly to messenger RNAs. Conventional approaches to miRNA target prediction estimate the accessibility of target sites and the binding strength of the miRNA by finding optima of energy models, which requires O(n^3) computations. Alternatively, we narrow down potential miRNA binding sites to suboptimal hits of a pairwise alignment algorithm called Fitting Alignment, which runs in O(n^2). We invoke the same algorithm once for all candidate sites to measure the site accessibilities...

26547974

2015: International Journal of Data Mining and Bioinformatics
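
A fitting alignment aligns the whole of a short sequence against the best-matching substring of a long one, so skipping a prefix or suffix of the long sequence costs nothing. For lengths n and m, the dynamic program takes O(nm) time, which is consistent with the O(n^2) figure in the abstract. The sketch below computes only the optimal fitting-alignment score, using assumed identity scoring (match +1, mismatch -1, gap -1). It does not enumerate the suboptimal hits the authors use, nor does it score miRNA:mRNA complementarity. The function name is hypothetical.

```python
def fitting_alignment_score(short, long_, match=1, mismatch=-1, gap=-1):
    """Best score for aligning all of `short` to some substring of `long_` in O(len(short) * len(long_))."""
    m = len(long_)
    # Row 0 is all zeros: starting the alignment anywhere in long_ is free.
    prev = [0] * (m + 1)
    for i in range(1, len(short) + 1):
        # Column 0: the first i characters of `short` aligned to nothing cost i gaps.
        cur = [prev[0] + gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if short[i - 1] == long_[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    # Ending anywhere in long_ is also free: take the best cell of the last row.
    return max(prev)
```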

#17

JOURNAL ARTICLE

Analysing large biological data sets with an improved algorithm for MIC.

Shuliang Wang, Yiping Zhao

The computational framework used traditional similarity measures to find significant relationships in biological annotations. However, its prerequisite that biological annotations do not co-occur with each other is restrictive. To overcome this, this paper proposes a new method, the Improved Algorithm for Maximal Information Coefficient (IAMIC), to discover hidden regularities between biological annotations. IAMIC approximates a novel similarity coefficient based on the maximal information coefficient with generality and equitability, improving the axis partitioning through quadratic optimisation instead of brute-force search...

26547973

2015: International Journal of Data Mining and Bioinformatics

#18

JOURNAL ARTICLE

Exploiting multi-layered vector spaces for signal peptide detection.

Tom Johnsten, Laura Fain, Leanna Fain, Ryan G Benton, Ethan Butler, Lewis Pannell, Ming Tan

Analysing and classifying sequences based on similarities and differences is a mathematical problem of escalating relevance and importance in many scientific disciplines. One of the primary challenges in applying machine learning algorithms to sequential data, such as biological sequences, is the extraction and representation of significant features from the data. To address this problem, we have recently developed a representation, entitled Multi-Layered Vector Spaces (MLVS), which is a simple mathematical model that maps sequences into a set of MLVS...

26547972

2015: International Journal of Data Mining and Bioinformatics

#19

JOURNAL ARTICLE

A graph-based integrative method of detecting consistent protein functional modules from multiple data sources.

Yuan Zhang, Yue Cheng, Liang Ge, Nan Du, Kebin Jia, Aidong Zhang

Many clustering methods have been developed to identify functional modules in Protein-Protein Interaction (PPI) networks, but the results are far from satisfactory. To overcome the noise and incompleteness of PPI networks and find more accurate and stable functional modules, we propose an integrative method, the bipartite graph-based Non-negative Matrix Factorisation method (BiNMF), in which multiple biological data sources are adopted as different views that describe PPIs. Specifically, traditional clustering models are adopted as a preliminary analysis of the different views of protein functional similarity...

26547971

2015: International Journal of Data Mining and Bioinformatics

#20

JOURNAL ARTICLE

An effective hybrid approach of gene selection and classification for microarray data based on clustering and particle swarm optimization.

Fei Han, Shanxiu Yang, Jian Guan

In this paper, a hybrid approach based on clustering and Particle Swarm Optimisation (PSO) is proposed to perform gene selection and classification for microarray data. In the new method, genes are first partitioned into a predetermined number of clusters by the K-means method. Since the genes in each cluster are highly redundant, the Max-Relevance Min-Redundancy (mRMR) strategy is used to reduce the redundancy of the clustered genes. PSO is then used to perform further gene selection from the remaining clustered genes...

26547970

2015: International Journal of Data Mining and Bioinformatics
