Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics

John H Phan, Chang-Feng Quo, May D Wang
Progress in Brain Research 2006, 158: 83-108
The goal of this chapter is to introduce some of the available computational methods for expression analysis. Genomic and proteomic experimental techniques are briefly discussed to help the reader understand these methods and results better in context with the biological significance. Furthermore, a case study is presented that will illustrate the use of these analytical methods to extract significant biomarkers from high-throughput microarray data. Genomic and proteomic data analysis is essential for understanding the underlying factors that are involved in human disease. Currently, such experimental data are generally obtained by high-throughput microarray or mass spectrometry technologies among others. The sheer amount of raw data obtained using these methods warrants specialized computational methods for data analysis. Biomarker discovery for neurological diagnosis and prognosis is one such example. By extracting significant genomic and proteomic biomarkers in controlled experiments, we come closer to understanding how biological mechanisms contribute to neural degenerative diseases such as Alzheimers' and how drug treatments interact with the nervous system. In the biomarker discovery process, there are several computational methods that must be carefully considered to accurately analyze genomic or proteomic data. These methods include quality control, clustering, classification, feature ranking, and validation. Data quality control and normalization methods reduce technical variability and ensure that discovered biomarkers are statistically significant. Preprocessing steps must be carefully selected since they may adversely affect the results of the following expression analysis steps, which generally fall into two categories: unsupervised and supervised. Unsupervised or clustering methods can be used to group similar genomic or proteomic profiles and therefore can elucidate relationships within sample groups. These methods can also assign biomarkers to sub-groups based on their expression profiles across patient samples. Although clustering is useful for exploratory analysis, it is limited due to its inability to incorporate expert knowledge. On the other hand, classification and feature ranking are supervised, knowledge-based machine learning methods that estimate the distribution of biological expression data and, in doing so, can extract important information about these experiments. Classification is closely coupled with feature ranking, which is essentially a data reduction method that uses classification error estimation or other statistical tests to score features. Biomarkers can subsequently be extracted by eliminating insignificantly ranked features. These analytical methods may be equally applied to genetic and proteomic data. However, because of both biological differences between the data sources and technical differences between the experimental methods used to obtain these data, it is important to have a firm understanding of the data sources and experimental methods. At the same time, regardless of the data quality, it is inevitable that some discovered biomarkers are false positives. Thus, it is important to validate discovered biomarkers. The validation process may be slow; yet, the overall biomarker discovery process is significantly accelerated due to initial feature ranking and data reduction steps. Information obtained from the validation process may also be used to refine data analysis procedures for future iteration. Biomarker validation may be performed in a number of ways - bench-side in traditional labs, web-based electronic resources such as gene ontology and literature databases, and clinical trials.

Full Text Links

Find Full Text Links for this Article


You are not logged in. Sign Up or Log In to join the discussion.

Related Papers

Remove bar
Read by QxMD icon Read

Save your favorite articles in one place with a free QxMD account.


Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"