[Transcriptomes for serial analysis of gene expression]

Jacques Marti, David Piquemal, Laurent Manchon, Thérèse Commes
Journal de la Société de Biologie 2002, 196 (4): 303-7
The availability of whole-genome sequences is changing our understanding of cell biology. Functional genomics refers to the comprehensive analysis, at the protein level (proteome) and at the mRNA level (transcriptome), of all events associated with the expression of whole sets of genes. New methods have been developed for transcriptome analysis. Serial Analysis of Gene Expression (SAGE) is based on the large-scale sequential analysis of short cDNA sequence tags. Each tag is derived from a defined position within a transcript. Its size (14 bp) is sufficient to identify the corresponding gene, and the number of times each tag is observed provides an accurate measurement of that gene's expression level. Since tag populations can be extensively amplified without altering their relative proportions, SAGE can be performed with minute amounts of biological extract. Handling the mass of data generated by SAGE requires computer analysis. Software is needed to detect and count tags automatically from sequence files. Criteria for assessing the quality of the experimental data can be included at this stage. To identify the corresponding genes, a database is created that registers all virtual tags that could be observed, based on the current state of genome knowledge. Using standard database functions, it is easy to match experimental and virtual tags, thus generating a new database of identified tags together with their expression levels. As an open system, SAGE is able to reveal new, as yet unknown, transcripts. Their identification will become increasingly easy as genome annotation progresses. However, their direct characterization can already be attempted, since the tag information may be sufficient to design primers for extending the unknown sequences. A major advantage of SAGE is that, because expression levels are measured without reference to an arbitrary standard, the data are acquired once and for all and are cumulative.
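The tag counting and tag-to-gene matching described above can be illustrated with a minimal Python sketch. It is not the authors' software. Several details are assumptions that the abstract does not state: the tag is taken to start at the 3'-most CATG site (the NlaIII anchoring site of the standard SAGE protocol), the quality filter is a hypothetical one (correct length, unambiguous bases, correct anchor), and the function names, gene names and sequences are invented for illustration. Ditag parsing from raw concatemer reads is left out.

```python
from collections import Counter

ANCHOR = "CATG"  # assumption: NlaIII anchoring site, not stated in the abstract
TAG_LEN = 14     # tag size given in the abstract (anchor plus 10 bp)


def virtual_tag(transcript):
    """Return the 14-bp tag starting at the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1 or pos + TAG_LEN > len(seq):
        return None  # no anchor site, or too little sequence after it
    return seq[pos:pos + TAG_LEN]


def build_virtual_db(transcripts):
    """Map each virtual tag to the genes that could produce it."""
    db = {}
    for gene, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            db.setdefault(tag, []).append(gene)
    return db


def count_tags(observed):
    """Count observed tags, dropping those that fail the (hypothetical) quality filter."""
    counts = Counter()
    for raw in observed:
        tag = raw.upper()
        if len(tag) == TAG_LEN and tag.startswith(ANCHOR) and set(tag) <= set("ACGT"):
            counts[tag] += 1
    return counts


def annotate(counts, db):
    """Join experimental counts with the virtual-tag database.

    Tags absent from the database are kept with an empty gene list:
    they are the candidates for new, unannotated transcripts.
    """
    return [(tag, n, db.get(tag, [])) for tag, n in counts.most_common()]


# Invented example data
transcripts = {
    "geneA": "GGGCATGAAACCCGGGTTTT",
    "geneB": "TTCATGCATGTTTTGGGGCCAA",
}
observed = (
    ["CATGAAACCCGGGT"] * 3
    + ["CATGTTTTGGGGCC"]
    + ["CATGNNACCCGGGT"]      # ambiguous bases, rejected by the filter
    + ["CATGACGTACGTAC"] * 2  # matches no virtual tag
)
db = build_virtual_db(transcripts)
table = annotate(count_tags(observed), db)
```

In this sketch an unmatched tag is reported rather than discarded, which mirrors the "open system" property: a tag with no known gene is still counted and can later be used to design primers.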
All publicly available data can thus be stored in a single database, facilitating whole-genome analysis of differential expression between cell types, normal and diseased samples, or samples with and without drug treatment. SAGE data are readily amenable to statistical comparison, which makes it possible to determine the level of confidence in the observed variations. A major limitation of SAGE is that, because each analysis is necessarily performed on the whole set of expressed genes, it can hardly be applied to many samples, for example in kinetic studies or to compare the effects of large numbers of drugs. To overcome this limitation, high-throughput detection of a subset of mRNAs is performed more rapidly by parallel hybridization of mRNAs on arrays of nucleic acids immobilized on solid supports. From this point of view, a SAGE platform is a powerful instrument for selecting the most informative subset of genes, assembling them to design microarrays dedicated to a specific problem, and calibrating measurements by comparison with a standard cell model for which SAGE data are available. This approach is an attractive alternative to strategies based exclusively on pangenomic arrays. A very large amount of SAGE data is already available, and the problem now is to extract its biological meaning. Knowledge of metabolic pathways is already organized well enough that its integration into a SAGE platform can be undertaken. For other cell components and pathways, the problem lies in the lack of a controlled vocabulary to describe gene activities, starting from a clear definition of the concept of biological function itself. Progress in gene and cell ontology is expected to facilitate computer-based extraction of biological knowledge from existing and forthcoming SAGE data.
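The abstract does not say which statistical test is used to compare SAGE libraries. As one possible illustration, the sketch below applies a pooled two-proportion z-test to the count of a single tag in two libraries, using only the Python standard library. The choice of test, the function name and the example counts are assumptions made for this example. A real analysis would also need to correct for the many tags tested at once.

```python
import math


def two_proportion_z(count_a, total_a, count_b, total_b):
    """Pooled two-proportion z-test for one tag in two SAGE libraries.

    count_*: occurrences of the tag; total_*: total tags sequenced in the library.
    Returns (z, two-sided p-value under the normal approximation).
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0  # tag absent from (or saturating) both libraries
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return z, p_value


# Invented counts: a tag seen 50 times in 10,000 tags versus 10 times in 10,000 tags
z, p = two_proportion_z(50, 10_000, 10, 10_000)
```

Because SAGE counts are absolute rather than relative to an arbitrary standard, the same function can compare any two libraries in a shared database, provided the total number of tags sequenced in each is known.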
