Gene expression analysis in clear cell renal cell carcinoma using gene set enrichment analysis for biostatistical management

Matthias Maruschke, D Reuter, D Koczan, O W Hakenberg, H-J Thiesen
BJU International 2011, 108 (2): E29-35

OBJECTIVE: Improving the workflow for standardizing statistical interpretation provides an opportunity for the analysis of gene expression in clear cell renal cell carcinoma (ccRCC). RCC as a solid tumour entity represents a very suitable tumour model for such investigations. Although expression profiles can be investigated by microarray technologies, the main problem is how to adequately interpret the mass of data they generate. There is a clear lack of a defined, consistent and comparable biostatistical analysis system, and no specific standard biostatistical methodology is available to compare the results of microarray analyses. We used the gene set enrichment analysis (GSEA) method to analyze microarray data from RCC tissue. The present study aimed to analyze differential expression profiles and establish biomarkers suitable for prognostication at the time of renal surgery by comparing RCC samples from patients with long-term survival data against RCC samples from patients with poorly differentiated (grade 3) RCC, concomitant metastatic disease and short survival.

PATIENTS AND METHODS: In the present study, a total of 29 ccRCC fresh-frozen tissue samples were used: 14 samples from grade 1 (G1) RCC patients without metastatic disease and 15 from grade 3 (G3) RCC patients with synchronous metastatic disease. Expression profiling was performed with the Human Genome U133 Plus 2.0 Array (Affymetrix Corp., Santa Clara, CA, USA). Clinical data and long-term follow-up were obtained for all patients. The primary probe level analysis was performed using the Affymetrix MAS 5 algorithm. Further statistical processing was carried out by GSEA, using the Molecular Signatures Database (MSigDB). After selecting gene sets with the highest leading edge subsets, a cluster analysis and further analyses based on the MSigDB database were performed.
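
For readers unfamiliar with GSEA, the following is a minimal sketch of how an enrichment score and its leading edge subset can be computed for one gene set. It follows the weighted running-sum statistic commonly described for GSEA and is not the authors' implementation, which relied on the GSEA software itself. The function name `enrichment_score`, the weight exponent `p` and the toy gene names and scores are assumptions made for illustration only.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Return (ES, leading_edge) for one gene set.

    ranked_genes: gene names sorted by decreasing association with the
                  phenotype (for example G3 vs G1).
    scores:       ranking metric per gene, in the same order.
    gene_set:     collection of gene names forming the set.
    p:            weight exponent applied to the ranking metric (assumed 1).
    """
    members = set(gene_set)
    in_set = [gene in members for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partly")

    # Hits are weighted by the magnitude of their ranking metric.
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("all gene set members have a zero ranking metric")

    running = 0.0
    best = 0.0
    best_idx = -1
    for i, hit in enumerate(in_set):
        if hit:
            running += hit_weights[i] / total_hit_weight
        else:
            running -= 1.0 / n_miss
        # The enrichment score is the largest deviation from zero.
        if abs(running) > abs(best):
            best, best_idx = running, i

    if best >= 0:
        # Positive score: set members at or before the peak.
        span = range(0, best_idx + 1)
    else:
        # Negative score: set members after the trough.
        span = range(best_idx + 1, len(ranked_genes))
    leading_edge = [ranked_genes[i] for i in span if in_set[i]]
    return best, leading_edge


if __name__ == "__main__":
    # Toy data with hypothetical gene names; not taken from the study.
    genes = ["A", "B", "C", "D", "E", "F"]
    metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
    print(enrichment_score(genes, metric, {"A", "B"}))
```

In practice the score is also normalized and tested against permutations of the phenotype labels, which this sketch omits.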

RESULTS: In total, 15 poorly differentiated G3 ccRCC, 14 well differentiated G1 ccRCC and 14 normal renal tissue samples were analyzed for comparative gene expression profiling. Of the 15 G3 ccRCC patients, 12 had synchronous metastatic disease at the time of surgery (pN+ and/or distant metastases: pN+ only = 4, M+ only = 11 and pN+M+ = 3). GSEA identified 700 gene sets. Of these, 120 sets with the highest leading edge subsets were selected and monitored by hierarchical clustering of G1 vs G3. Comparative analysis using the MSigDB database for pathway networks identified 16 gene sets that were strongly differentially over- or underexpressed in G3 vs G1 tumours and are involved in various aspects of tumour physiology, such as metastasis and cell motility, signalling and cell proliferation, as well as gene products involved in building the extracellular matrix and acting as cell surface markers.

CONCLUSIONS: We analyzed microarray data of gene expression in ccRCC, comparing poorly differentiated and well differentiated tumour tissue samples. Using GSEA, we found a number of gene set candidates relevant to highly complex biological network processes; conspicuously, these comprised members of the interleukin and chemokine families, cyclin-dependent kinases, angiogenic growth factors and transcription factors. This suggests that, in poorly differentiated aggressive ccRCC, a limited number of gene sets may be responsible for the very aggressive biological behaviour. Comparison at the gene set level enables the identification of such congruency between different gene sets and whole data sets with respect to a specific biological question. Embedding GSEA in the statistical workflow for preparing expression data may improve the analysis and avoid missing changes at the molecular level. A systematic approach such as GSEA is clearly needed to analyze raw data from microarray analyses, although these data can only be descriptive and the mass of raw data is derived from a relatively small number of tissue samples. However, consistent alterations of gene expression found in specific tumour entities may allow a better understanding of certain aspects of specific tumour biology. Therefore, the molecular characterization of individual tumours may be useful for better individual assessment of prognosis and, finally, the identification of biomarkers and targets of specific treatments may eventually help to improve treatment.
