
Speeding up the discovery of combinations of differentially expressed genes for disease prediction and classification.

BACKGROUND AND OBJECTIVE: Finding combinations (i.e., pairs, or more generally, q-tuples with q ≥ 2) of genes whose behavior as a group differs significantly between two classes has received a lot of attention in the quest for simple, accurate, and easily interpretable decision rules for disease classification and prediction. For example, the Top Scoring Pair (TSP) method seeks pairs of genes for which the probability of a reversal in the relative ranking of their expression levels between the two classes is maximized. The computational cost of finding a q-tuple of genes that scores highest under a given metric is O(G^q), where G is the total number of genes. This cost is often problematic or prohibitive in practice (even for q = 2), as the number of genes G is often on the order of tens of thousands.
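To make the O(G^q) cost concrete for q = 2, the following is a minimal sketch of an exhaustive pair search in the spirit of TSP. It is not the authors' implementation: the function names, the genes-by-samples list layout, and the 0/1 class labels are assumptions made for illustration, and the score used here is the absolute difference between the two classes in the observed frequency of x_i < x_j.

```python
from itertools import combinations


def tsp_score(expr, labels, i, j):
    """Return |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| for genes i, j.

    expr is a list of rows (one row per gene, one column per sample);
    labels holds the class (0 or 1) of each sample. Both are assumed layouts.
    """
    freq = []
    for cls in (0, 1):
        cols = [s for s, lab in enumerate(labels) if lab == cls]
        hits = sum(1 for s in cols if expr[i][s] < expr[j][s])
        freq.append(hits / len(cols))
    return abs(freq[0] - freq[1])


def top_scoring_pair(expr, labels):
    """Score every unordered gene pair; this loop is the O(G^2) bottleneck."""
    pairs = combinations(range(len(expr)), 2)
    best = max(pairs, key=lambda p: tsp_score(expr, labels, *p))
    return best, tsp_score(expr, labels, *best)


if __name__ == "__main__":
    # Toy data: 3 genes, 4 samples (two per class).
    expr = [[1, 2, 5, 6], [3, 4, 1, 2], [2, 2, 2, 2]]
    labels = [0, 0, 1, 1]
    print(top_scoring_pair(expr, labels))
```

The number of pairs visited is G(G-1)/2, so with a hypothetical G = 20,000 genes the search scores 199,990,000 pairs, each at a cost proportional to the number of samples.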

METHODS: In this paper, we show that this computational cost can be significantly reduced by excluding from consideration genes whose behavior is almost identical in the two classes and whose inclusion in any q-tuple is therefore largely non-informative. Our exclusion criterion is supported by a statistically robust metric, the Area Under the Curve (AUC) of the corresponding Receiver Operating Characteristic (ROC) curve. By filtering out genes whose AUC value is below a user-chosen threshold, determined by a procedure that we describe in the paper, dramatic reductions in run time are obtained while maintaining the same classification accuracy.
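The filtering step can be sketched as below, under stated assumptions. The AUC of a single gene is computed through its rank-based (Mann-Whitney) equivalent, with ties counted as one half. Folding the value with max(AUC, 1 - AUC), so that a gene separating the classes in either direction is kept, is an assumption of this sketch rather than something the abstract specifies. The threshold is passed in as a plain parameter, since the abstract does not detail the procedure for choosing it, and the function names are hypothetical.

```python
def auc(values, labels):
    """AUC of one gene as the fraction of (class 1, class 0) sample pairs
    in which the class-1 value is larger; ties count as one half."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def filter_genes(expr, labels, threshold):
    """Return indices of genes whose direction-independent AUC reaches
    the threshold. The folding step is an assumption of this sketch."""
    kept = []
    for g, row in enumerate(expr):
        a = auc(row, labels)
        if max(a, 1.0 - a) >= threshold:
            kept.append(g)
    return kept


if __name__ == "__main__":
    # Toy data: gene 2 is identical in both classes and should be dropped.
    expr = [[1, 2, 5, 6], [3, 4, 1, 2], [2, 2, 2, 2]]
    labels = [0, 0, 1, 1]
    print(filter_genes(expr, labels, threshold=0.8))
```

Only the surviving genes are then passed to the q-tuple search. As a purely hypothetical illustration, keeping 2,000 of 20,000 genes reduces the number of candidate pairs from 199,990,000 to 1,999,000.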

RESULTS: We have experimentally verified the gains of this approach on several case studies involving ovarian, colon, leukemia, breast and prostate cancers, and diffuse large B-cell lymphoma.

CONCLUSIONS: The proposed method is not only faster while maintaining the same classification accuracy (for example, we observed an average 78.65% reduction in run time relative to TSP), but it can also yield better classification accuracy because it inherently avoids the so-called "pivot" (non-informative) genes that may otherwise intrude into the chosen q-tuples.
