Extending Information Retrieval Methods to Personalized Genomic-Based Studies of Disease

Personalized Medicine Identification
DOI: 10.4137/cin.s16354 Publication Date: 2015-02-10T23:24:26Z
ABSTRACT
Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual's disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a “document” with “text” detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables efficient identification of features that co-occur within patient subgroups, and characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.
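The patient-as-document idea can be illustrated with a minimal sketch. It covers only the encoding step: each patient's clinical events and genomic state are turned into a bag of "words" and then into a patient-by-feature count matrix, which is the kind of input a standard LDA implementation expects. It does not implement survLDA or the survival supervision. The patient records, the feature names (`geneA_mutated`, `stage_III`, and so on), and the helper functions `to_document` and `build_count_matrix` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy records; feature names are illustrative assumptions,
# not data or variables from the paper.
patients = {
    "patient_A": {
        "clinical": ["stage_III", "therapy_given"],
        "genomic": ["geneA_mutated", "geneB_amplified"],
    },
    "patient_B": {
        "clinical": ["stage_IV", "therapy_given"],
        "genomic": ["geneA_mutated"],
    },
}


def to_document(record):
    """Treat one patient as a 'document': a flat list of feature tokens."""
    return record["clinical"] + record["genomic"]


def build_count_matrix(patients):
    """Return patient ids, a sorted vocabulary, and per-patient token counts."""
    docs = {pid: to_document(rec) for pid, rec in patients.items()}
    vocab = sorted({tok for doc in docs.values() for tok in doc})
    rows = []
    for pid in docs:
        counts = Counter(docs[pid])
        # Counter returns 0 for tokens this patient does not have.
        rows.append([counts[tok] for tok in vocab])
    return list(docs), vocab, rows


patient_ids, vocab, counts = build_count_matrix(patients)

for pid, row in zip(patient_ids, counts):
    print(pid, dict(zip(vocab, row)))
```

In this representation a "topic" would correspond to a set of clinical and genomic features that tend to co-occur across patients; the survival-supervised extension described in the abstract additionally ties those topics to a time-to-event response, which this sketch does not attempt.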