- Cancer Genomics and Diagnostics
- Scientific Computing and Data Management
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Data Mining Algorithms and Applications
- Advanced Database Systems and Queries
- Peer-to-Peer Network Technologies
- Cloud Computing and Resource Management
- Gene Expression and Cancer Classification
- Bioinformatics and Genomic Networks
- Data Management and Algorithms
- Research Data Management Practices
- Data Quality and Management
- Caching and Content Delivery
- Advanced Topics in Algebra
- Genomics and Phylogenetic Studies
- Software-Defined Networks and 5G
- Network Traffic and Congestion Control
- Genomics and Chromatin Dynamics
- Genetics, Bioinformatics, and Biomedical Research
- Radiomics and Machine Learning in Medical Imaging
- Epigenetics and DNA Methylation
- Data Visualization and Analytics
- Biomedical Text Mining and Ontologies
- Polynomial and Algebraic Computation
- University of Chicago, 2014-2024
- University of Illinois Urbana-Champaign, 1989-2024
- Pfizer (United States), 2023
- Constitutional Rights Foundation Chicago, 2022-2023
- Creative Commons, 2022
- Open Geospatial Consortium, 2022
- Rush University Medical Center, 2022
- NorthShore University HealthSystem, 2022
- Response Biomedical (Canada), 2022
- University of Miami, 2021
The Genomic Data Commons will initially house raw genomic data and diagnostic, histologic, and clinical outcome data from National Cancer Institute–funded projects. A harmonization process aligns sequencing data to the genome and identifies mutations and other alterations.
From Genome to Regulatory Networks. For biologists, having a genome in hand is only the beginning; much more investigation is still needed to characterize how the genome is used to help produce a functional organism (see the Perspective by Blaxter). In this vein, Gerstein et al. (p. 1775) summarize the functional genomic elements of the Caenorhabditis elegans genome, and The modENCODE Consortium (p. 1787) does so for Drosophila melanogaster: full transcriptome analyses over developmental stages, genome-wide identification of transcription factor binding sites,...
To understand clouds and cloud computing, we must first distinguish the two different types of clouds. The author distinguishes between clouds that provide on-demand computing instances and those that provide on-demand computing capacity. Cloud computing doesn't yet have a standard definition, but a good working description is to say that clouds, or clusters of distributed computers, provide resources and services over a network, usually the Internet, with the scale and reliability of a data center.
Among the adverse mental health consequences of childhood trauma is an increased risk for the development of posttraumatic stress disorder (PTSD) in adulthood. Other risk factors for PTSD, including parental trauma exposure and parental PTSD, can also contribute to the experience of childhood trauma. We examined associations between childhood trauma and PTSD in 51 adult children of Holocaust survivors and 41 comparison subjects, with consideration of parental PTSD, and examined these variables in relation to 24-hr urinary cortisol levels. Adult offspring showed significantly higher levels of self-reported trauma,...
A prospective sample of 69 healthy adults, age range 18-80 years, was studied with magnetic resonance imaging scans (T2 weighted, 5 mm thick) of the entire cranium. Volumes were obtained by a segmentation algorithm that uses proton density and T2 pixel values to correct field inhomogeneities ("shading"). Average (+/- SD) brain volume, excluding cerebellum, was 1090.91 ml (114.30; range, 822.19-1363.66), and cerebrospinal fluid (CSF) volume was 127.91 ml (57.62; range, 34.00-297.02). Brain volume was higher (by ml) in the right...
A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the essential information of their host pages, e.g., lists of products or services. It is useful to mine such data records in order to extract information from them and provide value-added services. Existing automatic techniques are not satisfactory because of their poor accuracies. In this paper, we propose a more effective technique to perform the task. The technique is based on two observations about data records and a string matching algorithm....
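The core idea of mining data records by string matching can be sketched with a normalized edit distance over tag sequences: sibling page regions whose tag structures are nearly identical are likely instances of the same record pattern. The function names and the similarity threshold below are illustrative assumptions, not the paper's actual algorithm or parameters.

```python
# Hypothetical sketch: detect repeated, regularly structured regions by
# comparing tag strings with edit distance. Names and thresholds are
# illustrative, not taken from the paper.

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance over sequences."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

def similar(tags_a, tags_b, threshold=0.3):
    """Treat two tag sequences as the same record pattern when their
    normalized edit distance falls below a threshold."""
    longest = max(len(tags_a), len(tags_b)) or 1
    return edit_distance(tags_a, tags_b) / longest < threshold

# Two sibling regions with nearly identical tag structure -> likely records.
row1 = ["tr", "td", "img", "td", "a", "td"]
row2 = ["tr", "td", "img", "td", "a", "td", "span"]
print(similar(row1, row2))  # True: the structures differ by one tag
```

The normalization step matters: raw edit distance grows with region size, so dividing by the longer sequence length keeps the threshold meaningful across records of different sizes.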
The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. Site...
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We...
Transmission Control Protocol (TCP) is used by various applications to achieve reliable data transfer. TCP was originally designed for unreliable networks. With the emergence of high-speed wide area networks, improvements have been applied to reduce latency and improve bandwidth. The improvement is typically achieved by having system administrators tune the network, which can take a considerable amount of time. This paper introduces PSockets (Parallel Sockets), a library that achieves an equivalent performance without manual...
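The parallel-sockets idea can be illustrated by striping one logical transfer across several TCP connections so aggregate throughput is not limited by a single stream's window. The sketch below is a minimal single-host illustration of the striping and reassembly pattern, not the PSockets library's actual API; all names and the stripe count are assumptions.

```python
# Illustrative sketch of striping one payload over N parallel TCP
# connections and reassembling it on the receiving side. Runs entirely
# over loopback; names are assumptions, not the PSockets API.
import socket
import threading

CHUNK = 1024
N = 4

def send_striped(payload, port):
    socks = []
    for i in range(N):
        s = socket.create_connection(("127.0.0.1", port))
        s.sendall(i.to_bytes(4, "big"))        # identify this stripe first
        socks.append(s)
    pieces = [payload[i:i + CHUNK] for i in range(0, len(payload), CHUNK)]
    for j, piece in enumerate(pieces):
        socks[j % N].sendall(piece)            # round-robin striping
    for s in socks:
        s.close()

def recv_striped(listener):
    stripes = [b""] * N
    for _ in range(N):
        c, _ = listener.accept()
        idx = int.from_bytes(c.recv(4), "big")
        buf = b""
        while (data := c.recv(CHUNK)):
            buf += data
        stripes[idx] = buf
        c.close()
    # Re-interleave: stripe k carried chunks k, k+N, k+2N, ...
    out, j = b"", 0
    while True:
        k, pos = j % N, (j // N) * CHUNK
        piece = stripes[k][pos:pos + CHUNK]
        if not piece:
            break
        out += piece
        j += 1
    return out

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(N)
payload = bytes(range(256)) * 40               # ~10 KB test payload
sender = threading.Thread(target=send_striped,
                          args=(payload, listener.getsockname()[1]))
sender.start()
result = recv_striped(listener)
sender.join()
listener.close()
print(result == payload)  # True
```

On a real wide area network the benefit comes from each stream keeping its own congestion window, so N streams can fill a long fat pipe that one stream cannot; over loopback the example only demonstrates that striping and reassembly round-trip correctly.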
We introduce Tukey and Tukey scagnostics and develop graph-theoretic methods for implementing their procedure on large datasets.
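One graph-theoretic scagnostic can be sketched by building the Euclidean minimum spanning tree (MST) of a scatterplot and deriving a shape measure from its vertex degrees. The "stringy" ratio used below (degree-2 vertices over non-leaf vertices) follows the published definition, but this is an illustrative reimplementation under that assumption, not the authors' code.

```python
# Sketch of an MST-based scagnostic: "stringy" measures how path-like a
# scatterplot's minimum spanning tree is. Pure-Python Prim's algorithm.
import math

def mst_degrees(points):
    """Prim's algorithm on the complete Euclidean graph; returns degrees."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    in_tree = [False] * n
    best = [math.inf] * n          # cheapest known edge into the tree
    parent = [-1] * n
    best[0] = 0.0
    deg = [0] * n
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:         # record the MST edge (parent[u], u)
            deg[u] += 1
            deg[parent[u]] += 1
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v] = dist(u, v)
                parent[v] = u
    return deg

def stringy(points):
    """Fraction of non-leaf MST vertices that have degree exactly 2."""
    deg = mst_degrees(points)
    non_leaf = sum(1 for d in deg if d >= 2)
    return sum(1 for d in deg if d == 2) / non_leaf if non_leaf else 0.0

# Collinear points yield a pure path: every interior vertex has degree 2.
line = [(i / 49, 0.0) for i in range(50)]
print(stringy(line))  # 1.0
```

The O(n^2) Prim loop is fine for a sketch; the paper's contribution is precisely the machinery needed to compute such measures efficiently on large datasets.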
Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq), is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcription factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. In this paper, we introduce...
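The basic shape of peak calling can be shown with a toy coverage scan: call regions where read depth exceeds a background threshold, then merge calls separated by less than a minimum gap. Real callers, including the tool this paper introduces, model background noise statistically and handle both punctate and broad signals; the threshold and gap parameters here are illustrative assumptions only.

```python
# Toy peak caller: threshold a coverage track, then merge nearby calls.
# Parameters are illustrative, not those of any published tool.

def call_peaks(coverage, threshold, min_gap=2):
    peaks = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= threshold and start is None:
            start = pos                      # open a candidate region
        elif depth < threshold and start is not None:
            peaks.append([start, pos])       # close it (half-open interval)
            start = None
    if start is not None:
        peaks.append([start, len(coverage)])
    # Merge peaks separated by less than min_gap: a brief dip below the
    # threshold does not split one peak into two, but a wide gap does,
    # which is how closely-spaced peaks stay distinguishable.
    merged = []
    for p in peaks:
        if merged and p[0] - merged[-1][1] < min_gap:
            merged[-1][1] = p[1]
        else:
            merged.append(p)
    return [tuple(p) for p in merged]

cov = [0, 1, 5, 7, 6, 1, 0, 0, 0, 4, 8, 3, 0]
print(call_peaks(cov, threshold=4))  # [(2, 5), (9, 11)]
```

With `min_gap=2`, two summits whose dip is only one position wide would be merged into a single call, while the four-position gap in the example keeps the two peaks separate.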
We describe the design and implementation of a high performance cloud that we have used to archive, analyze, and mine large distributed data sets. By a cloud, we mean an infrastructure that provides resources and/or services over the Internet. A storage cloud provides storage services, while a compute cloud provides compute services. We describe the Sector storage cloud and how it provides the storage services required by the Sphere compute cloud. We also describe the programming paradigm supported by Sphere. Sector and Sphere are designed for analyzing large data sets using computer clusters connected with wide area networks (for example, 10+ Gb/s). We also describe a data mining application developed with Sphere.
Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with existing clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, Sphere supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by...
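The UDF programming model described above can be sketched in a few lines: a map-style UDF runs over each data segment, outputs are shuffled by key, and a reduce-style UDF runs per key. This is a single-process illustration of the pattern under those assumptions, not Sector/Sphere's actual API, which runs the UDFs over distributed file segments across data centres.

```python
# Sketch of MapReduce expressed as two user-defined functions, in the
# spirit of a UDF compute model. Single-process illustration only.
from collections import defaultdict

def run_udfs(segments, map_udf, reduce_udf):
    """Apply map_udf to each segment, shuffle by key, then reduce per key."""
    shuffled = defaultdict(list)
    for segment in segments:          # in a real system, segments live on many nodes
        for key, value in map_udf(segment):
            shuffled[key].append(value)
    return {k: reduce_udf(k, vs) for k, vs in shuffled.items()}

# Word count as the canonical example.
def map_words(segment):
    for word in segment.split():
        yield word, 1

def reduce_counts(word, counts):
    return sum(counts)

segments = ["the cloud stores data", "the cloud computes"]
counts = run_udfs(segments, map_words, reduce_counts)
print(counts)  # {'the': 2, 'cloud': 2, 'stores': 1, 'data': 1, 'computes': 1}
```

The point of the special case noted in the abstract is that MapReduce is just one pairing of UDFs; the same segment-oriented machinery can run arbitrary user functions that do not fit the map/shuffle/reduce shape.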
Obtaining accurate drug response data in large cohorts of cancer patients is very challenging; thus, most pharmacogenomics discovery is conducted in preclinical studies, typically using cell lines and mouse models. However, these platforms suffer from serious limitations, including small sample sizes. Here, we have developed a novel computational method that allows us to impute drug response in clinical cancer genomics data sets, such as The Cancer Genome Atlas (TCGA). The approach works by creating statistical models relating...
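The imputation idea can be sketched as a penalized regression: fit a model relating cell-line gene expression to measured drug response, then apply it to tumor expression profiles where response was never measured. The data below are synthetic and the choice of ridge regression is an assumption for illustration; the published method involves careful cross-platform normalization that this sketch omits.

```python
# Minimal sketch of response imputation: train on cell lines (expression
# -> measured response), predict on tumors (expression only). Synthetic
# data; ridge regression used as an illustrative model choice.
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
true_w = rng.normal(size=20)

# "Cell lines": expression profiles with measured drug response.
X_cells = rng.normal(size=(100, 20))
y_cells = X_cells @ true_w + 0.1 * rng.normal(size=100)

# "Tumors": expression only; response is imputed, never measured.
X_tumors = rng.normal(size=(30, 20))
w = fit_ridge(X_cells, y_cells, alpha=1.0)
imputed = X_tumors @ w

# Sanity check against the generating model (possible only in simulation).
corr = np.corrcoef(imputed, X_tumors @ true_w)[0, 1]
print(round(corr, 2))  # high correlation on this synthetic data
```

In practice there is no `true_w` to check against, which is why validation of imputed response relies on indirect evidence such as recovering known biomarker associations.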