- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- RNA Research and Splicing
- RNA modifications and cancer
- Advanced Data Storage Technologies
- Logic, programming, and type systems
- Soil Mechanics and Vehicle Dynamics
- Cancer-related molecular mechanisms research
- Embedded Systems Design Techniques
- Single-cell and spatial transcriptomics
- Protein Structure and Dynamics
- Molecular Biology Techniques and Applications
- Interconnection Networks and Systems
- Distributed systems and fault tolerance
- Algorithms and Data Compression
- Vehicle Dynamics and Control Systems
- Data Mining Algorithms and Applications
- Software Testing and Debugging Techniques
- RNA and protein synthesis mechanisms
- Data Management and Algorithms
- Bioinformatics and Genomic Networks
- Refrigeration and Air Conditioning Technologies
- Lattice Boltzmann Simulation Studies
- Generative Adversarial Networks and Image Synthesis
University of North Carolina at Chapel Hill, 2011-2022
University of North Carolina Health Care, 2017-2022
Jaguar Land Rover (United Kingdom), 2016-2021
Coventry (United Kingdom), 2021
Tusculum College, 2015
National Institutes of Health, 2014
University of Kentucky, 2010-2012
North Carolina State University, 2000-2004
Elon University, 1992
University of Wisconsin–Madison, 1988
The Cancer Genome Atlas profiled 279 head and neck squamous cell carcinomas (HNSCCs) to provide a comprehensive landscape of somatic genomic alterations. Here we show that human-papillomavirus-associated tumours are dominated by helical domain mutations of the oncogene PIK3CA, novel alterations involving loss of TRAF3, and amplification of the cell cycle gene E2F1. Smoking-related HNSCCs demonstrate near universal loss-of-function TP53 mutations and CDKN2A inactivation with frequent copy number alterations including amplification of 3q26/28 and 11q13/22. A...
The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long reads (≥75 bp). MapSplice is not dependent on splice site features or intron length; consequently, it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of read...
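A toy split-read search makes the abstract's distinction between canonical and non-canonical splices concrete. This sketch is not MapSplice's algorithm. It assumes exact matching and a single junction per read, and the function `find_junction` and its `min_anchor` parameter are hypothetical. It tries each split point of the read, looks for the prefix in the reference and then for the suffix further downstream, and labels the gap between them canonical when it follows the GT...AG motif.

```python
def find_junction(read, ref, min_anchor=4):
    """Toy split-read search (hypothetical helper, exact matching only).
    Returns (donor, acceptor, motif): the intron is ref[donor:acceptor],
    0-based and half-open. Returns None if no split of `read` fits."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        start = ref.find(prefix)
        while start != -1:
            donor = start + split                    # first intron base
            acceptor = ref.find(suffix, donor + 1)   # require an intron of >= 1 base
            if acceptor != -1:
                intron = ref[donor:acceptor]
                # GT...AG is the canonical splice-site motif; anything else is non-canonical.
                canonical = intron.startswith("GT") and intron.endswith("AG")
                return donor, acceptor, "canonical" if canonical else "non-canonical"
            start = ref.find(prefix, start + 1)      # try the next placement of the prefix
    return None

exon1, intron, exon2 = "CCATGGCA", "GT" + "T" * 7 + "AG", "CTTACGGA"
ref = exon1 + intron + exon2
read = exon1[-6:] + exon2[:6]          # a read spanning the exon1/exon2 junction
print(find_junction(read, ref))        # (8, 19, 'canonical')
```

Because the sketch uses exact matching, a junction whose boundary bases repeat can fit at several equivalent positions. The toy simply returns the first split that works, whereas a real aligner has to resolve that ambiguity.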
Frequent subgraph mining is an active research topic in the data mining community. A graph is a general model to represent data and has been used in many domains like cheminformatics and bioinformatics. Mining patterns from graph databases is challenging since graph-related operations, such as subgraph testing, generally have higher time complexity than the corresponding operations on itemsets, sequences, and trees, which have been studied extensively. We propose a novel frequent subgraph mining algorithm, FFSM, which employs a vertical search scheme within an algebraic graph framework we...
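Frequent subgraph mining rests on the notion of support, which is easiest to see on the smallest patterns: single labeled edges. In this sketch, the support of a pattern is the number of database graphs that contain it at least once. A pattern is frequent when its support reaches `min_support`. The graph encoding and the names `edge_pattern` and `frequent_edges` are assumptions for illustration. FFSM's vertical search and algebraic framework, which grow larger patterns while avoiding redundant candidates, are not shown.

```python
from collections import Counter

def edge_pattern(label_u, edge_label, label_v):
    # Undirected labeled edge: sort endpoint labels so C-O and O-C are the same pattern.
    a, b = sorted((label_u, label_v))
    return (a, edge_label, b)

def frequent_edges(db, min_support):
    """db: list of graphs, each (vertex_labels, edges), where vertex_labels maps
    vertex id -> label and edges is a list of (u, v, edge_label).
    Support of a pattern = number of graphs containing it at least once."""
    support = Counter()
    for vertex_labels, edges in db:
        present = {edge_pattern(vertex_labels[u], lab, vertex_labels[v]) for u, v, lab in edges}
        support.update(present)  # a set, so each graph counts at most once
    return {p: s for p, s in support.items() if s >= min_support}

db = [
    ({0: "C", 1: "O", 2: "C"}, [(0, 1, "s"), (1, 2, "s")]),
    ({0: "C", 1: "C", 2: "N"}, [(0, 1, "d"), (1, 2, "s")]),
    ({0: "O", 1: "C", 2: "N"}, [(0, 1, "s"), (1, 2, "s")]),
]
print(frequent_edges(db, min_support=2))
# {('C', 's', 'O'): 2, ('C', 's', 'N'): 2}
```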
The need to integrate several versions of a program into a common one arises frequently, but it is a tedious and time-consuming task to integrate programs by hand. To date, the only available tools for assisting with program integration are variants of text-based differential file comparators; these are of limited utility because one has no guarantees about how the program that is the product of an integration behaves compared to the programs that were integrated. This paper concerns the design of a semantics-based tool for automatically integrating program versions. The main contribution of the paper is an algorithm that takes as...
Single cell experiments provide an unprecedented opportunity to reconstruct a sequence of changes in a biological process from individual "snapshots" of cells. However, nonlinear gene expression changes, genes unrelated to the process, and the possibility of branching trajectories make this a challenging problem. We develop SLICER (Selective Locally Linear Inference of Cellular Expression Relationships) to address these challenges. SLICER can infer highly nonlinear trajectories, select genes without prior knowledge of the process, and automatically...
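One common way to order cells along a nonlinear trajectory is by shortest-path (geodesic) distance on a k-nearest-neighbor graph rather than by straight-line distance. The sketch below illustrates only that idea, in plain Python. It assumes cells are already embedded in a low-dimensional space, and it omits SLICER's gene selection, embedding, and branch detection. `knn_graph`, `geodesic_order`, and `k` are hypothetical names, not the SLICER package's API.

```python
import heapq
import math

def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d  # symmetrize so the graph is undirected
    return adj

def geodesic_order(points, start, k=2):
    """Order cells by shortest-path distance from `start` (Dijkstra on the kNN graph)."""
    adj = knn_graph(points, k)
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return sorted(dist, key=dist.get)

# A horseshoe-shaped "trajectory": up the left arm, across the top, down the right arm.
points = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3.5),
          (2, 3.5), (3, 3), (3, 2), (3, 1), (3, 0)]
print(geodesic_order(points, start=0))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Straight-line distance from cell 0 would place cell 9 (distance 3) ahead of cell 4 (distance about 3.6). The graph distance instead follows the curve and recovers the intended order.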
One fundamental challenge for mining recurring subgraphs from semi-structured data sets is the overwhelming abundance of such patterns. In large graph databases, the total number of frequent subgraphs can become too large to allow a full enumeration using reasonable computational resources. In this paper, we propose a new algorithm that mines only maximal frequent subgraphs, i.e. subgraphs that are not a part of any other frequent subgraphs. This may exponentially decrease the size of the output set in the best case; in our experiments on practical data sets, mining maximal frequent subgraphs reduces the total number of mined...
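A toy pattern set shows how much smaller the output becomes when only maximal patterns are reported. The sketch simplifies heavily. Each pattern is a set of edges over fixed vertex names, so containment is a subset test, whereas real subgraph mining must test subgraph isomorphism. `maximal_patterns` is a hypothetical helper that filters an already-enumerated set. The abstract's algorithm avoids enumerating all frequent subgraphs in the first place.

```python
from itertools import combinations

def maximal_patterns(frequent):
    """Keep patterns not strictly contained in another frequent pattern.
    Simplification: patterns are frozensets of edges, so `p < q` (proper
    subset) stands in for subgraph containment."""
    return [p for p in frequent if not any(p < q for q in frequent)]

# Support can only grow when edges are removed, so if this 3-edge pattern is
# frequent, all 7 of its non-empty edge subsets are frequent as well.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
frequent = [frozenset(c) for r in range(1, 4) for c in combinations(edges, r)]
maximal = maximal_patterns(frequent)
print(len(frequent), len(maximal))  # 7 1
```

For a single frequent pattern with m edges, the full output has 2^m - 1 members under this simplification, while the maximal output has one.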
The need to integrate several versions of a program into a common one arises frequently, but it is a tedious and time-consuming task to integrate programs by hand. The main contribution of this paper is an algorithm, called Integrate, that takes as input three programs A, B, and Base, where A and B are two variants of Base. Whenever the changes made to Base to create A and B do not “interfere” (in a sense defined in the paper), Integrate produces a program M that integrates A and B.
Comprehensive sequencing of human cancers has identified recurrent mutations in genes encoding chromatin regulatory proteins. For clear cell renal cell carcinoma (ccRCC), three of the five commonly mutated genes encode the chromatin regulators PBRM1, SETD2, and BAP1. How these mutations alter the chromatin landscape and transcriptional program in ccRCC or other cancers is not understood. Here, we identified alterations in chromatin organization and transcript profiles associated with mutations in chromatin regulators in a large cohort of primary kidney tumors. By associating variation in chromatin organization with mutations in SETD2, which encodes the enzyme...
The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell development and enable identification of biomarkers that classify disease types. The availability of high-throughput short-read sequencing technologies provides in-depth sampling of the transcriptome, making it...
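A simple way to quantify a difference in isoform usage for one gene between two samples is to compare isoform fractions rather than raw counts. That way, a change in overall expression does not masquerade as a change in isoform mix. The sketch below uses total variation distance between the two fraction vectors. The measure and the names `isoform_fractions` and `usage_shift` are illustrative assumptions, not the method of the work described above.

```python
def isoform_fractions(counts):
    """counts: dict isoform -> estimated reads for one gene in one sample."""
    total = sum(counts.values())
    return {iso: c / total for iso, c in counts.items()}

def usage_shift(a, b):
    """Total variation distance between the isoform fractions of one gene
    in samples a and b: 0 means identical usage, 1 means no shared isoforms."""
    fa, fb = isoform_fractions(a), isoform_fractions(b)
    return 0.5 * sum(abs(fa.get(i, 0.0) - fb.get(i, 0.0)) for i in set(fa) | set(fb))

# Sample b expresses the gene four times more, but only the isoform mix matters here.
print(round(usage_shift({"T1": 80, "T2": 20}, {"T1": 200, "T2": 200}), 3))  # 0.3
print(usage_shift({"T1": 10, "T2": 10}, {"T1": 30, "T2": 30}))              # 0.0
```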
Single cell experimental techniques reveal transcriptomic and epigenetic heterogeneity among cells, but how these are related is unclear. We present MATCHER, an approach for integrating multiple types of single cell measurements. MATCHER uses manifold alignment to infer single cell multi-omic profiles from transcriptomic and epigenetic measurements performed on different cells of the same type. Using scM&T-seq and sc-GEM data, we confirm that MATCHER accurately predicts true single cell correlations between DNA methylation and gene expression without using known...
Program dependence graphs were introduced by Kuck as an intermediate program representation well suited for performing optimizations, vectorization, and parallelization. There are also additional applications for them as an internal program representation in program development environments.
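A program dependence graph combines data (flow) dependences with control dependences. The minimal sketch below computes only the flow-dependence edges for straight-line code with no branches or loops. In that setting, the reaching definition of a variable is simply its most recent assignment. The statement encoding and the name `flow_dependences` are hypothetical.

```python
def flow_dependences(stmts):
    """stmts: straight-line code as a list of (defined_var, used_vars).
    Returns edges (i, j, var): statement j reads `var`, whose most recent
    definition before j is statement i."""
    last_def = {}
    edges = []
    for j, (target, uses) in enumerate(stmts):
        for var in uses:
            if var in last_def:
                edges.append((last_def[var], j, var))
        last_def[target] = j  # update after reading, so `a = a + 1` depends on the old `a`
    return edges

program = [
    ("a", []),          # 0: a = input()
    ("b", ["a"]),       # 1: b = a * 2
    ("a", ["a"]),       # 2: a = a + 1
    ("c", ["a", "b"]),  # 3: c = a + b
]
print(flow_dependences(program))
# [(0, 1, 'a'), (0, 2, 'a'), (2, 3, 'a'), (1, 3, 'b')]
```

Statement 3 depends on statement 2, not statement 0, for `a`, because statement 2 redefines it. A complete representation would also need control dependences and, for some uses such as reordering, anti- and output dependences.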
The recent addition of task parallelism to the OpenMP shared memory API allows programmers to express concurrency at a high level of abstraction and places the burden of scheduling parallel execution on the OpenMP run-time system. Efficient scheduling of tasks on modern multi-socket multicore shared memory systems requires careful consideration of an increasingly complex memory hierarchy, including shared caches and non-uniform memory access (NUMA) characteristics. In order to evaluate scheduling strategies, we extended the open source Qthreads threading library to implement different...
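OpenMP runtimes and Qthreads are written in C, so the following is only a round-based Python toy of one hierarchical policy, assumed here for illustration. Workers on a socket share one task queue and pop the newest task, which favors cache reuse. A worker whose socket queue is empty steals the oldest task from the fullest remote queue. The function `run` and its parameters are hypothetical and are not necessarily among the strategies the abstract refers to.

```python
from collections import deque

def run(tasks_per_socket, workers_per_socket):
    """Toy round-based simulation of a two-level scheduler.
    Each round, every worker executes one task: from its own socket's shared
    queue if possible (LIFO end), otherwise by stealing the oldest task from
    the fullest queue. Returns (rounds, steals)."""
    queues = [deque(tasks) for tasks in tasks_per_socket]
    rounds = steals = 0
    while any(queues):
        rounds += 1
        for q in queues:
            for _ in range(workers_per_socket):
                if q:
                    q.pop()                 # local pop: newest task, likely cache-warm
                else:
                    victim = max(range(len(queues)), key=lambda s: len(queues[s]))
                    if queues[victim]:
                        queues[victim].popleft()  # steal the oldest task
                        steals += 1
    return rounds, steals

# All 8 tasks start on socket 0; socket 1's two workers start idle.
print(run([list(range(8)), []], workers_per_socket=2))  # (2, 4)
```

Without stealing, socket 0's two workers alone would need 4 rounds. Taking the oldest task when stealing is a common choice, because older tasks in a recursive task tree tend to represent larger pieces of work.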
Single cell RNA-seq experiments provide valuable insight into cellular heterogeneity but suffer from low coverage, 3' bias and technical noise. These unique properties of single cell RNA-seq data make the study of alternative splicing difficult, and thus most single cell studies have restricted analysis of transcriptome variation to the gene level. To address these limitations, we developed SingleSplice, which uses a statistical model to detect genes whose isoform usage shows biological variation significantly exceeding technical noise in a population...
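SingleSplice's statistical model is specified in the paper, not here. As a hedged stand-in, the sketch below treats technical noise as pure binomial sampling of reads between two isoforms. It then asks whether per-cell isoform fractions spread more than that noise would allow, using a standard Pearson chi-square homogeneity statistic. Real single cell technical noise is larger than binomial, which is part of what a dedicated model must capture. `isoform_dispersion` is a hypothetical name.

```python
def isoform_dispersion(cells):
    """cells: list of (reads_from_isoform_1, total_reads) per cell for one
    two-isoform gene, with total_reads > 0. Returns the Pearson chi-square
    statistic for the hypothesis that all cells share one isoform-1 fraction p
    and differ only by binomial sampling; under that hypothesis it is
    approximately chi-square with len(cells) - 1 degrees of freedom."""
    k = sum(a for a, _ in cells)
    n = sum(t for _, t in cells)
    if k == 0 or k == n:
        return 0.0  # only one isoform observed: no spread to test
    p = k / n       # pooled isoform-1 fraction
    return sum((a - t * p) ** 2 / (t * p * (1 - p)) for a, t in cells)

print(isoform_dispersion([(5, 10), (5, 10)]))   # 0.0  (identical fractions)
print(isoform_dispersion([(10, 10), (0, 10)]))  # 20.0 (each cell uses one isoform)
```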
Finding recurring residue packing patterns, or spatial motifs, that characterize protein structural families is an important problem in bioinformatics. We apply a novel frequent subgraph mining algorithm to three graph representations of protein three-dimensional (3D) structure. In each graph, a vertex represents an amino acid. Vertex-residues are connected by edges using three approaches: first, based on a simple distance threshold between contact residues; second, using the Delaunay tessellation from computational...
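The first of the three graph representations, a distance-threshold contact graph, is straightforward to build. The sketch assumes one representative point per residue (for example the C-alpha atom) and an 8 Å cutoff. Both are illustrative choices rather than values from the work above, and `contact_graph` is a hypothetical name. The Delaunay-based representations would need a computational geometry library and are not shown.

```python
import math
from itertools import combinations

def contact_graph(residues, threshold=8.0):
    """residues: list of (residue_name, (x, y, z)), one representative point
    per residue. Vertices are labeled by residue name; an edge joins two
    residues whose points lie within `threshold` angstroms."""
    labels = [name for name, _ in residues]
    edges = [(i, j) for i, j in combinations(range(len(residues)), 2)
             if math.dist(residues[i][1], residues[j][1]) <= threshold]
    return labels, edges

residues = [("GLY", (0.0, 0.0, 0.0)), ("ALA", (3.8, 0.0, 0.0)),
            ("SER", (7.6, 0.0, 0.0)), ("LYS", (20.0, 0.0, 0.0))]
print(contact_graph(residues))
# (['GLY', 'ALA', 'SER', 'LYS'], [(0, 1), (0, 2), (1, 2)])
```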
The recent addition of task parallelism to the OpenMP shared memory API allows programmers to express concurrency at a high level of abstraction and places the burden of scheduling parallel execution on the OpenMP run time system. This is a welcome development for scientific computing as supercomputer nodes grow "fatter" with multicore and manycore processors. But efficient scheduling of tasks on modern multi-socket multicore shared memory systems requires careful consideration of an increasingly complex memory hierarchy, including shared caches and NUMA characteristics. In this...
Splice variant neoantigens are a potential source of tumor-specific antigen (TSA) that are shared between patients in a variety of cancers, including acute myeloid leukemia. Current tools for genomic prediction of splice variant neoantigens demonstrate promise. However, many tools have not been well validated with simulated and/or wet lab approaches, with no studies published that have presented a targeted immunopeptidome mass spectrometry approach designed specifically for identification of predicted splice variant neoantigens. In this study, we describe NeoSplice,...
SMD, a system for interactively steering molecular dynamics calculations of protein molecules, includes computation, visualization, and communication components. Biochemists can "tug" molecules into different shapes by specifying external forces in the graphical interface, which are added to the internal forces representing atomic bonds and nonbonded interactions. SMD provides a new tool for biochemists to use in exploring protein structure and proposed protein designs, as well as more general applications such as exploring the molecular dynamics model itself. Its primary is...
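The core mechanism described here, adding user-specified external forces to the internal force field, can be shown in a deliberately tiny setting. Two atoms on a line are joined by a harmonic bond and advanced with one velocity Verlet step. The force model, units, integrator, and the names `forces` and `verlet_step` are assumptions for illustration, not SMD's implementation.

```python
def forces(x, k_bond, r0, f_ext):
    """Two atoms on a line joined by a harmonic bond (the internal force),
    plus a user-specified external force per atom (the tug)."""
    stretch = (x[1] - x[0]) - r0
    internal = [k_bond * stretch, -k_bond * stretch]  # pulls atoms back toward length r0
    return [fi + fe for fi, fe in zip(internal, f_ext)]

def verlet_step(x, v, mass, dt, **params):
    """One velocity Verlet step driven by the summed internal + external forces."""
    f = forces(x, **params)
    v_half = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]
    x_new = [xi + dt * vh for xi, vh in zip(x, v_half)]
    f_new = forces(x_new, **params)
    v_new = [vh + 0.5 * dt * fi / mass for vh, fi in zip(v_half, f_new)]
    return x_new, v_new

# Tug atom 1 to the right; atom 0 begins to follow through the stretched bond.
x1, v1 = verlet_step([0.0, 1.0], [0.0, 0.0], mass=1.0, dt=0.1,
                     k_bond=1.0, r0=1.0, f_ext=[0.0, 1.0])
```

After one step, atom 1 has moved to about 1.005 and atom 0 has picked up a small positive velocity. That is the external tug propagating through the internal bond force.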
We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from...
This paper examines MPI's ability to support continuous, dynamic load balancing for unbalanced parallel applications. We use an unbalanced tree search benchmark (UTS) to compare two approaches: 1) work sharing using a centralized work queue, and 2) work stealing using explicit polling to handle steal requests. Experiments indicate that in addition to a parameter defining the granularity of load balancing, message-passing paradigms require additional parameters such as polling intervals to manage runtime overhead. Using these parameters, we...
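A small single-victim simulation shows the role of a polling interval in message-passing work stealing. This sketch is an assumption-laden illustration, not the UTS code. A victim processes one task per step and only notices a pending steal request when it polls; it then hands over the older half of its stack. `simulate`, `poll_interval`, and `request_at` are hypothetical names, and the steal-half policy is an assumption.

```python
def simulate(n_tasks, poll_interval, request_at):
    """A victim processes one task per step from a LIFO stack. A thief posts
    a steal request at step `request_at`; the victim sees it only when it
    polls (every `poll_interval` steps), then gives away the older half of
    its stack. Returns (polls, latency, tasks_stolen)."""
    stack = list(range(n_tasks))
    polls, step = 0, 0
    latency, stolen = None, 0
    while stack:
        stack.pop()                      # do one unit of work
        step += 1
        if step % poll_interval == 0:
            polls += 1                   # each poll is runtime overhead
            if latency is None and step >= request_at and len(stack) > 1:
                stolen = len(stack) // 2
                del stack[:stolen]       # hand over the oldest tasks
                latency = step - request_at
    return polls, latency, stolen

print(simulate(100, poll_interval=10, request_at=3))  # (5, 7, 45)
print(simulate(100, poll_interval=2, request_at=3))   # (26, 1, 48)
```

Polling every 2 steps answers the thief within 1 step but costs 26 polls. Polling every 10 steps costs only 5 polls but leaves the thief idle for 7 steps. That overhead-versus-responsiveness trade-off is what a polling-interval parameter has to balance.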