- Biomedical Text Mining and Ontologies
- Data Management and Algorithms
- Topic Modeling
- Graph Theory and Algorithms
- Sentiment Analysis and Opinion Mining
- Advanced Database Systems and Queries
- Advanced Graph Neural Networks
- Data Stream Mining Techniques
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Data Quality and Management
- Semantic Web and Ontologies
- Bioinformatics and Genomic Networks
- Real-Time Systems Scheduling
- Parallel Computing and Optimization Techniques
- Complexity and Algorithms in Graphs
- Distributed Systems and Fault Tolerance
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Web Data Mining and Analysis
- Algorithms and Data Compression
- Misinformation and Its Impacts
- Data Visualization and Analytics
- Machine Learning and Algorithms
- Stochastic Gradient Optimization Techniques
Oak Ridge National Laboratory
2018-2023
Veterans Health Administration
2021
The University of Texas at Arlington
2017-2020
University of Illinois Chicago
2019
Virginia Tech
2011
Our society is struggling with an unprecedented amount of falsehoods, hyperboles, and half-truths. Politicians and organizations repeatedly make the same false claims. Fake news floods cyberspace and even allegedly influenced the 2016 election. In fighting false information, the number of active fact-checking projects has grown from 44 in 2014 to 114 in early 2017 [1]. Fact-checkers vet claims by investigating relevant data and documents and publish their verdicts. For instance, PolitiFact.com, one of the earliest and most popular fact-checking projects, gives...
Selectivity estimation, the problem of estimating the result size of queries, is a fundamental problem in databases. Accurately estimating the selectivity of queries involving multiple correlated attributes is especially challenging. Poor cardinality estimates could cause the optimizer to select bad plans. Recently, deep learning has been applied to this problem with promising results. However, many of the proposed approaches often struggle to provide accurate results for multi-attribute queries with a large number of predicates and low selectivity. In this paper, we propose...
Data is generated at an unprecedented rate, surpassing our ability to analyze it. The database community has pioneered many novel techniques for Approximate Query Processing (AQP) that could give approximate results in a fraction of the time needed for computing exact results. In this work, we explore the usage of deep learning (DL) for answering aggregate queries, specifically for interactive applications such as data exploration and visualization. We use deep generative models, an unsupervised learning based approach, to learn...
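The sampling idea behind generative-model AQP can be sketched with a deliberately simple stand-in: fit a closed-form distribution to a numeric column, then answer AVG from synthetic samples instead of scanning the table. The column, the gamma-distributed data, and the lognormal "model" below are all illustrative assumptions, not the architecture used in the paper.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical numeric column (say, order amounts); 200k rows.
# The column and its distribution are invented for illustration.
data = [random.gammavariate(2.0, 50.0) for _ in range(200_000)]

# "Train" a tiny generative model of the column: a lognormal fit by
# matching log-moments, standing in for a deep generative model.
logs = [math.log(v) for v in data]
mu, sigma = statistics.fmean(logs), statistics.pstdev(logs)

# Answer AVG(amount) from a small synthetic sample instead of a scan.
sample = [random.lognormvariate(mu, sigma) for _ in range(20_000)]
approx_avg = statistics.fmean(sample)
exact_avg = statistics.fmean(data)
rel_error = abs(approx_avg - exact_avg) / exact_avg
```

The approximate average lands near the exact one at a fraction of the scan cost; the residual error comes from both the model mismatch and the sampling noise, which is exactly the accuracy/latency trade-off AQP exploits.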
Cancer registries collect unstructured and structured cancer data for surveillance purposes, which provide important insights regarding cancer characteristics, treatments, and outcomes. Registries typically (1) categorize each reportable case or tumor at the time of diagnosis, (2) contain demographic information about the patient, such as age, gender, and location, (3) include planned and completed primary treatment information, and (4) may include survival outcomes. As data is being extracted from various sources, such as pathology reports, radiology...
Selectivity estimation, the problem of estimating the result size of queries, is a fundamental problem in databases. Accurately estimating the selectivity of queries involving multiple correlated attributes is especially challenging. Poor cardinality estimates could cause the optimizer to select bad plans. We investigate the feasibility of using deep learning based approaches for both point and range queries and propose two complementary approaches. Our first approach considers selectivity estimation as an unsupervised density estimation problem. We successfully introduce techniques from...
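The density-estimation view can be illustrated with a deliberately crude unsupervised "model": a joint 2-D histogram over two correlated attributes. The synthetic table, bucket counts, and query ranges below are assumptions for illustration; the paper's deep models capture the joint distribution far more accurately, but the contrast with the attribute-independence baseline is the same.

```python
import random
from collections import Counter

random.seed(0)

# Synthetic table with two correlated attributes: y is a noisy copy of
# x, so assuming attribute independence badly underestimates the
# selectivity of a conjunctive range predicate.
rows = []
for _ in range(50_000):
    x = random.uniform(0, 100)
    y = x + random.gauss(0, 5)
    rows.append((x, y))

# Crude unsupervised "density model": a joint 20x20 histogram.
BUCKETS, LO, HI = 20, -20.0, 120.0
WIDTH = (HI - LO) / BUCKETS

def bucket(v):
    return min(BUCKETS - 1, max(0, int((v - LO) / WIDTH)))

hist = Counter((bucket(x), bucket(y)) for x, y in rows)

def estimate_selectivity(x_lo, x_hi, y_lo, y_hi):
    """Estimate the fraction of rows with x_lo<=x<=x_hi AND
    y_lo<=y<=y_hi by summing the mass of buckets whose centers fall
    inside the query box (a full implementation would also prorate
    buckets that only partially overlap it)."""
    total = 0
    for (bx, by), count in hist.items():
        cx = LO + (bx + 0.5) * WIDTH
        cy = LO + (by + 0.5) * WIDTH
        if x_lo <= cx <= x_hi and y_lo <= cy <= y_hi:
            total += count
    return total / len(rows)

est = estimate_selectivity(40, 60, 40, 60)
true_sel = sum(1 for x, y in rows if 40 <= x <= 60 and 40 <= y <= 60) / len(rows)

# Independence baseline: multiply the per-attribute selectivities.
px = sum(1 for x, _ in rows if 40 <= x <= 60) / len(rows)
py = sum(1 for _, y in rows if 40 <= y <= 60) / len(rows)
indep = px * py
```

Even this coarse joint model lands much closer to the true selectivity than the independence baseline, which is the core argument for modeling the joint density of correlated attributes.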
Sentiment analysis seeks to characterize opinionated or evaluative aspects of natural language text, thus helping people discover valuable information from large amounts of unstructured data [1]. In this paper we explore a new methodology for sentiment analysis called proximity-based sentiment analysis. We take a different approach by considering a set of features based on word proximities in written text. We propose three types of proximity features, namely, proximity distribution, mutual proximity between word types, and proximity patterns. We applied our approach to the...
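The first of these features can be sketched concretely. The mini-lexicon and gap cutoff below are invented for illustration; the paper's lexicon and exact feature definitions differ.

```python
import re
from collections import Counter

# Tiny hypothetical opinion lexicon, standing in for whatever lexicon
# the paper actually uses.
LEXICON = {"good", "great", "bad", "terrible", "excellent", "poor"}

def proximity_distribution(text, max_gap=10):
    """Histogram of token gaps between consecutive opinion-word
    occurrences: a crude version of a proximity-distribution feature."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positions = [i for i, t in enumerate(tokens) if t in LEXICON]
    gaps = Counter()
    for a, b in zip(positions, positions[1:]):
        gaps[min(b - a, max_gap)] += 1  # clamp long gaps into one bin
    return gaps

review = ("The food was good, really good, but the service was terrible "
          "and the decor was poor.")
dist = proximity_distribution(review)
```

The resulting gap histogram (rather than word counts alone) is what a downstream classifier would consume, which is the sense in which proximity, not just presence, carries the sentiment signal.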
Deep learning has surged in popularity and proven to be effective for various artificial intelligence applications, including information extraction from cancer pathology reports. Since word representation is a core unit that enables deep learning algorithms to understand words and perform NLP tasks, this representation must include as much information as possible to help these algorithms achieve high classification performance. Therefore, in this work, in addition to the distributional semantics of large sized corpora, we use UMLS vocabulary resources to enrich the vector space with...
Population-based central cancer registries collect valuable structured and unstructured data primarily for surveillance reporting. The collected data includes (1) categorization of each case (tumor) at the time of diagnosis, (2) demographic information about the patient, such as age, gender, and location, (3) first course of treatment information, and (4) survival outcomes when available. While advanced analytical approaches such as SEER*Stat and SAS exist, we provide a knowledge graph approach to organizing registry analytics...
A key component of deep learning (DL) for natural language processing (NLP) is word embeddings. Word embeddings that effectively capture the meaning and context of the words they represent can significantly improve the performance of downstream DL models on various NLP tasks. Many existing techniques embed words based on their co-occurrence in document text; however, they often cannot capture broader domain-specific relationships between concepts that may be crucial for the task at hand. In this paper, we propose a method to integrate external...
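One established way to fold such external relationships into pretrained embeddings is retrofitting (Faruqui et al., 2015), which nudges each vector toward the average of its knowledge-graph neighbors while staying close to the original distributional vector. Whether this matches the paper's own method is not shown here, so treat it as a sketch of a neighboring technique; the toy 2-D vectors and the tiny "ontology" are made up.

```python
import math

# Made-up toy vectors and relations, purely for illustration.
vectors = {
    "aspirin":   [1.0, 0.0],
    "ibuprofen": [0.0, 1.0],
    "guitar":    [-1.0, -1.0],
}
ontology = {"aspirin": ["ibuprofen"], "ibuprofen": ["aspirin"], "guitar": []}

def retrofit(vectors, ontology, alpha=1.0, beta=1.0, iters=10):
    """Retrofitting-style update: each vector moves toward a weighted
    average of its original value and its graph neighbors."""
    orig = {w: list(v) for w, v in vectors.items()}
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iters):
        for w, nbrs in ontology.items():
            if not nbrs:
                continue  # no graph evidence: keep the original vector
            for d in range(len(new[w])):
                num = alpha * orig[w][d] + beta * sum(new[n][d] for n in nbrs)
                new[w][d] = num / (alpha + beta * len(nbrs))
    return new

new = retrofit(vectors, ontology)
```

After retrofitting, the two related drugs end up closer together than their purely distributional vectors were, while the unrelated word is left untouched, which is precisely the effect external knowledge is meant to have.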
Accurate and synchronized timing information is required by power system operators for controlling the grid infrastructure (relays, Phasor Measurement Units (PMUs), etc.) and determining asset positions. The satellite-based global positioning system (GPS) is the primary source of timing information. However, GPS disruptions today (both intentional and unintentional) can significantly compromise the reliability and security of our electric grids. A robust alternate source of accurate timing is critical to serve both as a deterrent against malicious...
A graph is an excellent way of representing relationships among entities. We can use graph analytics to synthesize and analyze such relational data and extract relevant features that are useful for various tasks such as machine learning. Considering the crucial role of graph analytics in many domains, it is important to investigate, in a timely manner, the right hardware configurations to achieve optimal performance for graph workloads on future high-performance computing systems. Design space exploration studies facilitate the selection of appropriate hardware (e.g. memory) for a...
Machine learning (ML) has gained a pivotal role in answering complex predictive analytic queries. Model building for large scale datasets is one of the most time consuming parts of the data science pipeline. Often, scientists are willing to sacrifice some accuracy in order to speed up this process during the exploratory phase. In this paper, we propose and demonstrate ApproxML, a system that efficiently constructs approximate ML models for new queries from previously constructed models using the concepts of model materialization and reuse...
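The materialize-and-reuse principle is easiest to see with a model whose sufficient statistics compose across data partitions. The partitions and the "model" below (a mean predictor built from per-partition counts and sums) are illustrative stand-ins; ApproxML's actual models and merge rules are richer, but the idea that a new query's model is assembled from stored statistics rather than from raw data is the same.

```python
# Hypothetical data partitions (e.g., one per year).
partitions = {
    "2021": [3.0, 5.0, 7.0],
    "2022": [4.0, 6.0],
    "2023": [10.0, 2.0, 6.0],
}

# Materialize per-partition sufficient statistics exactly once.
stats = {p: (len(v), sum(v)) for p, v in partitions.items()}

def mean_over(parts):
    """Answer a query over any union of partitions from the stored
    (count, sum) statistics, without rescanning raw data."""
    n = sum(stats[p][0] for p in parts)
    s = sum(stats[p][1] for p in parts)
    return s / n

approx = mean_over(["2021", "2022"])
```

Here the merged answer is exact because (count, sum) composes perfectly; for models where the statistics compose only approximately, the same reuse pattern trades a little accuracy for a large saving in model-building time.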
Many disciplines such as biology, economics, engineering, physics, and the social sciences represent their data as graphs to capture patterns, trends, and associations. There are many commercially available graph libraries in different programming languages to analyze these complex graphs. But there is no distributed graph library package in R, a popular statistical language, for graphs bigger than a single machine's memory. Many domain experts prefer R over numerous other alternatives. Towards this, we present an analytics...
Memory design space exploration methods study memory systems' performance and limitations before implementation. The computer design space has grown exponentially because of the enormous growth in memory types, controllers, and application software. Computer simulators are commonly used for design space exploration. However, complex simulations take an enormous amount of time. Hence, in this paper, we propose a machine learning-based method for dynamic random-access and non-volatile memory systems. We applied our method to CosmoGAN and LeNet applications to predict...
Linear Regression is a seminal technique in statistics and machine learning, where the objective is to build linear predictive models between a response (i.e., dependent) variable and one or more predictor (i.e., independent) variables. In this paper, we revisit the classical technique of Quantile Regression (QR), which is a statistically robust alternative to Ordinary Least Squares (OLS). However, while there exist efficient algorithms for OLS, almost all known results for QR are only weakly polynomial. Towards filling this gap, this paper...
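Quantile Regression replaces the squared loss of OLS with the pinball (check) loss rho_tau(r) = r * (tau - 1[r < 0]); tau = 0.5 gives median regression. The paper's contribution concerns exact algorithms with better polynomial behavior; the numeric subgradient sketch below only illustrates the objective, with synthetic data and step sizes chosen for illustration.

```python
import random

random.seed(1)

# Synthetic data: y = 2x + symmetric noise, so the median line has
# slope ~2 and intercept ~0.
xs = [random.uniform(0, 10) for _ in range(2000)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]

def quantile_regression(xs, ys, tau=0.5, lr=0.01, epochs=300):
    """Fit y ~ a*x + b by subgradient descent on the pinball loss
    rho_tau(r) = r * (tau - 1[r < 0]). This numeric method is only a
    stand-in for the exact algorithms studied in the paper."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            r = y - (a * x + b)
            # subgradient of the pinball loss w.r.t. the prediction
            g = -tau if r > 0 else (1 - tau)
            ga += g * x / n
            gb += g / n
        a -= lr * ga
        b -= lr * gb
    return a, b

a, b = quantile_regression(xs, ys)
```

Setting tau to 0.1 or 0.9 instead fits the lower or upper conditional quantile of y, which is what makes QR robust to outliers and informative about the whole conditional distribution rather than just its mean.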
Electronic health records (EHRs) data from the US Department of Veterans Affairs (VA) show that many veterans have been treated with Selective Serotonin Reuptake Inhibitors (SSRIs), a class of antidepressant medications. It is crucial to study these medications in association with other important clinical concepts to make better medication prescribing decisions and to conduct analyses of side effects and comorbidity. In this paper, we used PubMed's knowledge graph and the PageRank algorithm to identify concepts related to three medications that belong to the SSRI...
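PageRank itself is standard; applied to a concept graph, it surfaces concepts that accumulate links from several medications. The toy graph below uses invented stand-in nodes and edges; PubMed's knowledge graph is far larger, but the ranking mechanics are the same.

```python
# Plain PageRank on a toy concept graph (all nodes/edges are
# illustrative assumptions, not PubMed's actual graph).
graph = {
    "sertraline": ["depression", "nausea"],
    "fluoxetine": ["depression", "insomnia"],
    "paroxetine": ["depression"],
    "depression": ["sertraline", "fluoxetine", "paroxetine"],
    "nausea":     ["sertraline"],
    "insomnia":   ["fluoxetine"],
}

def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank; every node here has out-edges, so no
    dangling-node correction is needed."""
    n = len(graph)
    ranks = {node: 1.0 / n for node in graph}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in graph}
        for node, outs in graph.items():
            share = ranks[node] / len(outs)
            for dst in outs:
                new[dst] += damping * share
        ranks = new
    return ranks

ranks = pagerank(graph)
# "depression", linked from all three drugs, ends up ranked highest.
```

Concepts that receive links from multiple drugs accumulate rank, which is the mechanism by which PageRank picks out clinically shared concepts rather than drug-specific ones.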
Pediatric Electronic Health Records (EHRs) contain drug/medication data. Despite the importance of standardizing drug data to identify drug class information and enable interoperability between computer systems, sometimes no biomedical vocabulary is used, and the data is therefore not standardized. This paper employed UMLS vocabularies to standardize Cincinnati Children's Hospital Medical Center (CCHMC) EHR drug data and use it to build models for pediatric mental health trajectories. We present an approach that identifies a...
Mining electronic health records (EHRs) to identify contextually related clinical concept clusters that tend to co-occur temporally and consistently could improve data-driven clinical pathway (CP) construction. However, the automatic extraction of such clusters yields a vast amount of irrelevant information. Hence, this paper proposes a knowledge network-enabled literature-based discovery (LBD) approach to remove noise from the clusters. The authors used published literature to filter spurious concepts from data from the US Department...