- Advanced Graph Neural Networks
- Scientific Computing and Data Management
- Semantic Web and Ontologies
- Complex Network Analysis Techniques
- Graph Theory and Algorithms
- Distributed and Parallel Computing Systems
- Smart Grid Security and Resilience
- Topic Modeling
- Research Data Management Practices
- Data Quality and Management
- Information and Cyber Security
- Network Security and Intrusion Detection
- CO2 Sequestration and Geologic Interactions
- Reservoir Engineering and Simulation Methods
- Adversarial Robustness in Machine Learning
- Privacy-Preserving Technologies in Data
- Advanced Database Systems and Queries
- Geological Modeling and Analysis
- Data Mining Algorithms and Applications
- Groundwater flow and contamination studies
- Data Management and Algorithms
- Integrated Energy Systems Optimization
- Advanced Data Storage Technologies
- Infrastructure Resilience and Vulnerability Analysis
- Data Visualization and Analytics
Pacific Northwest National Laboratory
2015-2024
The ability to construct domain-specific knowledge graphs (KGs) and perform question answering or hypothesis generation is a transformative capability. Despite their value, automated construction of knowledge graphs remains an expensive technical challenge that is beyond the reach of most enterprises and academic institutions. We propose an end-to-end framework for developing custom knowledge-graph-driven analytics for arbitrary application domains. The uniqueness of our system lies A) in its combination of curated KGs along with knowledge extracted from...
Graphs are a natural and fundamental representation to describe the entities, relationships, activities, and evolution of complex systems. Resource Description Framework (RDF) and Labeled Property Graph (LPG) are the two most used graph-based data models to encode information. Both are similar in terms of using basic graph elements such as nodes and edges but differ in modeling approach, expressibility, serialization, and target applications. RDF is a flexible data exchange model for expressing information about entities, but it tends to have...
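As a rough illustration of the modeling difference the abstract above describes, the following minimal sketch encodes the same small fact set once as RDF-style triples and once as an LPG-style structure. All identifiers (`ex:alice`, `worksAt`, the `since` property, and the two helper functions) are hypothetical and chosen only for this example; no RDF or graph-database library is used.

```python
# Hypothetical toy data; names and properties are illustrative only.

# RDF style: every statement is a (subject, predicate, object) triple.
rdf_triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:worksAt", "ex:lab"),
    ("ex:lab", "rdf:type", "ex:Organization"),
]

# LPG style: nodes and edges carry labels and key-value properties directly.
lpg_nodes = {
    "alice": {"labels": {"Person"}, "props": {"name": "Alice"}},
    "lab": {"labels": {"Organization"}, "props": {}},
}
lpg_edges = [
    # The edge itself holds a property ("since"). In plain RDF triples this
    # usually needs an extra node or a reification pattern instead.
    {"src": "alice", "dst": "lab", "label": "worksAt", "props": {"since": 2015}},
]


def rdf_objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def lpg_neighbors(edges, src, label):
    """Return target node ids of edges leaving `src` with the given label."""
    return [e["dst"] for e in edges if e["src"] == src and e["label"] == label]


employer_rdf = rdf_objects(rdf_triples, "ex:alice", "ex:worksAt")
employer_lpg = lpg_neighbors(lpg_edges, "alice", "worksAt")
```

Both lookups answer the same question, but the LPG version can also read `props["since"]` from the edge, which hints at the expressibility gap between the two models.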
This paper presents GridMW, a scalable and reliable data middleware layer for smart grids. Smart grids promise to improve the efficiency of power grid systems and reduce greenhouse gas emissions by incorporating power generation from renewable sources and shaping demands to match the supply. As a result, power grid systems will become much more dynamic and require constant adjustments, which requires analysis and decision-making applications to improve the efficiency and reliability of power grid systems. However, these applications rely on large amounts of data gathered from power generation, transmission,...
Velo is a reusable, domain-independent knowledge-management infrastructure for modeling and simulation. Velo leverages, integrates, and extends Web-based open source collaborative and data-management technologies to create a scalable and flexible core platform tailored to specific scientific domains. As the examples here describe, Velo has been used in both the carbon sequestration and climate modeling domains.
Link prediction, or predicting the likelihood of a link in a knowledge graph based on its existing state, is a key research task. It differs from a traditional link prediction task in that the links in a knowledge graph are categorized into different predicates, and link prediction performance for different predicates generally varies widely. In this work, we propose a latent feature embedding based link prediction model which considers the prediction task for each predicate disjointly. To learn the model parameters it utilizes a Bayesian personalized ranking based optimization technique. Experimental results on large-scale knowledge bases such...
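The idea in the abstract above, one latent-feature model per predicate trained with a Bayesian personalized ranking (BPR) objective, can be sketched roughly as follows. This is not the paper's model: the dot-product scoring function, the separate subject and object embeddings, the uniform negative sampling, the hyperparameters, and the toy data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_bpr(links, entities, dim=4, epochs=500, lr=0.1, reg=0.01, seed=0):
    """Fit subject/object embeddings for ONE predicate with BPR-style SGD.

    `links` is a list of observed (subject, object) pairs for that predicate.
    All hyperparameter defaults are assumed values for this toy example.
    """
    rng = random.Random(seed)
    subj = {e: [rng.gauss(0, 0.1) for _ in range(dim)] for e in entities}
    obj = {e: [rng.gauss(0, 0.1) for _ in range(dim)] for e in entities}
    observed = set(links)
    for _ in range(epochs):
        for s, o_pos in links:
            # Sample an object that is NOT linked to s under this predicate.
            candidates = [e for e in entities if (s, e) not in observed]
            if not candidates:
                continue
            o_neg = rng.choice(candidates)
            # BPR maximizes ln sigmoid(score(s, o_pos) - score(s, o_neg)).
            x = dot(subj[s], obj[o_pos]) - dot(subj[s], obj[o_neg])
            g = 1.0 - sigmoid(x)  # derivative of ln sigmoid(x) w.r.t. x
            for k in range(dim):
                u, vp, vn = subj[s][k], obj[o_pos][k], obj[o_neg][k]
                subj[s][k] += lr * (g * (vp - vn) - reg * u)
                obj[o_pos][k] += lr * (g * u - reg * vp)
                obj[o_neg][k] += lr * (-g * u - reg * vn)
    return subj, obj


# Hypothetical toy knowledge graph: links grouped by predicate.
entities = ["a", "b", "c", "d"]
links_by_predicate = {
    "worksAt": [("a", "c"), ("b", "c")],
    "knows": [("a", "b"), ("b", "d")],
}

# Each predicate gets its own, disjointly trained embedding model.
models = {p: train_bpr(ls, entities) for p, ls in links_by_predicate.items()}


def score(predicate, s, o):
    subj, obj = models[predicate]
    return dot(subj[s], obj[o])
```

Training each predicate separately keeps a poorly behaved predicate from distorting the embeddings used for the others, at the cost of not sharing information across predicates.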
Network simulation is essential to test adversarial search problems for privacy preservation and benchmarking purposes. Different generative models have been developed for single-channel, homogeneous networks that model social networks, communication, and co-authorship. Modeling multichannel networks simultaneously with correlated channel attributes at scale compounds the complexity, and including signals across channels creates a second set of burdens. We present a methodology to employ a suite of generation tools to produce...
Cyberattacks on power grids pose significant risks to national security. Power grid attacks typically lead to abnormal readings in power output, frequency, current, and voltage. Due to the interconnected structure of power grids, abnormalities can spread throughout the system and cause widespread power outages if not detected and dealt with promptly. Our research proposes a novel anomaly detection system for power grids that prevents overfitting. We created a network graph to represent the power grid, where nodes represent power grid components like generators and edges represent connections...
The integration of the Internet of Things (IoT) into Cyber-Physical Systems (CPSs) has expanded their cyber-attack surface, introducing new and sophisticated threats with the potential to exploit emerging vulnerabilities. Assessing the risks of CPSs is increasingly difficult due to incomplete and outdated cybersecurity knowledge. This highlights the urgent need for better-informed risk assessments and mitigation strategies. While previous efforts have relied on rule-based natural language processing (NLP) tools to map...
Challenges that make it difficult to find, share, and combine published data, such as data heterogeneity and resource discovery, have led to increased adoption of semantic data standards and data publishing technologies. To make data more accessible, interconnected, and discoverable, some domains are being encouraged to publish their data as Linked Data. Consequently, this trend greatly increases the amount of data that semantic web tools are required to process, store, and interconnect. In attempting to process and manipulate large data sets, tools, ranging from simple text editors...
Modern scientific enterprises are inherently knowledge-intensive. In general, scientific studies in domains such as geosciences, climate, and biology require the acquisition and manipulation of large amounts of experimental and field data in order to create inputs for large-scale computational simulations. The results of these simulations must then be analyzed, leading to refinements of the inputs and models and to additional simulations. Further, the results must be managed and archived to provide justifications for regulatory decisions and publications that are based on the models. In this paper we...
Graph mining is an important data analysis methodology, but it struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to ensure that the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgraph Mining...
Security assessment of cyber-physical energy systems (CPESs), such as the electric power grid, is a critical operation to maintain availability, reliability, and quality of service in the presence of persistent threats from malicious cyber actors. Existing security assessment approaches, such as penetration testing and red teaming, rely on subject matter experts' experience and forensic analysis of historical events to perform realistic, threat-informed assessments of CPES defense. CPESs have a large attack surface because of the heterogeneity...
The cyber-attack surface of an enterprise continuously evolves due to the advent of new devices and applications with inherent vulnerabilities, and the emergence of novel attack techniques that exploit these vulnerabilities. Therefore, security management tools must assess the cyber-risk of an enterprise at regular intervals by comprehensively identifying associations among attack techniques, weaknesses, and vulnerabilities. However, existing repositories providing such associations are incomplete (i.e., missing associations), which increases the likelihood of undermining...
Science is increasingly motivated by the need to process larger quantities of data. It is facing severe challenges in data collection, management, and processing, so much so that the computational demands of "data scaling" are competing with, and in many fields surpassing, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate/weather, and material sciences. This paper presents a real-world use case in which we wish to answer queries provided...
We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate incremental and parallel pattern mining; and (3) support analytical queries such as trend analysis. The core idea of Percolator is to dynamically decide and verify a small fraction of patterns and their instances that must be inspected in response to buffered updates in dynamic graphs, with a total mining cost independent of graph size. (a) the...
Graphs are a natural and fundamental representation for describing the activities, relationships, and evolution of various complex systems. Many domains such as communication, citation, procurement, biology, social media, and transportation can be modeled as a set of entities and their relationships. Resource Description Framework (RDF) and Labeled Property Graph (LPG) are the two most used data models to encode information in a graph. Both models are similar in terms of using basic graph elements such as nodes and edges but differ in terms of modeling approach,...
Networks are a fundamental and flexible way of representing various complex systems. Many domains such as communication, citation, procurement, biology, social media, and transportation can be modeled as a set of entities and their relationships. Temporal networks are a specialization of general networks where every relationship occurs at a discrete time. The temporal evolution of such networks is as important to understand as the structure. We present the Independent Temporal Motif (ITeM) to characterize temporal graphs from different domains. ITeMs are used to model the graph. In...