- Gaussian Processes and Bayesian Inference
- Complex Network Analysis Techniques
- Graph Theory and Algorithms
- Advanced Graph Neural Networks
- Machine Learning and Data Classification
- Caching and Content Delivery
- Gamma-ray bursts and supernovae
- Advanced Multi-Objective Optimization Algorithms
- Complexity and Algorithms in Graphs
- Computational Physics and Python Applications
- Reinforcement Learning in Robotics
- Advanced Vision and Imaging
- Data Mining Algorithms and Applications
- Robotics and Sensor-Based Localization
- Network Security and Intrusion Detection
- Information and Cyber Security
- Air Quality Monitoring and Forecasting
- Algorithms and Data Compression
- Opinion Dynamics and Social Influence
- Astronomy and Astrophysical Research
- Parallel Computing and Optimization Techniques
- Data Management and Algorithms
- American Jewish Fiction Analysis
- Doctoral Education Challenges and Solutions
- Neural Networks and Reservoir Computing
Lawrence Livermore National Laboratory
2019-2024
Queen Mary University of London
2021
Dartmouth College
2014-2019
Birkbeck, University of London
2016
Finding a minimum spanning tree (MST) for $n$ points in an arbitrary metric space is a fundamental primitive for hierarchical clustering and many other ML tasks, but this takes $\Omega(n^2)$ time to even approximate. We introduce a framework for metric MSTs that first (1) finds a forest of disconnected components using practical heuristics, and then (2) finds a small weight set of edges to connect the disjoint components of the forest into a spanning tree. We prove that optimally solving the second step still takes $\Omega(n^2)$ time, but we provide a subquadratic 2.62-approximation algorithm. In...
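The second step of the framework described above can be illustrated with a minimal sketch. It assumes Euclidean points and a forest supplied as an edge list, and it connects the components exactly by brute force over all pairs, which is quadratic. It is not the subquadratic 2.62-approximation algorithm the abstract refers to, and the name `complete_forest` is hypothetical.

```python
import itertools
import math


def complete_forest(points, forest_edges):
    """Connect an existing forest into a spanning tree (hypothetical helper).

    Brute-force illustration only: it examines all O(n^2) point pairs,
    so it is NOT the subquadratic approximation from the abstract.
    Returns the list of (u, v, weight) edges that were added.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri == rj:
            return False
        parent[ri] = rj
        return True

    # Step 1 output: merge the endpoints of every edge already in the forest.
    for u, v in forest_edges:
        union(u, v)

    # Step 2: Kruskal-style scan over all pairs, lightest first, keeping
    # only edges that join two different components.
    candidates = sorted(
        (math.dist(points[u], points[v]), u, v)
        for u, v in itertools.combinations(range(n), 2)
    )
    added = []
    for weight, u, v in candidates:
        if union(u, v):
            added.append((u, v, weight))
    return added


# Two tight clusters, each already connected by the initial forest.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
added = complete_forest(points, forest_edges=[(0, 1), (2, 3)])
```

In this toy input the only edge added is the shortest one between the two clusters, joining points 1 and 2.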
Analysis of cosmic shear is an integral part of understanding structure growth across time, which in turn provides us with information about the nature of dark energy. Conventional methods generate maps from which we can infer the matter distribution in the universe. Current methods (e.g., Kaiser–Squires inversion) for generating these maps, however, are tricky to implement and can introduce bias. Recent alternatives construct a spatial process prior for the lensing potential, which allows for inference of the convergence parameters given...
Purpose: This paper aims to examine the effectiveness of PhD support groups as an intervention that improves mental well-being and increases confidence in timely PhD completion. Design/methodology/approach: Participants of six PhD support groups, which we co-facilitated, completed a survey at the start and at the end of eight weeks of attendance. The survey measured subjective well-being and confidence in completion using the Warwick-Edinburgh Mental Well-being Scale and statements from the Postgraduate Research Experience Survey (2017 and 2019). The final survey also included open-ended...
We introduce a novel method for discerning optical telescope images of stars from those of galaxies using Gaussian processes (GPs). Although applications of GPs often struggle in high-dimensional data modalities such as image classification, we show that a low-dimensional embedding of images into a metric space defined by the principal components of the data suffices to produce high-quality predictions from real large-scale survey data. We develop a method of GP classification hyperparameter training that scales approximately linearly...
A significant fraction of observed galaxies in the Rubin Observatory Legacy Survey of Space and Time (LSST) will overlap at least one other galaxy along the same line of sight, in a so-called “blend.” The current standard method of assessing blend likelihood in LSST images relies on counting up the number of intensity peaks in the smoothed image of a blend candidate, but the reliability of this procedure has not yet been comprehensively studied. Here we construct a realistic distribution of blended and unblended galaxies through high-fidelity...
The Message Passing Interface (MPI) is the de facto standard for message handling in distributed computing. MPI collective communication schemes, where many processors communicate with one another, depend upon synchronous handshake agreements. This results in applications that depend upon iterative collective communications moving at the speed of their slowest processors. We describe a methodology for bootstrapping asynchronous communication primitives to MPI, with an emphasis on the irregular and imbalanced all-to-all communication patterns found in data...
We update our prior 2017 Graph Challenge submission [7] on large scale triangle counting in distributed memory by demonstrating scaling and validation on trillion-edge scale-free graphs. We incorporate recent communication optimizations developed for irregular workloads [1], and demonstrate scaling up to 1.5 million cores of IBM BG/Q Sequoia at LLNL. We validate our implementation using nonstochastic Kronecker graph generation, where ground-truth local and global triangle counts are known, and model our inputs after the Graph500 [5]...
Understanding the higher-order interactions within network data is a key objective of network science. Surveys of metadata triangles (or patterned 3-cycles in metadata-enriched graphs) are often of interest in this pursuit. In this work, we develop TriPoll, a prototype distributed HPC system capable of surveying triangles in massive graphs containing metadata on their edges and vertices. We contrast our approach with much of the prior effort on triangle analysis, which focuses on simple triangle counting, usually with no metadata. We assess the scalability of TriPoll when...
Quantum computers have the opportunity to be transformative for a variety of computational tasks. Recently, there have been proposals to use unsimulatably large quantum devices to perform regression, classification, and other machine learning tasks with quantum advantage by using kernel methods. While unsimulatability is a necessary condition for quantum advantage in machine learning, it is not sufficient, as not all kernels are equally effective. Here, we study one- and multi-dimensional regression, as well as reinforcement learning, using Gaussian Processes. By using approximations of performant classical...
While deep neural networks (DNNs) and Gaussian Processes (GPs) are both popularly utilized to solve problems in reinforcement learning, both approaches feature undesirable drawbacks for challenging problems. DNNs learn complex non-linear embeddings, but do not naturally quantify uncertainty and are often data-inefficient to train. GPs infer posterior distributions over functions, but popular kernels exhibit limited expressivity on high-dimensional data. Fortunately, recently discovered conjugate tangent kernel...
We describe a methodology for estimating edge-local triangle counts using cardinality approximation sketches. While the approach does not guarantee relative error bounds, we will show that it preserves triangle count heavy hitters (the edges incident upon the largest number of triangles) well in practice. Furthermore, we provide empirical evidence that the sum of the edge-local estimations yields reasonable estimates of the global triangle count for free. In this paper we describe a two-pass algorithm for estimating edge-local triangle count heavy hitters. The algorithm requires time linear in the number of edges, memory almost linear in the number of vertices, and is easy...
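To make the quantity being estimated concrete: the edge-local triangle count of an edge (u, v) is the number of common neighbors of u and v, and each triangle contributes to three edges. The sketch below computes both quantities exactly with Python sets. It is only an illustration under the assumption of a simple undirected graph with each edge listed once; the abstract's approach instead approximates these intersection sizes with cardinality sketches, which is not reproduced here. The function names are hypothetical.

```python
from collections import defaultdict


def edge_local_triangle_counts(edges):
    """Exact edge-local triangle counts (hypothetical helper).

    Assumes a simple undirected graph with each edge listed once.
    The count for edge (u, v) is |N(u) & N(v)|, computed with exact
    sets rather than the approximate sketches the abstract describes.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)
    return {
        (u, v): len(adjacency[u] & adjacency[v])
        for u, v in edges
        if u != v
    }


def global_triangle_count(edge_counts):
    """Each triangle is counted once on each of its three edges."""
    return sum(edge_counts.values()) // 3


# Complete graph on four vertices: every edge lies in two triangles.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
k4_counts = edge_local_triangle_counts(k4_edges)
k4_total = global_triangle_count(k4_counts)
```

Sorting `k4_counts` by value would surface the heavy hitters; in this toy graph all edges tie.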
Gaussian processes (GPs) are non-linear probabilistic models popular in many applications. However, naïve GP realizations require quadratic memory to store the covariance matrix and cubic computation to perform inference or evaluate the likelihood function. These bottlenecks have driven much investment in the development of approximate GP alternatives that scale to the large data sizes common in modern data-driven applications. We present in this manuscript MuyGPs, a novel efficient GP hyperparameter estimation method. MuyGPs builds upon...
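The quadratic-memory and cubic-time bottlenecks mentioned above can be seen in a minimal sketch of naïve GP regression. This is a textbook formulation using NumPy, assuming an RBF kernel and a small fixed noise term; it is not MuyGPs, and the function names and default parameters are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and b (assumed kernel)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def naive_gp_posterior_mean(x_train, y_train, x_test, noise=1e-6, length_scale=1.0):
    """Naive GP posterior mean (hypothetical helper, not MuyGPs)."""
    n = x_train.shape[0]
    # The full n x n covariance matrix is the quadratic-memory bottleneck.
    cov = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(n)
    # The Cholesky factorization is the cubic-time bottleneck.
    chol = np.linalg.cholesky(cov)
    # Solve cov @ alpha = y_train using the triangular factors.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    cross_cov = rbf_kernel(x_test, x_train, length_scale)
    return cross_cov @ alpha


# Tiny example: with near-zero noise the mean interpolates the training data.
x = np.arange(5.0)[:, None]
y = np.sin(x[:, 0])
mean = naive_gp_posterior_mean(x, y, x)
```

Doubling the number of training points quadruples the size of `cov` and multiplies the factorization cost by roughly eight, which is the scaling that motivates approximate alternatives.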
Computing various global and local topological graph features is an important facet of data analysis. To do so robustly and scalably requires efficient graph algorithms that either calculate features exactly or approximate them accurately. For this reason, researchers developing distributed graph analytic algorithms desire generated graph benchmarks that share the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) with efficiently calculated ground truth to the desired output. Given two small...
This paper is an attempt to explore J. D. Salinger’s The Catcher in the Rye (1951/1958) in relation to Winnicott’s theories of adolescent development, and also with regard to psychodynamic symbolism, mourning, defence mechanisms and containment. I consider the significance of the novel’s protagonist and narrator, Holden Caulfield. What is the reason for the enduring popularity of his voice and its influence on the tone of subsequent literature? To answer this question, I examine the role this iconic, troubled character may play in the development of the reader.
Stellar blends, where two or more stars appear blended in an image, pose a significant visualization challenge in astronomy. Traditionally, distinguishing these blends from single stars has been costly and resource-intensive, involving sophisticated equipment and extensive expert analysis. This is especially problematic for analyzing the vast data volumes from surveys, such as the Legacy Survey of Space and Time (LSST), the Sloan Digital Sky Survey (SDSS), the Dark Energy Spectroscopic Instrument (DESI), Imaging Zwicky Transient...
Kernels representing limiting cases of neural network architectures have recently gained popularity. However, the application and performance of these new kernels compared to existing options, such as the Matern kernel, is not well studied. We take a practical approach to explore the neural network Gaussian process (NNGP) kernel and its application to data in Gaussian process regression. We first demonstrate the necessity of normalization to produce valid NNGP kernels and explore related numerical challenges. We further demonstrate that the predictions from this model are quite inflexible, and therefore do...
Gaussian process (GP) models are effective non-linear models for numerous scientific applications. However, computation of their hyperparameters can be difficult when there is a large number of training observations (n) due to the O(n^3) cost of evaluating the likelihood function. Furthermore, non-identifiable hyperparameter values can induce difficulty in parameter estimation. Because of this, maximum likelihood estimation or Bayesian calibration is sometimes omitted and the hyperparameters are estimated with prediction-based methods such as grid...
Analyzing electrocardiography (ECG) data is essential for diagnosing and monitoring various heart diseases. The clinical adoption of automated methods requires accurate confidence measurements, which are largely absent from existing classification methods. In this paper, we present a robust Gaussian Process hyperparameter training model (MuyGPs) for discerning normal heartbeat signals from the signals affected by different arrhythmias and myocardial infarction. We compare the performance of MuyGPs with traditional...
Gaussian Process (GP) regression is a flexible modeling technique used to predict outputs and to capture uncertainty in the predictions. However, the GP regression process becomes computationally intensive when the training spatial dataset has a large number of observations. To address this challenge, we introduce a scalable GP algorithm, termed MuyGPs, which incorporates nearest neighbor and leave-one-out cross-validation during training. This approach enables the evaluation of datasets with state-of-the-art accuracy and speed...
Gaussian process (GP) regression is a flexible modeling technique used to predict outputs and to capture uncertainty in the predictions. However, the GP regression process becomes computationally intensive when the training spatial dataset has a large number of observations. To address this challenge, we introduce a scalable GP algorithm, termed MuyGPs, which incorporates nearest-neighbor and leave-one-out cross-validation during training. This approach enables the evaluation of datasets with state-of-the-art accuracy and speed...
YGM is a general-purpose asynchronous distributed computing library for C++/MPI, designed to handle the irregular data access patterns and small messages of graph algorithms and data science applications. It uses data serialization to give an easily usable active message interface and message aggregation to maximize application throughput. Our design philosophy makes a tradeoff that increases network bandwidth utilization at the cost of added latency. We provide a suite of benchmarks showcasing YGM's performance. Compared to similar...