- Machine Learning in Materials Science
- Computational Drug Discovery Methods
- Advanced Multi-Objective Optimization Algorithms
- Machine Learning and Algorithms
- Machine Learning and Data Classification
- Human Pose and Action Recognition
- Aesthetic Perception and Analysis
- Gaussian Processes and Bayesian Inference
- Gait Recognition and Analysis
- Advanced Bandit Algorithms Research
- Radical Photochemical Reactions
- Advanced Photocatalysis Techniques
- Chemistry and Chemical Engineering
- Photochromic and Fluorescence Chemistry
- Human Motion and Animation
- Conservation Techniques and Studies
- Gamma-ray bursts and supernovae
- Computer Graphics and Visualization Techniques
- Music and Audio Processing
- Mass Spectrometry Techniques and Applications
- Multimodal Machine Learning Applications
- Cultural Heritage Materials Analysis
- Anomaly Detection Techniques and Applications
- Image and Signal Denoising Methods
- Generative Adversarial Networks and Image Synthesis
University of Cambridge
2017-2024
Oxfam
2023
Technical University of Darmstadt
2022
Huawei Technologies (United Kingdom)
2022
University College London
2022
Huawei Technologies (China)
2020
Imperial College London
2020
We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, either cover only elementary mathematics or are very small. We address this by releasing new datasets: GHOSTS...
Conspectus: The visualization of data is indispensable in scientific research, from the early stages when human insight forms to the final step of communicating results. In computational physics, chemistry and materials science, it can be as simple as making a scatter plot or as straightforward as looking through snapshots of atomic positions manually. However, as a result of the "big data" revolution, these conventional approaches are often inadequate. The widespread adoption of high-throughput computation for discovery...
In this work we rigorously analyse assumptions inherent to black-box optimisation in hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO's...
Automatic Chemical Design is a framework for generating novel molecules with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathology that it tends to produce invalid molecular structures. First, we demonstrate empirically that this pathology arises when the scheme queries latent points far away from the data on which the variational autoencoder has been trained. Secondly, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathology can be...
We present a data-driven discovery pipeline for molecular photoswitches through multitask learning with Gaussian processes. Through subsequent screening, we identify several motifs with separated and red-shifted electronic absorption bands.
We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional and structured input spaces. By adapting ideas from deep metric learning, we use label guidance from the blackbox function to structure the VAE latent space, facilitating the Gaussian process fit and yielding improved BO performance. Importantly for BO problem settings, our method operates in semi-supervised regimes where only few labelled data points are available. We run experiments on three...
We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered...
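To make "kernels defined over bit vectors" concrete, the following is a minimal sketch of one common choice for binary molecular fingerprints, the Tanimoto (Jaccard) kernel. It is an illustration only: it does not use or reproduce GAUCHE's API, the function names `tanimoto_kernel` and `gram_matrix` are hypothetical, and returning 1.0 for two all-zero vectors is a convention assumed here.

```python
def tanimoto_kernel(x, y):
    """Tanimoto similarity between two equal-length bit vectors.

    k(x, y) = <x, y> / (|x|^2 + |y|^2 - <x, y>)
    """
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Assumed convention: two all-zero vectors are treated as identical.
    return 1.0 if denom == 0 else dot / denom


def gram_matrix(fingerprints):
    """Pairwise kernel matrix, the object a Gaussian process would consume."""
    return [[tanimoto_kernel(a, b) for b in fingerprints] for a in fingerprints]


if __name__ == "__main__":
    # Toy 4-bit fingerprints (made up for illustration).
    fps = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
    for row in gram_matrix(fps):
        print([round(v, 3) for v in row])
```

The resulting Gram matrix plays the same role as an RBF kernel matrix would for continuous inputs.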
The task of predicting human motion is complicated by the natural heterogeneity and compositionality of actions, necessitating robustness to distributional shifts as far as out-of-distribution (OoD). Here, we formulate a new OoD benchmark based on the Human3.6M and Carnegie Mellon University (CMU) motion capture datasets, and introduce a hybrid framework for hardening discriminative architectures to OoD failure by augmenting them with a generative model. When applied to current state-of-the-art discriminative models, we show that the proposed approach...
Bayesian optimisation is a sample-efficient search methodology that holds great promise for accelerating drug and materials discovery programs. A frequently overlooked modelling consideration in Bayesian optimisation strategies, however, is the representation of heteroscedastic aleatoric uncertainty. In many practical applications it is desirable to identify inputs with low aleatoric noise, an example of which might be a material composition which consistently displays robust properties in response to a noisy fabrication process. In this paper, we...
Precise celestial positions have been obtained with the HEAO 1 scanning modulation collimators for the highly variable X-ray source GX 339-4 (4U 1658-48) and the X-ray burst source MXB 1659-29. Both sources are identified with faint (17-18 mag) blue objects showing He II λ4686 and λλ4640-50 emission.
We present FlowMO: an open-source Python library for molecular property prediction with Gaussian Processes. Built upon GPflow and RDKit, FlowMO enables the user to make predictions with well-calibrated uncertainty estimates, an output central to active learning and molecular design applications. Gaussian Processes are particularly attractive for modelling small molecular datasets, a characteristic of many real-world virtual screening campaigns where high-quality experimental data is scarce. Computational experiments across three datasets...
The optical and UV variability of the majority of active galactic nuclei may be related to the reprocessing of rapidly changing X-ray emission from a more compact region near the central black hole. Such a reprocessing model would be characterized by lags between X-ray and optical/UV emission due to differences in light travel time. Observationally, however, such lag features have been difficult to detect due to gaps in the lightcurves introduced through factors such as source visibility or limited telescope time. In this work, Gaussian process regression is...
Cost-effective Bayesian optimisation screening of 720 additives on four complex reactions, achieving substantial yield improvements over baselines using chemical reaction representations beyond one-hot encoding.
The space of synthesizable molecules is greater than $10^{60}$, meaning only a vanishingly small fraction of these molecules have ever been realized in the lab. In order to prioritize which regions of this space to explore next, synthetic chemists need access to accurate molecular property predictions. While great advances in machine learning have been made, there is a dearth of benchmarks featuring properties that are useful for the synthetic chemist. Focussing directly on the needs of the synthetic chemist, we introduce the Photoswitch Dataset, a new benchmark where...
We deploy a prompt-augmented GPT-4 model to distill comprehensive datasets on the global application of debt-for-nature swaps (DNS), a pivotal financial tool for environmental conservation. Our analysis includes 195 nations and identifies 21 countries that have not yet used DNS as prime candidates for DNS. A significant proportion demonstrates consistent commitments to conservation finance (0.86 accuracy as compared to historical records). Conversely, 35 countries previously active in DNS before 2010 have since been...
Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem,...
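For readers unfamiliar with acquisition functions, the sketch below shows the textbook closed form of expected improvement (for maximisation) under a Gaussian posterior, together with the simplest possible maximiser: evaluating it on a finite candidate set. This is not the compositional formulation or any optimiser studied in the paper; `expected_improvement`, `maximise_acquisition` and the toy `posterior` are hypothetical names introduced here for illustration.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """EI = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best) / sigma."""
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def maximise_acquisition(candidates, posterior, best):
    """Pick the candidate with the highest EI; `posterior` maps x -> (mu, sigma)."""
    return max(candidates, key=lambda x: expected_improvement(*posterior(x), best))


if __name__ == "__main__":
    # Toy posterior (an assumption, standing in for a fitted surrogate model).
    posterior = lambda x: (-(x - 0.5) ** 2, 0.1)
    grid = [i / 10 for i in range(11)]
    print(maximise_acquisition(grid, posterior, best=-0.2))
```

Grid evaluation scales poorly with input dimension, which is one reason acquisition maximisation is treated as an optimisation problem in its own right.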
Datasets in the Natural Sciences are often curated with the goal of aiding scientific understanding and hence may not always be in a form that facilitates the application of machine learning. In this paper, we identify three trends within the fields of chemical reaction prediction and synthesis design that require a change of direction. First, the manner in which datasets are split into reactants and reagents encourages testing models in an unrealistically generous manner. Second, we highlight the prevalence of mislabelled data, and suggest that the focus should be on...
Reaction additives play a significant role in controlling the reactivity and outcomes of chemical reactions. For example, a recent high-throughput additive screening identified a phthalimide ligand for Ni-catalysed photoredox decarboxylative arylations. This discovery enabled a 4-fold yield improvement by stabilising oxidative addition complexes and breaking up deactivated catalyst aggregates. Despite the promise of such large-scale screenings, they remain inaccessible to most research groups due to their cost...
In many areas of the observational and experimental sciences data is scarce. Data observation in high-energy astrophysics is disrupted by celestial occlusions and limited telescope time, while data derived from laboratory experiments in synthetic chemistry and materials science is cost-intensive to collect. On the other hand, knowledge about the data-generation mechanism is often available in the sciences, such as the measurement error of a piece of apparatus. Both characteristics, small data and knowledge of the underlying physics, make Gaussian processes (GPs)...
We consider the problem of adaptively placing sensors along an interval to detect stochastically-generated events. We present a new formulation of the problem as a continuum-armed bandit problem with feedback in the form of partial observations of realisations of an inhomogeneous Poisson process. We design a solution method by combining Thompson sampling with nonparametric inference via increasingly granular Bayesian histograms and derive an $\tilde{O}(T^{2/3})$ bound on the Bayesian regret in $T$ rounds. This is coupled with an efficient optimisation approach to select...
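The flavour of Thompson sampling over a Bayesian histogram can be sketched with a heavily simplified version of the setting: the interval is cut into a fixed number of bins, each bin's event rate gets a conjugate Gamma prior, and each round a single bin is observed. This is an assumption-laden toy, not the paper's algorithm: the bin count stays fixed rather than growing, one bin is chosen instead of a sensor placement, and the names `thompson_poisson_bins` and `sample_poisson` as well as the Gamma(1, 1) prior are introduced here for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def thompson_poisson_bins(true_rates, rounds, seed=0):
    """Thompson sampling with a Gamma-Poisson model per histogram bin."""
    rng = random.Random(seed)
    n = len(true_rates)
    shape = [1.0] * n  # Gamma shape parameter per bin (assumed prior: Gamma(1, 1))
    rate = [1.0] * n   # Gamma rate parameter per bin
    pulls = [0] * n
    for _ in range(rounds):
        # Sample a plausible event rate for every bin from its posterior.
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        draws = [rng.gammavariate(shape[i], 1.0 / rate[i]) for i in range(n)]
        chosen = max(range(n), key=draws.__getitem__)
        # Observe the number of events in the chosen bin for one unit of time.
        events = sample_poisson(true_rates[chosen], rng)
        # Conjugate update: Gamma(a, b) -> Gamma(a + events, b + 1).
        shape[chosen] += events
        rate[chosen] += 1.0
        pulls[chosen] += 1
    posterior_means = [shape[i] / rate[i] for i in range(n)]
    return pulls, posterior_means


if __name__ == "__main__":
    pulls, means = thompson_poisson_bins([1.0, 2.0, 8.0, 3.0], rounds=500)
    print(pulls)
    print([round(m, 2) for m in means])
```

Over many rounds the sampler concentrates its observations on the bin with the highest event rate while still occasionally exploring the others.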