- Galaxies: Formation, Evolution, Phenomena
- Cosmology and Gravitation Theories
- Gaussian Processes and Bayesian Inference
- Astronomy and Astrophysical Research
- Astrophysics and Star Formation Studies
- Scientific Research and Discoveries
- Stellar, planetary, and galactic studies
- Blind Source Separation Techniques
- Solar and Space Plasma Dynamics
- Bayesian Methods and Mixture Models
- Error Correcting Code Techniques
- Advanced Clustering Algorithms Research
- Image and Signal Denoising Methods
- Computational Physics and Python Applications
- Astronomical Observations and Instrumentation
- Gamma-ray bursts and supernovae
- Traffic Prediction and Management Techniques
- Spectroscopy and Chemometric Analyses
- Data Management and Algorithms
- Statistical Methods and Inference
- Remote Sensing in Agriculture
- Data Visualization and Analytics
- Astro and Planetary Science
- Markov Chains and Monte Carlo Methods
- Atmospheric Ozone and Climate
Flatiron Institute
2022-2024
Flatiron Health (United States)
2022-2024
Mathematics Research Center
2023
Université Paris Cité
2019-2021
École Normale Supérieure - PSL
2019-2021
Sorbonne Université
2019-2021
Laboratoire de Physique de l'ENS
2021
Université Paris Sciences et Lettres
2019-2021
Centre National de la Recherche Scientifique
2019-2021
Laboratoire d’Etudes du Rayonnement et de la Matière en Astrophysique et Atmosphères
2021
Marked power spectra are two-point statistics of a marked field obtained by weighting each location with a function that depends on the local density around that point. We consider marked power spectra of the galaxy field in redshift space that up-weight low-density regions, and perform a Fisher matrix analysis to assess the information content of this type of statistics using the Molino mock catalogs built upon the Quijote simulations. We identify four different ways to up-weight the galaxy field, and compare the Fisher information contained in their marked power spectra to that of the standard galaxy power spectrum, when considering the monopole and quadrupole of each statistic. Our...
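The construction of a marked field described in the abstract above can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function names, the Gaussian smoothing, the toy lognormal density, and the specific mark form `((1 + delta_s) / (1 + delta_s + delta_R)) ** p` with its parameter values are all assumptions chosen to show how a density-dependent weight can up-weight low-density regions.

```python
import numpy as np

def smooth_periodic(field, radius_cells):
    """Gaussian-smooth a periodic 3D field with FFTs (radius in grid cells)."""
    n = field.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n)  # wavenumber in radians per cell
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kernel = np.exp(-0.5 * (kx**2 + ky**2 + kz**2) * radius_cells**2)
    return np.fft.ifftn(np.fft.fftn(field) * kernel).real

def mark(delta_smooth, delta_s=0.25, p=2.0):
    """Hypothetical mark: decreases as the local (smoothed) overdensity grows."""
    return ((1.0 + delta_s) / (1.0 + delta_s + delta_smooth)) ** p

# Toy positive density field on a 16^3 periodic grid (illustrative data only).
rng = np.random.default_rng(0)
density = rng.lognormal(mean=0.0, sigma=0.5, size=(16, 16, 16))
delta = density / density.mean() - 1.0           # overdensity field
delta_R = smooth_periodic(delta, radius_cells=2.0)  # local density around each cell

marks = mark(delta_R)
marked_field = marks * (1.0 + delta)  # each location weighted by its mark
```

The two-point statistics of `marked_field` would then be measured in the same way as for an unweighted field; with `p > 0`, cells in low-density environments receive the largest weights.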
We present the cosmological constraints from analyzing higher-order galaxy clustering on small nonlinear scales. We use SimBIG, a forward modeling framework for galaxy clustering analyses that employs simulation-based inference to perform highly efficient cosmological inference using normalizing flows. It leverages the predictive power of high-fidelity simulations and robustly extracts cosmological information from regimes inaccessible with current standard analyses. In this work, we apply SimBIG to a subset of the BOSS galaxy sample and analyze the redshift-space bispectrum...
The non-Gaussian spatial distribution of galaxies traces the large-scale structure of the Universe and therefore constitutes a prime observable to constrain cosmological parameters. We conduct Bayesian inference of the $\Lambda$CDM parameters $\Omega_m$, ...
The interstellar medium (ISM) is a complex non-linear system governed by gravity and magneto-hydrodynamics, as well as radiative, thermodynamical, and chemical processes. Our understanding of it mostly progresses through observations and numerical simulations, and a quantitative comparison between these two approaches requires a generic and comprehensive statistical description. The goal of this paper is to build such a description, with the purpose to permit an efficient comparison independent of any specific prior or model. We start from...
Simulation-Based Inference of Galaxies (SimBIG) is a forward modeling framework for analyzing galaxy clustering using simulation-based inference. In this work, we present the SimBIG forward model, which is designed to match the observed SDSS-III BOSS CMASS galaxy sample. The forward model is based on high-resolution Quijote N-body simulations and a flexible halo occupation model. It includes full survey realism and models observational systematics such as angular masking and fiber collisions. We present the “mock challenge” for validating...
We present cosmological constraints from a simulation-based inference (SBI) analysis of galaxy clustering from the SimBIG forward modeling framework. SimBIG leverages the predictive power of high-fidelity simulations and provides an inference framework that can extract cosmological information on small nonlinear scales. In this work, we apply SimBIG to the Baryon Oscillation Spectroscopic Survey (BOSS) CMASS galaxy sample and analyze the power spectrum, ...
Extracting the non-Gaussian information of the cosmic large-scale structure (LSS) is vital in unlocking the full potential of the rich datasets from the upcoming stage-IV galaxy surveys. Galaxy skew spectra serve as efficient beyond-two-point statistics, encapsulating essential bispectrum information with computational efficiency akin to power spectrum analysis. This paper presents the first cosmological constraints from analyzing the full set of redshift-space galaxy skew spectra of the data from the SDSS-III BOSS, accessing cosmological information down to nonlinear scales. Employing the forward modeling...
We present AstroCLIP, a single, versatile model that can embed both galaxy images and spectra into a shared, physically meaningful latent space. These embeddings can then be used, without any fine-tuning, for a variety of downstream tasks including (1) accurate in-modality and cross-modality semantic similarity search, (2) photometric redshift estimation, (3) galaxy property estimation from both images and spectra, and (4) morphology classification. Our approach to implementing AstroCLIP consists of two parts. First, we...
The statistical characterization of the diffuse magnetized ISM and Galactic foregrounds to the CMB poses a major challenge. To account for their non-Gaussian statistics, we need a data analysis approach capable of efficiently quantifying statistical couplings across scales. This information is encoded in the data, but most of it is lost when using conventional tools, such as one-point statistics and power spectra. The wavelet scattering transform (WST), a low-variance statistical descriptor of non-Gaussian processes introduced in data science, opens a path towards this...
Extracting non-Gaussian information from the non-linear regime of structure formation is key to fully exploiting the rich data from upcoming cosmological surveys probing the large-scale structure of the universe. However, due to theoretical and computational complexities, this remains one of the main challenges in analyzing observational data. We present a set of summary statistics for cosmological matter fields based on 3D wavelets to tackle this challenge. These statistics are computed as the spatial average of the complex modulus of the 3D wavelet transform raised to a power $q$...
Simulation-based inference (SBI) is a promising approach to leverage high-fidelity cosmological simulations and extract information from the non-Gaussian, non-linear scales that cannot be modelled analytically. However, scaling SBI to the next generation of cosmological surveys faces the computational challenge of requiring a large number of accurate simulations over a wide range of cosmologies, while simultaneously encompassing large cosmological volumes at high resolution. This challenge can potentially be mitigated by balancing the accuracy and computational cost for different...
We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling. MPP involves training large surrogate models to predict the dynamics of multiple heterogeneous physical systems simultaneously by learning features that are broadly useful across diverse physical tasks. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a single shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid...
We present the first $\Lambda$CDM cosmological analysis performed on a galaxy survey using marked power spectra. The marked power spectrum is the two-point function of a marked field, where galaxies are weighted by a function that depends on their local density. The presence of the mark leads these statistics to contain higher-order information of the original galaxy field, making them a good candidate to exploit the non-Gaussian information of a galaxy catalog. In this work we make use of SimBIG, a forward modeling framework for galaxy clustering analyses, and perform simulation-based inference...
Decades of studies have suggested several criteria to detect interplanetary coronal mass ejections (ICME) in time series from in situ spacecraft measurements. Among them, the most common are an enhanced and smoothly rotating magnetic field, a low proton temperature, and a low plasma beta. However, these features are not all observed for each ICME due to their strong variability. Visual detection is time-consuming and biased by the observer interpretation, leading to non-exhaustive, subjective, and thus hardly...
Dust emission is the main foreground for cosmic microwave background polarization. Its statistical characterization must be derived from the analysis of observational data because the precision required for a reliable component separation is far greater than what is currently achievable with physical models of the turbulent magnetized interstellar medium. This Letter takes a significant step toward this goal by proposing a method that retrieves non-Gaussian statistical characteristics of dust emission from noisy Planck polarization observations at...
The quest for primordial B-modes in the cosmic microwave background has emphasized the need for refined models of the Galactic dust foreground. Here we aim at building a realistic statistical model of the multifrequency dust emission from a single example. We introduce a generic methodology relying on microcanonical gradient descent models conditioned by an extended family of wavelet phase harmonic (WPH) statistics. To tackle the multichannel aspect of the data, we define cross-WPH statistics, quantifying non-Gaussian correlations...
The low-brightness dust emission at high Galactic latitudes is of interest with respect to studying the interplay among the physical processes involved in shaping the structure of the interstellar medium (ISM), as well as in statistical characterizations of the dust emission as a foreground to the cosmic microwave background (CMB). Progress in this avenue of research has been hampered by the difficulty related to separating the dust emission from the cosmic infrared background (CIB). We demonstrate that the dust and CIB may be effectively separated based on their different structure on the sky and we use the separation...
In recent years, denoising problems have become intertwined with the development of deep generative models. In particular, diffusion models are trained like denoisers, and the distribution they model coincides with denoising priors in the Bayesian picture. However, denoising through diffusion-based posterior sampling requires the noise level and covariance to be known, preventing blind denoising. We overcome this limitation by introducing Gibbs Diffusion (GDiff), a general methodology addressing posterior sampling of both the signal and the noise parameters. Assuming...
With a single training image and using wavelet phase harmonic augmentation, we present polarized Cosmic Microwave Background (CMB) foreground marginalization in a high-dimensional likelihood-free (Bayesian) framework. We demonstrate robust foreground removal using only a single frequency of simulated data for a BICEP-like sky patch. Using Moment Networks we estimate the pixel-level posterior probability for the underlying {E,B} signal and validate the statistical model with a quantile-type test using the estimated marginal posterior moments. The Moment Networks use a hierarchy...
Simulation-based inference (SBI) is a promising approach to leverage high fidelity cosmological simulations and extract information from the non-Gaussian, non-linear scales that cannot be modeled analytically. However, scaling SBI to the next generation of cosmological surveys faces the computational challenge of requiring a large number of accurate simulations over a wide range of cosmologies, while simultaneously encompassing large cosmological volumes at high resolution. This challenge can potentially be mitigated by balancing the accuracy and computational cost for different components of the forward...
Large Language Models have not yet been broadly adapted for the analysis of scientific datasets due in part to the unique difficulties of tokenizing numbers. We propose xVal, a numerical encoding scheme that represents any real number using just a single token. xVal represents a given real number by scaling a dedicated embedding vector by the number value. Combined with a modified number-inference approach, this strategy renders the model end-to-end continuous when considered as a map from the numbers of the input string to those of the output string. This leads to an...
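The core encoding idea stated in the abstract above, a single dedicated embedding vector scaled by the number's value, can be sketched as follows. This is a minimal illustration under stated assumptions, not the xVal implementation: the embedding dimension, the randomly initialised vector standing in for a learned `[NUM]` embedding, and the function name `encode_number` are all hypothetical.

```python
import numpy as np

EMBED_DIM = 8  # hypothetical embedding width

# Stand-in for the learned embedding of a single dedicated number token
# (assumption: in a real model this vector would be a trained parameter).
rng = np.random.default_rng(0)
num_embedding = rng.normal(size=EMBED_DIM)

def encode_number(value: float) -> np.ndarray:
    """Embed a real number as the dedicated vector scaled by its value."""
    return value * num_embedding

e_small = encode_number(1.5)
e_large = encode_number(3.0)
```

Because the embedding is linear in the value, nearby numbers map to nearby vectors, which is what makes the input side of the map continuous; here `e_large` is exactly twice `e_small`.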
The 3D distribution of galaxies encodes detailed cosmological information on the expansion and growth history of the Universe. We present the first cosmological constraints that exploit non-Gaussian cosmological information on non-linear scales from galaxy clustering, inaccessible with current standard analyses. We analyze a subset of the BOSS galaxy survey using SimBIG, a new framework for cosmological inference that leverages high-fidelity simulations and deep generative models. We use two clustering statistics beyond the standard power spectrum: the bispectrum and a convolutional...
Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new models and approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering...