You-Wei Cheah
- Scientific Computing and Data Management
- Distributed and Parallel Computing Systems
- Research Data Management Practices
- Data Quality and Management
- Atmospheric and Environmental Gas Dynamics
- Advanced Data Storage Technologies
- Peatlands and Wetlands Ecology
- Geochemistry and Geologic Mapping
- Plant Water Relations and Carbon Dynamics
- Environmental Monitoring and Data Management
- Climate variability and models
- Advanced Database Systems and Queries
- Advanced Algorithms and Applications
- Energy Load and Power Forecasting
- Gas Dynamics and Kinetic Theory
- Cloud Computing and Resource Management
- Carbon Dioxide Capture Technologies
- Big Data and Business Intelligence
- Fault Detection and Control Systems
- Semantic Web and Ontologies
- Remote Sensing in Agriculture
- Embedded Systems Design Techniques
- Topic Modeling
- Species Distribution and Climate Change
- Sensor Technology and Measurement Systems
Lawrence Berkeley National Laboratory
2016-2024
Indiana University
2011-2014
Indiana University Bloomington
2010-2014
The FLUXNET2015 dataset provides ecosystem-scale data on CO2, water, and energy exchange between the biosphere and the atmosphere, and other meteorological and biological measurements, from 212 sites around the globe (over 1500 site-years, up to and including year 2014). These sites, independently managed and operated, voluntarily contributed their data to create global datasets. Data were quality controlled and processed using uniform methods, to improve consistency and intercomparability across sites. The dataset is already being used...
A Correction to this paper has been published: https://doi.org/10.1038/s41597-021-00851-9.
AmeriFlux is a network of research sites that measure carbon, water, and energy fluxes between ecosystems and the atmosphere using the eddy covariance technique to study a variety of Earth science questions. AmeriFlux's diversity of ecosystems, instruments, and data-processing routines creates challenges for data standardization, quality assurance, and sharing across the network. To address these challenges, the AmeriFlux Management Project (AMP) designed and implemented the BASE pipeline. The pipeline begins with data uploaded by site teams,...
Visualization facilitates the understanding of scientific data both through exploration and through explanation of the visualized data. Provenance also contributes to the understanding of data by containing the contributing factors behind a result. The visualization of provenance, although supported in existing workflow management systems, generally focuses on small- to medium-sized provenance data, lacking techniques to deal with big provenance data of high complexity. This paper discusses techniques developed for visualizing provenance, including a layout algorithm, visual style, graph...
Methane (CH4) emissions from natural landscapes constitute roughly half of global CH4 contributions to the atmosphere, yet large uncertainties remain in the absolute magnitude and the seasonality of emission quantities and drivers. Eddy covariance (EC) measurements of CH4 flux are ideal for constraining ecosystem-scale CH4 emissions, including their seasonality, due to quasi-continuous and high temporal resolution CH4 flux measurements, coincident measurements of carbon, water, and energy fluxes, lack of ecosystem disturbance, and increased...
The volume and complexity of data produced and analyzed in scientific collaborations is growing exponentially. It is important to track data-intensive analysis workflows to provide context and reproducibility as data is transformed in these collaborations. Provenance addresses this need and aids scientists by providing the lineage or history of how data is generated, used, and modified. Provenance has traditionally been collected at the workflow level, often making it hard to capture relevant information about resource characteristics and difficult for users...
It can be natural to believe that many of the traditional issues of scale have been eliminated or at least greatly reduced via cloud computing. That is, if one can create a seemingly well-functioning application that operates correctly on small or moderate-sized problems, then the very nature of the programming abstractions means that the same application will run as well on potentially significantly larger problems. In this paper, we present our experiences taking MODISAzure, a satellite data processing system built on the Windows Azure computing...
We live in an era in which scientific discovery is increasingly driven by data exploration of massive datasets. Scientists today are envisioning diverse data analyses and computations that scale from the desktop to supercomputers, yet often have difficulty designing and constructing software architectures to accommodate heterogeneous and inconsistent data at scale. Moreover, data and computational resource needs can vary widely over time. The needs grow as the science collaboration broadens or as additional data is accumulated; the computational demand can have large...
Data provenance, a key piece of metadata that describes the lifecycle of a data product, is crucial in aiding scientists to better understand and facilitate the reproducibility and reuse of scientific results. Provenance collection systems often capture provenance on the fly, and the protocol between the application and the provenance tool may not be reliable. As a result, data provenance can become ambiguous or simply inaccurate. In this paper, we identify likely quality issues in data provenance. We also establish quality dimensions that are especially critical for the evaluation...
Data provenance, a form of metadata describing the life cycle of a data product, is crucial in the sharing of research data. Research data, when shared over decades, requires recipients to make a determination of both use and trust. That is, can they use the data? More importantly, can they trust it? Knowing the data are of high quality is one factor in establishing fitness for use and trust. Provenance can be used to assert the quality of the data, but the quality of the provenance must be known as well. We propose a framework for assessing the quality of data provenance. We identify quality issues, establish key quality dimensions, and define a framework of analysis....
Recent emphasis and requirements for open data publication have led to significant increases in data availability in the Earth sciences, which is critical to long-tail data integration. Currently, data are often published in a repository with an identifier and citation, similar to those for papers. Subsequent publications that use the data are expected to provide a citation in the reference section of the paper. However, the format of the data citation is still evolving, particularly with regards to citing dynamic data, subsets, and collections of data. Considering the motivations of both data producers...
Data quality control is one of the most time-consuming activities within Research Infrastructures (RIs), especially when involving observational data and multiple data providers. In this work we report on our ongoing development of rogues, a scalable approach to manage data quality issues for RIs. The motivation for this work started with the creation of the FLUXNET2015 dataset, which includes carbon, water, and energy fluxes plus micrometeorological and ancillary data measured in over 200 sites around the world. To create a uniform dataset, including derived...
The Carbon Capture Simulation Initiative (CCSI) project has developed and deployed scientific infrastructure called the CCSI Toolset. The CCSI Toolset provides state-of-the-art computational modeling and simulation tools to accelerate the commercialization of carbon capture technologies from discovery to development, demonstration, and ultimately the widespread deployment to hundreds of power plants. These technologies have the potential to dramatically reduce carbon emissions. The Toolset provides end users in industry with a comprehensive, integrated suite of leading-edge,...
Widely used in studies ranging from ecophysiology dynamics to global estimates using models and remote sensing data, FLUXNET datasets have become key to scientific research and applications. The need for more frequently updated, high-quality data collections is ever more pressing, serving opportunities with new technologies and real-world applications including nature-based and technological climate solutions, carbon credit verification, support for agriculture decision systems, and ecological forecasting. The three major...
Qualitative user research is a human-intensive approach that draws upon ethnographic methods from the social sciences to develop insights about work practices to inform software design and development. Recent advances in data science, and in particular natural language processing (NLP), enable the derivation of machine-generated insights to augment existing techniques. Our work describes our prototype framework based on Jupyter, a tool that supports interactive data science and scientific computing, that leverages NLP techniques to make sense...