- Advanced Statistical Process Monitoring
- Advanced Statistical Methods and Models
- Fault Detection and Control Systems
- Natural Language Processing Techniques
- Scientific Measurement and Uncertainty Evaluation
- Topic Modeling
- Digital Transformation in Industry
- Risk and Safety Analysis
- Industrial Vision Systems and Defect Detection
- Water Quality Monitoring Technologies
- Hydrological Forecasting Using AI
- Mathematical and Theoretical Epidemiology and Ecology Models
- Nonlinear Differential Equations Analysis
- Esophageal and GI Pathology
- Reliability and Maintenance Optimization
- Stability and Controllability of Differential Equations
- Evolution and Genetic Dynamics
- Machine Learning and Data Classification
- Bayesian Modeling and Causal Inference
- Water Quality and Pollution Assessment
- Manufacturing Process and Optimization
- Statistical Methods and Bayesian Inference
- Sentiment Analysis and Opinion Mining
- Bariatric Surgery and Outcomes
- Advanced Causal Inference Techniques
University of California, San Diego
2024
Hanoi University of Science and Technology
2022-2024
University of California, Irvine
2024
Vietnam National University of Agriculture
2020-2023
Dong A University
2018-2023
École Nationale Supérieure des Arts et Industries Textiles
2020-2023
Laboratoire Génie et Matériaux Textiles
2020-2023
Hanoi Medical University
2023
Université de Lille
2023
Vietnam National University, Hanoi
2005-2022
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer that was trained on the ROOTS...
Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption, as demonstrated by ChatGPT. Alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) greatly reduce the skill and domain knowledge required to effectively harness the capabilities of LLMs, increasing their accessibility and utility across various domains. However, state-of-the-art alignment techniques like RLHF rely on high-quality data, which is...
As language models grow ever larger, the need for large-scale, high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of ethics, harm, and governance in the foreground. This paper documents the data creation and curation efforts undertaken by BigScience to assemble the Responsible Open-science Open-collaboration...
The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes progress until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code...
Large language models are increasingly becoming a cornerstone technology in artificial intelligence, the sciences, and society as a whole, yet the optimal strategies for dataset composition and filtering remain largely elusive. Many of the top-performing models lack transparency in their dataset curation and model development processes, posing an obstacle to the development of fully open models. In this paper, we identify three core data-related challenges that must be addressed to advance open-source models. These include (1) transparency in model development, including the data...
The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distributed governance that accounts for human values and is grounded in an international research collaboration that brings together researchers and practitioners from 60 countries. The framework we...
Floods are the most frequent natural hazard globally, and their incidence has been increasing in recent years as a result of human activity and global warming, with significant impacts on people’s livelihoods and wider socio-economic activities. For the management of the environment and water resources, areas susceptible to flooding must be identified precisely to support planners in implementing effective prevention strategies. The objective of this study is to develop a novel hybrid approach based on Bald Eagle Search...
In many industrial manufacturing processes, the quality of products can depend on the relative amount between two quality characteristics X and Y. Often, this calls for the on-line monitoring of the ratio Z=X/Y as a quality characteristic itself by means of a control chart. A large number of control charts monitoring the ratio have been investigated in the literature under the assumption of independent normal observations of the quality characteristics. In practice, due to the high frequency of sensor data collection, both autocorrelation and cross-correlation between consecutive observations of X and Y exist and should be...
In many industrial manufacturing processes, the ratio of the standard deviation to the mean of a quantity of interest is an important characteristic for ensuring the quality of the processes. This ratio is called the coefficient of variation (CV). Many control charts have been designed for monitoring the univariate CV in the literature. However, monitoring the multivariate CV has not received much attention yet. In this paper, we investigate a variable sampling interval (VSI) Shewhart control chart for monitoring the multivariate CV. The time between two consecutive samples is allowed to vary according to the previous value...
Monitoring land-use/land-cover (LULC) changes is a significant challenge for sustainable spatial planning, particularly in response to transformation and degenerative landscape processes. These disturbances lead to the vulnerability of inhabitants, habitat, climate, and the socio-economic development of the region. Several studies have proposed different methods and techniques to monitor temporal changes in LULC. Machine learning is the more popular method. However, the problem of data imbalance is a challenge, and classification results tend to bias...
Monitoring the ratio between two random normal variables plays an important role in many industrial manufacturing processes. In this paper, we suggest designing one-sided Shewhart control charts for monitoring this ratio. The numerical results show that the one-sided charts have more advantages compared with the two-sided Shewhart chart proposed previously in the literature. Moreover, we investigate the effect of measurement error on the performance of these charts, where the measurement error is supposed to follow a linear covariate model. The change of the model parameters from...
When building Large Language Models (LLMs), it is paramount to bear safety in mind and protect them with guardrails. Indeed, LLMs should never generate content promoting or normalizing harmful, illegal, or unethical behavior that may contribute to harm to individuals or society. This principle applies to both normal and adversarial use. In response, we introduce ALERT, a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy. It is designed to evaluate the safety of LLMs through red teaming methodologies...
In this paper, we present a method to monitor the coefficient of variation (CV) squared using two one-sided synthetic control charts. The numerical results show that our design outperforms the two-sided synthetic chart for monitoring the CV. The steady-state case, which has practical meaning in many situations, is also considered. We use a Markov chain method to evaluate the statistical performance of the proposed charts. Furthermore, the effect of measurement errors on synthetic charts monitoring the CV is investigated for the first time.
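The signalling rule of an upward one-sided synthetic chart on the CV squared can be sketched as follows. This is a minimal illustration of the generic synthetic-chart rule, not the design from the paper: the function names, the control limit `ucl`, the run-length threshold `h`, and the head-start convention (a virtual nonconforming sample just before the first one) are all assumptions chosen for the example.

```python
from statistics import mean, variance


def cv_squared(sample):
    """Sample CV squared: sample variance divided by the squared sample mean."""
    m = mean(sample)
    return variance(sample) / (m * m)


def synthetic_upward_signal(samples, ucl, h):
    """Return the index of the first sample at which the chart signals, or None.

    A sample is nonconforming when its CV squared exceeds `ucl` (hypothetical limit).
    The chart signals when the conforming run length, i.e. the number of samples
    since the previous nonconforming one (inclusive), is at most `h`.
    """
    last_nonconforming = -1  # assumed head start: virtual nonconforming sample before index 0
    for i, sample in enumerate(samples):
        if cv_squared(sample) > ucl:
            if i - last_nonconforming <= h:
                return i
            last_nonconforming = i
    return None


# Illustrative data only: a low-dispersion and a high-dispersion sample.
low = [10.0, 10.1, 9.9]   # CV squared is about 0.0001
high = [8.0, 10.0, 12.0]  # CV squared is 0.04
print(synthetic_upward_signal([low, low, low, high, low, high], ucl=0.01, h=2))  # prints 5
```

In the example the first nonconforming sample (index 3) does not trigger a signal because four samples have elapsed since the assumed head start; the second one (index 5) does, because only two samples separate it from the previous nonconforming sample.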
We investigate, in this paper, the effect of measurement error (ME) on the performance of Run Rules control charts monitoring the coefficient of variation (CV) squared. The previous Run Rules CV chart in the literature is improved slightly by monitoring the CV squared using two one-sided Run Rules charts instead of monitoring the CV itself using a two-sided chart. The numerical results show that this improvement gives better performance in detecting process shifts. Moreover, we show through simulation that the precision and accuracy errors do have a negative effect on the performance of the proposed charts. We also find that taking multiple measurements...
In the literature, many control charts monitoring the median are designed under the ideal condition that there is no measurement error. This may make it confusing for practitioners to apply these charts, because measurement error is a real problem in practice. In this paper, we consider the effect of measurement error on the performance of the exponentially weighted moving average (EWMA) median chart combined with a variable sampling interval (VSI) strategy. A linear covariate error model is supposed. The performance of the VSI EWMA median chart is evaluated through the average time to signal. The numerical simulation shows...
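The general mechanics of a VSI EWMA chart on sample medians can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's design: the function name, the smoothing constant `lam`, the warning and control limits, and the two sampling intervals are hypothetical values picked for the example, and measurement error is not modelled.

```python
from statistics import median


def vsi_ewma_median(samples, target, lam, warn, limit, short, long):
    """Run a two-sided EWMA on sample medians with a variable sampling interval.

    Returns a list of (ewma_value, action) pairs. `action` is "signal" when the
    EWMA leaves the control limits, otherwise the time to wait before the next
    sample: `short` inside the warning region, `long` in the central region.
    All limits and intervals are hypothetical inputs.
    """
    z = target  # assumed starting value: the in-control target
    history = []
    for sample in samples:
        z = lam * median(sample) + (1 - lam) * z  # standard EWMA recursion
        deviation = abs(z - target)
        if deviation > limit:
            history.append((z, "signal"))
            break
        history.append((z, short if deviation > warn else long))
    return history


# Illustrative data only: the median shifts from 0 to 2 after the first sample.
data = [[-1, 0, 1], [1, 2, 3], [2, 2, 2]]
print(vsi_ewma_median(data, target=0.0, lam=0.5, warn=0.5, limit=1.0, short=0.1, long=1.9))
# prints [(0.0, 1.9), (1.0, 0.1), (1.5, 'signal')]
```

The second sample moves the EWMA into the warning region, so the next sample is taken after the short interval; that is what allows a VSI scheme to react to a shift sooner than a fixed-interval chart.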