Huu Du Nguyen

ORCID: 0000-0001-6067-6676
Research Areas
  • Advanced Statistical Process Monitoring
  • Advanced Statistical Methods and Models
  • Fault Detection and Control Systems
  • Natural Language Processing Techniques
  • Scientific Measurement and Uncertainty Evaluation
  • Topic Modeling
  • Digital Transformation in Industry
  • Risk and Safety Analysis
  • Industrial Vision Systems and Defect Detection
  • Water Quality Monitoring Technologies
  • Hydrological Forecasting Using AI
  • Mathematical and Theoretical Epidemiology and Ecology Models
  • Nonlinear Differential Equations Analysis
  • Esophageal and GI Pathology
  • Reliability and Maintenance Optimization
  • Stability and Controllability of Differential Equations
  • Evolution and Genetic Dynamics
  • Machine Learning and Data Classification
  • Bayesian Modeling and Causal Inference
  • Water Quality and Pollution Assessment
  • Manufacturing Process and Optimization
  • Statistical Methods and Bayesian Inference
  • Sentiment Analysis and Opinion Mining
  • Bariatric Surgery and Outcomes
  • Advanced Causal Inference Techniques

University of California, San Diego
2024

Hanoi University of Science and Technology
2022-2024

University of California, Irvine
2024

Vietnam National University of Agriculture
2020-2023

Dong A University
2018-2023

École Nationale Supérieure des Arts et Industries Textiles
2020-2023

Laboratoire Génie et Matériaux Textiles
2020-2023

Hanoi Medical University
2023

Université de Lille
2023

Vietnam National University, Hanoi
2005-2022

Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas J. Wang, Benoît Sagot, Niklas Muennighoff, A. Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Du Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Chinenye Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar González-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jianguo Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben Allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Tze Han Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad Ali Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona De Gibert, Paulo Villegas, Peter Henderson

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS...

10.48550/arxiv.2211.05100 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption, as demonstrated by ChatGPT. Alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) greatly reduce the skill and domain knowledge required to effectively harness the capabilities of LLMs, increasing their accessibility and utility across various domains. However, state-of-the-art alignment techniques like RLHF rely on high-quality human feedback data, which is...

10.48550/arxiv.2304.07327 preprint EN cc-by arXiv (Cornell University) 2023-01-01

As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of ethics, harm, and governance in the foreground. This paper documents the data creation and curation efforts undertaken by BigScience to assemble the Responsible Open-science Open-collaboration...

10.48550/arxiv.2303.03915 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code...

10.48550/arxiv.2301.03988 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Large language models are increasingly becoming a cornerstone technology in artificial intelligence, the sciences, and society as a whole, yet the optimal strategies for dataset composition and filtering remain largely elusive. Many of the top-performing models lack transparency in their dataset curation and model development processes, posing an obstacle to the development of fully open language models. In this paper, we identify three core data-related challenges that must be addressed to advance open-source language models. These include (1) transparency in model development, including the data...

10.48550/arxiv.2411.12372 preprint EN arXiv (Cornell University) 2024-11-19

The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distributed governance that accounts for human values and is grounded in an international research collaboration that brings together researchers and practitioners from 60 countries. The framework we...

10.1145/3531146.3534637 article EN 2022 ACM Conference on Fairness, Accountability, and Transparency 2022-06-20

Floods are the most frequent natural hazard globally, and incidences have been increasing in recent years as a result of human activity and global warming, making significant impacts on people’s livelihoods and wider socio-economic activities. In terms of the management of the environment and water resources, precise identification of areas susceptible to flooding is required to support planners in implementing effective prevention strategies. The objective of this study is to develop a novel hybrid approach based on Bald Eagle Search...

10.3390/w14101617 article EN Water 2022-05-18

In many industrial manufacturing processes, the quality of products can depend on the relative amount between two quality characteristics X and Y. Often, this calls for the on-line monitoring of the ratio Z=X/Y as a quality characteristic itself by means of a control chart. A large number of control charts monitoring the ratio have been investigated in the literature under the assumption of independent normal observations of the two characteristics. In practice, due to the high frequency of sensor data collection, both autocorrelation and cross-correlation between consecutive observations of X and Y exist and should be...

10.1080/00207543.2022.2137594 article EN International Journal of Production Research 2022-11-11

In many industrial manufacturing processes, the ratio of the standard deviation to the mean of a quantity of interest is an important characteristic to ensure the quality of the processes. This ratio is called the coefficient of variation (CV). Many control charts have been designed for monitoring the CV of univariate quantities in the literature. However, monitoring the CV of multivariate data has not received much attention yet. In this paper, we investigate a variable sampling interval (VSI) Shewhart control chart for monitoring the multivariate CV. The time between two consecutive samples is allowed to vary according to the previous value...

10.1002/asmb.2472 article EN Applied Stochastic Models in Business and Industry 2019-08-01
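
For readers unfamiliar with the quantity monitored in the entry above, the univariate sample CV is conventionally the sample standard deviation divided by the sample mean. The following is a minimal sketch of that textbook definition only; the function name `sample_cv` is hypothetical, and the sketch does not reproduce the multivariate CV or the VSI Shewhart chart studied in the paper.

```python
import statistics


def sample_cv(values):
    """Return the sample coefficient of variation: stdev / mean.

    Illustrative helper (name is an assumption, not from the paper).
    Uses the n-1 (sample) standard deviation.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Example: a small subgroup of measurements
subgroup = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_cv(subgroup), 4))
```

A control chart for the CV would compare this statistic, computed per subgroup, against control limits; those limits depend on the chart design and are not shown here.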

Monitoring land-use/land-cover (LULC) changes is a significant challenge for sustainable spatial planning, particularly in response to transformation and degenerative landscape processes. These disturbances lead to the vulnerability of inhabitants, habitat, climate, and the socio-economic development of the region. Several studies have proposed different methods and techniques to monitor temporal changes in LULC. Machine learning is the more popular method. However, the problem of data imbalance is a challenge, and classification results tend to bias...

10.46544/ams.v27i2.05 article EN cc-by Acta Montanistica Slovaca 2022-07-28

Monitoring the ratio between two random normal variables plays an important role in many industrial manufacturing processes. In this paper, we suggest designing one-sided Shewhart control charts for monitoring this ratio. The numerical results show that the one-sided charts have more advantages compared with the two-sided chart proposed previously in the literature. Moreover, we investigate the effect of measurement error on the performance of these charts, where the measurement error is supposed to follow a linear covariate model. The change in the model parameters from...

10.1002/qre.2656 article EN Quality and Reliability Engineering International 2020-05-04

When building Large Language Models (LLMs), it is paramount to bear safety in mind and protect them with guardrails. Indeed, LLMs should never generate content promoting or normalizing harmful, illegal, or unethical behavior that may contribute to harm to individuals or society. This principle applies to both normal and adversarial use. In response, we introduce ALERT, a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy. It is designed to evaluate the safety of LLMs through red teaming methodologies...

10.48550/arxiv.2404.08676 preprint EN arXiv (Cornell University) 2024-04-06

In this paper, we present a method to monitor the coefficient of variation (CV) squared using two one-sided synthetic control charts. The numerical results show that our design outperforms the two-sided synthetic chart monitoring the CV. The steady-state case, which has practical meaning in many situations, is also considered. We use a Markov chain method to evaluate the statistical performance of the proposed charts. Furthermore, the effect of measurement errors on synthetic charts monitoring the CV is investigated for the first time.

10.1109/ieem.2018.8607320 article EN 2018 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) 2018-12-01

10.1016/j.ress.2018.10.003 article EN Reliability Engineering & System Safety 2018-10-12

In this paper, we investigate the effect of measurement error (ME) on the performance of Run Rules control charts monitoring the coefficient of variation (CV) squared. The previous Run Rules CV chart in the literature is improved slightly by monitoring the CV squared using two one-sided Run Rules charts instead of monitoring the CV itself using a two-sided chart. The numerical results show that this improvement gives better performance in detecting process shifts. Moreover, we show through simulation that the precision and accuracy errors do have a negative effect on the performance of the proposed Run Rules charts. We also find that taking multiple measurements...

10.1080/02664763.2020.1787356 article EN Journal of Applied Statistics 2020-07-11

In the literature, many control charts monitoring the median are designed under the perfect condition that there is no measurement error. This may leave practitioners unsure how to apply these control charts, because measurement error is a real problem in practice. In this paper, we consider the effect of measurement error on the performance of the exponentially weighted moving average (EWMA) median chart combined with the variable sampling interval (VSI) strategy. A linear covariate error model is supposed. The performance of the VSI EWMA median chart is evaluated through the average time to signal. The numerical simulation shows...

10.1002/qre.2726 article EN Quality and Reliability Engineering International 2020-08-13
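
As a hedged illustration of the EWMA statistic named in the entry above, the standard recursion is z_t = λ·x_t + (1 − λ)·z_{t−1}, where here x_t is taken to be the median of the t-th subgroup. The sketch below shows only that recursion; the function name, the smoothing constant `lam`, and the starting value `z0` are illustrative assumptions, and the paper's control limits, VSI rule, and measurement error model are not reproduced.

```python
import statistics


def ewma_of_medians(subgroups, lam=0.2, z0=0.0):
    """Return the EWMA sequence of subgroup medians.

    Illustrative helper (name and defaults are assumptions).
    lam: smoothing constant in (0, 1]; z0: starting value, typically
    the in-control target of the median.
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must be in (0, 1]")
    z = z0
    path = []
    for subgroup in subgroups:
        median = statistics.median(subgroup)
        z = lam * median + (1 - lam) * z  # standard EWMA update
        path.append(z)
    return path


# Example: two subgroups of three measurements each
print(ewma_of_medians([[1, 2, 3], [2, 4, 6]], lam=0.5, z0=0.0))
```

A smaller `lam` gives more weight to past subgroups, which is why EWMA-type charts are generally used for detecting small, persistent shifts.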