NFDI4DS | UHH-SEMS - Publication Details

Huu Du Nguyen

ORCID: 0000-0001-6067-6676

About

Contact & Profiles

OpenAlex author ID: A5044824055

Research Areas

Advanced Statistical Process Monitoring
Advanced Statistical Methods and Models
Fault Detection and Control Systems
Natural Language Processing Techniques
Scientific Measurement and Uncertainty Evaluation
Topic Modeling
Digital Transformation in Industry
Risk and Safety Analysis
Industrial Vision Systems and Defect Detection
Water Quality Monitoring Technologies
Hydrological Forecasting Using AI
Mathematical and Theoretical Epidemiology and Ecology Models
Nonlinear Differential Equations Analysis
Esophageal and GI Pathology
Reliability and Maintenance Optimization
Stability and Controllability of Differential Equations
Evolution and Genetic Dynamics
Machine Learning and Data Classification
Bayesian Modeling and Causal Inference
Water Quality and Pollution Assessment
Manufacturing Process and Optimization
Statistical Methods and Bayesian Inference
Sentiment Analysis and Opinion Mining
Bariatric Surgery and Outcomes
Advanced Causal Inference Techniques

University of California, San Diego
2024

Hanoi University of Science and Technology
2022-2024

University of California, Irvine
2024

Vietnam National University of Agriculture
2020-2023

Dong A University
2018-2023

École Nationale Supérieure des Arts et Industries Textiles
2020-2023

Laboratoire Génie et Matériaux Textiles
2020-2023

Hanoi Medical University
2023

Université de Lille
2023

Vietnam National University, Hanoi
2005-2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

OPENALEX - Publications

Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, and 95 more

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS...

10.48550/arxiv.2211.05100 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Forecasting and Anomaly Detection approaches using LSTM and LSTM Autoencoder techniques with the applications in supply chain management

Huu Du Nguyen, Kim Phuc Tran, Sébastien Thomassey, Murtadha M. Hamad

10.1016/j.ijinfomgt.2020.102282 article EN publisher-specific-oa International Journal of Information Management 2020-12-17

Predicting Water Quality Index (WQI) by feature selection and machine learning: A case study of An Kim Hai irrigation system

Bui Quoc Lap, Thi-Thu-Hong Phan, Huu Du Nguyen, Le Xuan Quang, Phi Thi Hang, and 4 more

10.1016/j.ecoinf.2023.101991 article EN Ecological Informatics 2023-01-18

OpenAssistant Conversations -- Democratizing Large Language Model Alignment

Andreas Köpf, Yannic Kilcher, Dimitri von Rütte, Sotiris Anagnostidis, Zhi-Rui Tam, and 13 more

Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption, as demonstrated by ChatGPT. Alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) greatly reduce the required skill and domain knowledge to effectively harness the capabilities of LLMs, increasing their accessibility and utility across various domains. However, state-of-the-art alignment techniques like RLHF rely on high-quality human feedback data, which is...

10.48550/arxiv.2304.07327 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Hugo Laurençon, Lucile Saulnier, Thomas J. Wang, Christopher Akiki, A. Villanova del Moral, and 49 more

As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of ethics, harm, and governance in the foreground. This paper documents the data creation and curation efforts undertaken by BigScience to assemble the Responsible Open-science Open-collaboration...

10.48550/arxiv.2303.03915 preprint EN cc-by arXiv (Cornell University) 2023-01-01

SantaCoder: don't reach for the stars!

Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, and 36 more

The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code...

10.48550/arxiv.2301.03988 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

RedPajama: an Open Dataset for Training Large Language Models

Maurice Weber, Daniel Fu, Quentin Anthony, Yonatan Oren, Sally Adams, and 14 more

Large language models are increasingly becoming a cornerstone technology in artificial intelligence, the sciences, and society as a whole, yet the optimal strategies for dataset composition and filtering remain largely elusive. Many of the top-performing models lack transparency in their dataset curation and model development processes, posing an obstacle to the development of fully open language models. In this paper, we identify three core data-related challenges that must be addressed to advance open-source language models. These include (1) transparency in model development, including the data...

10.48550/arxiv.2411.12372 preprint EN arXiv (Cornell University) 2024-11-19

Data Governance in the Age of Large-Scale Data-Driven Language Technology

Yacine Jernite, Huu Du Nguyen, Stella Biderman, Anna Rogers, Maraim Masoud, and 15 more

The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distributed governance that accounts for human values and is grounded by an international research collaboration that brings together researchers and practitioners from 60 countries. The framework we...

10.1145/3531146.3534637 article EN 2022 ACM Conference on Fairness, Accountability, and Transparency 2022-06-20

On the performance of VSI Shewhart control chart for monitoring the coefficient of variation in the presence of measurement errors

Huu Du Nguyen, Thong Nguyen, Kim Phuc Tran, Dang Phuc Ho

10.1007/s00170-019-03352-7 article EN The International Journal of Advanced Manufacturing Technology 2019-02-11

Machine Learning and Remote Sensing Application for Extreme Climate Evaluation: Example of Flood Susceptibility in the Hue Province, Central Vietnam Region

Minh Cường Hà, Phuong Vu, Huu Du Nguyen, Tich Phuc Hoang, Dinh Tung Dang, and 4 more

Floods are the most frequent natural hazard globally, and incidences have been increasing in recent years as a result of human activity and global warming, making significant impacts on people’s livelihoods and wider socio-economic activities. In terms of the management of the environment and water resources, precise identification is required of areas susceptible to flooding to support planners in implementing effective prevention strategies. The objective of this study is to develop a novel hybrid approach based on Bald Eagle Search...

10.3390/w14101617 article EN Water 2022-05-18

On the effect of the measurement error on Shewhart t and EWMA t control charts

Huu Du Nguyen, Kim Phuc Tran, Giovanni Celano, Petros E. Maravelakis, Philippe Castagliola

10.1007/s00170-020-05222-z article EN The International Journal of Advanced Manufacturing Technology 2020-04-01

The Shewhart-type RZ control chart for monitoring the ratio of autocorrelated variables

Huu Du Nguyen, Adel Ahmadi Nadi, Kim Duc Tran, Philippe Castagliola, Giovanni Celano, and 1 more

In many industrial manufacturing processes, the quality of products can depend on the relative amount between two quality characteristics X and Y. Often, this calls for the on-line monitoring of the ratio Z=X/Y as a quality characteristic itself by means of a control chart. A large number of control charts have been investigated in the literature under the assumption of independent normal observations of the two quality characteristics. In practice, due to the high frequency of sensor data collection, both autocorrelation and cross-correlation consecutive exist Y should be...

10.1080/00207543.2022.2137594 article EN International Journal of Production Research 2022-11-11

Variable sampling interval Shewhart control charts for monitoring the multivariate coefficient of variation

Thong Nguyen, Kim Phuc Tran, Henri L. Heuchenne, Thị Hiền Nguyễn, Huu Du Nguyen

In many industrial manufacturing processes, the ratio of the variance to the mean of a quantity of interest is an important characteristic to ensure the quality of the processes. This ratio is called the coefficient of variation (CV). A lot of control charts have been designed for monitoring the univariate CV in the literature. However, the multivariate CV has not received much attention yet. In this paper, we investigate the variable sampling interval (VSI) Shewhart control chart for monitoring the multivariate CV. The time between two consecutive samples is allowed to vary according to the previous value...

10.1002/asmb.2472 article EN Applied Stochastic Models in Business and Industry 2019-08-01

On the performance of CUSUM control charts for monitoring the coefficient of variation with measurement errors

Kim Phuc Tran, Huu Du Nguyen, Phuong Hanh Tran, Cédric Heuchenne

10.1007/s00170-019-03987-6 article EN The International Journal of Advanced Manufacturing Technology 2019-06-20

Bayesian inference for Common cause failure rate based on causal inference with missing data

Huu Du Nguyen, Evans Gouno

10.1016/j.ress.2019.106789 article EN Reliability Engineering & System Safety 2020-01-06

The effect of measurement errors on the performance of the Exponentially Weighted Moving Average control charts for the Ratio of Two Normally Distributed Variables

Huu Du Nguyen, Kim Phuc Tran, Khanh-Luan Tran

10.1016/j.ejor.2020.11.042 article EN publisher-specific-oa European Journal of Operational Research 2020-12-04

The composition of time-series images and using the technique SMOTE ENN for balancing datasets in land use/cover mapping

Hai Ngo, Huu Duy Nguyen, Peio Loubière, Truong X. Tran, Gheorghe Şerban, and 4 more

Monitoring land-use/land-cover (LULC) changes is a significant challenge for sustainable spatial planning, particularly in response to transformation and degenerative landscape processes. These disturbances lead to the vulnerability of inhabitants, habitat, climate, and socio-economic development of the region. Several studies have proposed different methods and techniques to monitor temporal changes in LULC. Machine learning is a more popular method. However, the problem of data imbalance is a challenge, and classification results tend to bias...

10.46544/ams.v27i2.05 article EN cc-by Acta Montanistica Slovaca 2022-07-28

Effect of the measurement errors on two one‐sided Shewhart control charts for monitoring the ratio of two normal variables

Huu Du Nguyen, Kim Phuc Tran

Monitoring the ratio between two random normal variables plays an important role in many industrial manufacturing processes. In this paper, we suggest designing two one‐sided Shewhart control charts for monitoring the ratio. The numerical results show that the one‐sided charts have more advantages compared with the two‐sided chart proposed previously in the literature. Moreover, we investigate the effect of measurement error on the performance of these charts, where the measurement error is supposed to follow a linear covariate model. The change of the model parameters from...

10.1002/qre.2656 article EN Quality and Reliability Engineering International 2020-05-04

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

Simone Tedeschi, Felix Friedrich, Patrick Schramowski, Kristian Kersting, Roberto Navigli, and 2 more

When building Large Language Models (LLMs), it is paramount to bear safety in mind and protect them with guardrails. Indeed, LLMs should never generate content promoting or normalizing harmful, illegal, or unethical behavior that may contribute to harm to individuals or society. This principle applies to both normal and adversarial use. In response, we introduce ALERT, a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy. It is designed to evaluate the safety of LLMs through red teaming methodologies...

10.48550/arxiv.2404.08676 preprint EN arXiv (Cornell University) 2024-04-06

One-Sided Synthetic Control Charts for Monitoring the Coefficient of Variation with Measurement Errors

Kim Phuc Tran, Huu Du Nguyen, Thong Nguyen, Wichai Chattinnawat

In this paper, we present a method to monitor the coefficient of variation (CV) squared using two one-sided synthetic control charts. The numerical results show that our design outperforms the two-sided synthetic control chart monitoring the CV. The steady state, which has practical meaning in many situations, is also considered. We use a Markov chain to evaluate the statistical performance of the proposed charts. Furthermore, the effect of measurement errors on the charts monitoring the CV is investigated for the first time.

10.1109/ieem.2018.8607320 article EN 2018 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) 2018-12-01

Maximum likelihood and Bayesian inference for common-cause of failure model

Huu Du Nguyen, Evans Gouno

10.1016/j.ress.2018.10.003 article EN Reliability Engineering & System Safety 2018-10-12

Monitoring coefficient of variation using one-sided run rules control charts in the presence of measurement errors

Phuong Hanh Tran, Cédric Heuchenne, Huu Du Nguyen, Hélène Marie

We investigate, in this paper, the effect of measurement error (ME) on the performance of Run Rules control charts monitoring the coefficient of variation (CV) squared. The previous Run Rules CV chart in the literature is improved slightly by monitoring the CV squared using two one-sided Run Rules charts instead of monitoring the CV itself using a two-sided chart. The numerical results show that this improvement gives better performance in detecting process shifts. Moreover, we will show through simulation that the precision and accuracy errors do have a negative effect on the performance of the proposed Run Rules charts. We also find out that taking multiple measurements...

10.1080/02664763.2020.1787356 article EN Journal of Applied Statistics 2020-07-11

Design of a variable sampling interval exponentially weighted moving average median control chart in presence of measurement errors

Kim Duc Tran, Huu Du Nguyen, Thị Hiền Nguyễn, Kim Phuc Tran

In the literature, many control charts monitoring the median are designed under the perfect condition that there is no measurement error. This may make it confusing for practitioners to apply these control charts, because measurement error is a true problem in practice. In this paper, we consider the effect of measurement error on the performance of the exponentially weighted moving average (EWMA) median control chart combined with the variable sampling interval (VSI) strategy. A linear covariate error model is assumed. The performance of the VSI EWMA median chart is evaluated through the average time to signal. The numerical simulation shows...

10.1002/qre.2726 article EN Quality and Reliability Engineering International 2020-08-13
