Nguyễn Bách

ORCID: 0000-0002-9245-3298
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Speech and dialogue systems
  • Text Readability and Simplification
  • Neural Networks and Applications
  • Text and Document Classification Technologies
  • Software Engineering Research
  • Algorithms and Data Compression
  • Machine Learning and Data Classification
  • Speech Recognition and Synthesis
  • Artificial Intelligence in Healthcare and Education
  • Explainable Artificial Intelligence (XAI)
  • Web Data Mining and Analysis
  • Music and Audio Processing
  • Blind Source Separation Techniques
  • Semantic Web and Ontologies
  • Handwritten Text Recognition Techniques
  • Machine Learning in Bioinformatics
  • Vehicle License Plate Recognition
  • Computational Physics and Python Applications
  • COVID-19 diagnosis using AI
  • Video Analysis and Summarization
  • Smart Agriculture and AI
  • French Language Learning Methods

University Medical Center HCMC
2024-2025

Can Tho University
2025

Can Tho University of Medicine and Pharmacy
2025

Hanoi University
2024

Vietnam National University Ho Chi Minh City
2023

VinUniversity
2023

Alibaba Group (Cayman Islands)
2020-2022

Microsoft Research (United Kingdom)
2022

Alibaba Group (United States)
2019-2021

Carnegie Mellon University
2007-2012

Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.206 article EN cc-by 2021-01-01

Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.142 article EN cc-by 2021-01-01

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further...

10.48550/arxiv.2404.14219 preprint EN arXiv (Cornell University) 2024-04-22

Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alexander Waibel, Changhan Wang. Proceedings of the 17th International Conference on Spoken Language Translation. 2020.

10.18653/v1/2020.iwslt-1.1 article EN cc-by 2020-01-01

Unsupervised neural machine translation (UNMT) has recently achieved remarkable results \cite{lample2018phrase} with only large monolingual corpora in each language. However, the uncertainty of associating target with source sentences makes UNMT theoretically an ill-posed problem. This work investigates the possibility of utilizing images for disambiguation to improve the performance of UNMT. Our assumption is intuitively based on the invariant property of the image, i.e., the description of the same visual content by different...

10.1109/cvpr.2019.01073 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Xinyu Wang, Min Gui, Yong Jiang, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Kewei Tu. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.

10.18653/v1/2022.naacl-main.232 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01

Common pediatric distal forearm fractures necessitate precise detection. To support prompt treatment planning by clinicians, our study aimed to create a multi-class convolutional neural network (CNN) model for fractures, guided by the AO Foundation/Orthopaedic Trauma Association (AO/OTA) classification system for fractures. The GRAZPEDWRI-DX dataset (2008–2018) of wrist X-ray images was used. We labeled images into four fracture classes (FRM, FUM, FRE, and FUE with F, fracture; R, radius; U, ulna; M,...

10.1007/s10278-024-00968-4 article EN Journal of Imaging Informatics in Medicine 2024-02-02

10.21437/interspeech.2019-1336 article EN Interspeech 2019 2019-09-13

Multilingual sequence labeling is a task of predicting label sequences using a single unified model for multiple languages. Compared with relying on multiple monolingual models, using a multilingual model has the benefit of a smaller model size, easier online serving, and generalizability to low-resource languages. However, current multilingual models still underperform individual monolingual models significantly due to model capacity limitations. In this paper, we propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) to the unified multilingual model (student). We propose two novel KD...

10.18653/v1/2020.acl-main.304 preprint EN cc-by 2020-01-01

Massive data collected on public roads for autonomous driving has become more popular in many locations in the world. More data leads to more concerns about privacy, including but not limited to pedestrian faces and surrounding vehicle license plates, which urges robust solutions for detecting and anonymizing them in realistic road-driving scenarios. Existing datasets for both face and plate detection are either focused or only parking lots. In this paper, we introduce a challenging dataset domain. The dataset is aggregated from...

10.1109/wacv56688.2023.00126 article EN 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023-01-01

People say, "A picture is worth a thousand words". Then how can we get the rich information out of the image? We argue that by using visual clues to bridge large pretrained vision foundation models and language models, we can do so without any extra cross-modal training. Thanks to the strong zero-shot capability of foundation models, we start by constructing a semantic representation of the image (e.g., image tags, object attributes / locations, captions) as a structured textual prompt, called visual clues, using a vision foundation model. Based on visual clues, we use a large language model to produce a series...

10.48550/arxiv.2206.01843 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Cohesive constraints allow the phrase-based decoder to employ arbitrary, non-syntactic phrases, and encourage it to translate those phrases in an order that respects the source dependency tree structure. We present extensions of the cohesive constraints, such as exhaustive interruption count and rich interruption check. We show that the cohesion-enhanced decoder significantly outperforms the standard phrase-based decoder on English→Spanish. Improvements between 0.5 and 1.2 BLEU points are obtained on the English→Iraqi system.

10.3115/1620853.1620855 article EN 2009-01-01

In building practical two-way speech-to-speech translation systems, the end user will always wish to use the system in an environment different from the original training data. As with all speech systems, it is important to allow the system to adapt to the actual usage situations. This paper investigates how a system can adapt day-to-day from data collected on day one to improve performance on day two. The platform is the CMU Iraqi-English portable two-way speech-to-speech system as developed under the DARPA TransTac program. We show how machine translation, speech recognition and overall system performance can be improved on day 2...

10.3115/1620853.1620895 article EN 2009-01-01

Entity retrieval, which aims at disambiguating mentions to canonical entities from massive KBs, is essential for many tasks in natural language processing. Recent progress in entity retrieval shows that the dual-encoder structure is a powerful and efficient framework to nominate candidates if entities are only identified by descriptions. However, they ignore the property that meanings of entity mentions diverge in different contexts and are related to various portions of descriptions, which are treated equally in previous works. In this work, we propose...

10.18653/v1/2021.emnlp-main.205 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.46 article EN cc-by 2021-01-01

Zechuan Hu, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.207 article EN cc-by 2021-01-01