Hyung Won Chung

ORCID: 0000-0002-1280-9953
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Membrane Separation Technologies
  • Solar-Powered Water Purification Methods
  • Multimodal Machine Learning Applications
  • Membrane-based Ion Separation Techniques
  • Text Readability and Simplification
  • Artificial Intelligence in Healthcare and Education
  • Speech Recognition and Synthesis
  • Ferroelectric and Negative Capacitance Devices
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Explainable Artificial Intelligence (XAI)
  • Speech and dialogue systems
  • Machine Learning and Data Classification
  • Software Engineering Research
  • Anomaly Detection Techniques and Applications
  • Surface Modification and Superhydrophobicity
  • Thermodynamic and Exergetic Analyses of Power and Cooling Systems
  • Nanopore and Nanochannel Transport Studies
  • Aerosol Filtration and Electrostatic Precipitation
  • Advanced Computational Techniques and Applications
  • Adversarial Robustness in Machine Learning
  • Cardiovascular Function and Risk Factors
  • Water-Energy-Food Nexus Studies

Google (United States)
2020-2023

Stanford University
2023

Massachusetts Institute of Technology
2015-2019

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate...

10.48550/arxiv.2204.02311 preprint EN cc-by arXiv (Cornell University) 2022-01-01
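
The few-shot learning the abstract refers to is in-context adaptation: the model sees a handful of worked input/output pairs in its prompt rather than being finetuned. A minimal sketch of that prompt construction follows; the translation task and exemplars are illustrative, not from the paper.

```python
# Minimal sketch of few-shot prompting: the model adapts to a task from a
# handful of in-context exemplars instead of task-specific finetuning.
# The task and exemplars here are illustrative, not taken from the paper.

def build_few_shot_prompt(exemplars, query):
    """Concatenate input/output exemplars followed by the unanswered query."""
    lines = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

exemplars = [
    ("Translate to French: cat", "chat"),
    ("Translate to French: dog", "chien"),
]
prompt = build_few_shot_prompt(exemplars, "Translate to French: bird")
print(prompt)  # the LLM would be expected to complete this with "oiseau"
```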

Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human...

10.1038/s41586-023-06291-2 article EN cc-by Nature 2023-07-12

Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation)....

10.48550/arxiv.2210.11416 preprint EN cc-by arXiv (Cornell University) 2022-01-01
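
The core data operation behind instruction finetuning is rephrasing supervised examples as natural-language instructions. A hedged sketch of that mapping is below; the template is hypothetical, not the exact Flan template.

```python
# Hedged sketch of how a supervised (question, answer) pair might be
# rephrased as an instruction-tuning example. The template below is
# hypothetical, not the exact template used by Flan.

INSTRUCTION_TEMPLATE = (
    "Answer the following question.\n\n"
    "Question: {question}\n"
    "Answer:"
)

def to_instruction_example(question, answer):
    """Map a raw (question, answer) pair to an instruction-tuning example."""
    return {"inputs": INSTRUCTION_TEMPLATE.format(question=question),
            "targets": answer}

print(to_instruction_example("What is the boiling point of water in Celsius?", "100"))
```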

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them. Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc Le, Ed Chi, Denny Zhou, Jason Wei. Findings of the Association for Computational Linguistics: ACL 2023.

10.18653/v1/2023.findings-acl.824 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2023 2023-01-01

Existing pre-trained models are generally geared towards a particular class of problems. To date, there seems to be still no consensus on what the right architecture and pre-training setup should be. This paper presents a unified framework for pre-training models that are universally effective across datasets and setups. We begin by disentangling architectural archetypes from pre-training objectives -- two concepts that are commonly conflated. Next, we present a generalized & unified perspective for self-supervision in NLP and show how different pre-training objectives can be cast as...

10.48550/arxiv.2205.05131 preprint EN other-oa arXiv (Cornell University) 2022-01-01
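
One family of pre-training objectives the paper unifies is span-corruption denoising: spans of the input are replaced by sentinel tokens, and the target reconstructs them. A minimal sketch, with illustrative span positions and sentinel names, follows.

```python
# Minimal sketch of a span-corruption denoising objective: replace spans of
# the input with sentinels; the target reconstructs the removed spans.
# Span positions and sentinel naming are illustrative, not the paper's exact setup.

def span_corrupt(tokens, spans):
    """spans: sorted, non-overlapping (start, length) pairs."""
    inputs, targets, cursor = [], [], 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs += list(tokens[cursor:start]) + [sentinel]
        targets += [sentinel] + list(tokens[start:start + length])
        cursor = start + length
    inputs += list(tokens[cursor:])
    return inputs, targets

words = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(words, [(1, 2), (6, 2)])
print(" ".join(inp))  # the <extra_id_0> fox jumps over <extra_id_1> dog
print(" ".join(tgt))  # <extra_id_0> quick brown <extra_id_1> the lazy
```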

We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, chain-of-thought) actually yields...

10.48550/arxiv.2301.13688 preprint EN cc-by arXiv (Cornell University) 2023-01-01
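
The task balancing the abstract highlights is often implemented as examples-proportional mixing with a per-task cap, so large tasks cannot drown out small ones. A hedged sketch follows; the cap value and task sizes are made up for illustration.

```python
import random

# Hedged sketch of proportional task mixing with a cap on each task's
# contribution, one balancing idea instruction-tuning collections rely on.
# The cap and the task sizes below are made up for illustration.

def mixing_rates(task_sizes, cap=3000):
    """Sample tasks proportionally to size, but cap any task's contribution."""
    capped = {t: min(n, cap) for t, n in task_sizes.items()}
    total = sum(capped.values())
    return {t: n / total for t, n in capped.items()}

sizes = {"qa": 50_000, "nli": 8_000, "summarization": 1_200}
rates = mixing_rates(sizes)
task = random.choices(list(rates), weights=list(rates.values()))[0]
print(rates, task)
```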

The discovery of adversarial examples has raised concerns about the practical deployment of deep learning systems. In this paper, we demonstrate that adversarial examples are capable of manipulating deep learning systems across three clinical domains. For each of our representative medical deep learning classifiers, both white and black box attacks were highly successful. Our models are representative of the current state of the art in medical computer vision and, in some cases, directly reflect architectures already seeing deployment in real world clinical settings. In addition to the technical contribution of our paper, we synthesize a...

10.48550/arxiv.1804.05296 preprint EN other-oa arXiv (Cornell University) 2018-01-01
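
The white-box attacks mentioned follow the fast-gradient-sign principle: perturb the input in the direction that increases the loss. A toy sketch on a logistic-regression "classifier" (with an analytic input gradient) is below; the medical models in the paper are deep networks, but the perturbation principle is the same.

```python
import numpy as np

# Toy sketch of a white-box attack in the FGSM family on a logistic-regression
# "classifier". All parameters and data below are synthetic.

rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.1          # toy model parameters
x, y = rng.normal(size=5), 1.0          # input with true label 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the cross-entropy loss w.r.t. the input x is (p - y) * w.
p = sigmoid(w @ x + b)
grad_x = (p - y) * w

eps = 0.25
x_adv = x + eps * np.sign(grad_x)       # ascend the loss within an L-inf ball

print("clean prob:", sigmoid(w @ x + b))
print("adversarial prob:", sigmoid(w @ x_adv + b))
```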

State-of-the-art models in natural language processing rely on separate rigid subword tokenization algorithms, which limit their generalization ability and adaptation to new settings. In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model. To this end, we introduce a soft gradient-based subword tokenization module (GBST) that automatically learns latent subword representations from characters in a data-driven fashion. Concretely, GBST enumerates candidate subword blocks and learns to score them in a position-wise fashion using a block...

10.48550/arxiv.2106.12672 preprint EN other-oa arXiv (Cornell University) 2021-01-01
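
The GBST idea can be sketched in a few lines: for each character position, pool candidate blocks of several sizes, score them, and take a softmax-weighted mixture instead of a hard segmentation. The sketch below simplifies the shapes and the scoring network relative to the paper.

```python
import numpy as np

# Hedged numpy sketch of gradient-based subword tokenization (GBST): pool
# candidate blocks of several sizes per position, score them, and mix with a
# softmax. The linear scorer and shapes are simplified vs. the paper.

rng = np.random.default_rng(0)
seq_len, d = 12, 8
chars = rng.normal(size=(seq_len, d))        # character embeddings
scorer = rng.normal(size=d)                  # linear block-scoring weights

def gbst(chars, block_sizes=(1, 2, 4)):
    blocks = []
    for b in block_sizes:
        # mean-pool each block of size b, broadcast back to every position in it
        pooled = np.stack([chars[i - i % b : i - i % b + b].mean(axis=0)
                           for i in range(len(chars))])
        blocks.append(pooled)
    blocks = np.stack(blocks)                        # (n_sizes, seq_len, d)
    scores = blocks @ scorer                         # (n_sizes, seq_len)
    weights = np.exp(scores) / np.exp(scores).sum(0) # softmax over block sizes
    return (weights[..., None] * blocks).sum(axis=0) # (seq_len, d)

print(gbst(chars).shape)  # (12, 8): one soft-subword representation per position
```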

Do Transformer Modifications Transfer Across Implementations and Applications? Sharan Narang, Hyung Won Chung, Yi Tay, Liam Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021.

10.18653/v1/2021.emnlp-main.465 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining...

10.48550/arxiv.2203.17189 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Powering desalination by waste heat is often proposed to mitigate energy consumption and environmental impact; however, thorough technology comparisons are lacking in the literature. This work numerically models the efficiency of six representative desalination technologies powered by waste heat at 50, 70, 90, and 120 °C, where applicable. Entropy generation and Second Law analysis are applied for the systems and their components. The thermal technologies considered are multistage flash (MSF), multiple effect distillation (MED), multistage vacuum membrane distillation (MSVMD),...

10.3390/e17117530 article EN Entropy 2015-10-30
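
The Second Law analysis the abstract mentions rests on standard definitions; a worked formulation is given below under common symbol conventions (T_0 is the dead-state temperature, T_H the heat-source temperature). The paper's exact control-volume definitions may differ.

```latex
% Second Law efficiency of a heat-driven desalination system: ratio of the
% least work of separation to the exergy carried by the heat input.
\eta_{II} \;=\; \frac{\dot{W}_{\mathrm{least}}}{\dot{Q}_H\left(1 - \frac{T_0}{T_H}\right)}
% Entropy generation from a steady-flow entropy balance over the system:
\dot{S}_{\mathrm{gen}} \;=\; \sum_{\mathrm{out}} \dot{m}\,s \;-\; \sum_{\mathrm{in}} \dot{m}\,s \;-\; \frac{\dot{Q}_H}{T_H} \;\ge\; 0
```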

We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art pre-trained language models. We show that decoupled embeddings provide increased modeling flexibility, allowing us to significantly improve the efficiency of parameter allocation in the input embedding of multilingual models. By reallocating the input embedding parameters in the Transformer layers, we achieve dramatically better performance on standard natural language understanding tasks with the same number of parameters during fine-tuning. We also show that allocating additional capacity to the output embedding provides...

10.48550/arxiv.2010.12821 preprint EN other-oa arXiv (Cornell University) 2020-01-01
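
Decoupling means the input lookup table and the output projection are separate parameter matrices whose sizes can differ. A hedged numpy sketch follows; all dimensions are illustrative.

```python
import numpy as np

# Hedged sketch of decoupled input/output embeddings: separate matrices, so
# the output embedding can be wider than the input one. Sizes are illustrative.

vocab, d_model, d_out = 1000, 64, 128
rng = np.random.default_rng(0)

E_in = rng.normal(size=(vocab, d_model)) * 0.02   # input embedding table
E_out = rng.normal(size=(vocab, d_out)) * 0.02    # separate output embedding
proj = rng.normal(size=(d_model, d_out)) * 0.02   # maps hidden states to d_out

token_ids = np.array([3, 17, 256])
h = E_in[token_ids]            # (3, d_model): stand-in for Transformer output
logits = (h @ proj) @ E_out.T  # (3, vocab); a tied model would reuse E_in here
print(logits.shape)
```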

There remain many open questions pertaining to the scaling behaviour of Transformer architectures. These scaling decisions and findings can be critical, as training runs often come with an associated computational cost which have both financial and/or environmental impact. The goal of this paper is to present scaling insights from pretraining and finetuning Transformers. While Kaplan et al. presents a comprehensive study of the scaling behaviour of Transformer language models, the scope is only on upstream (pretraining) loss. Therefore, it is still unclear if these...

10.48550/arxiv.2109.10686 preprint EN other-oa arXiv (Cornell University) 2021-01-01
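
Scaling studies of this kind typically fit a power law L(N) = a·N^(-b) by regressing log-loss on log-parameter-count. A minimal sketch with synthetic data points (not from the paper) is below.

```python
import numpy as np

# Hedged sketch of a scaling-law fit: regress log-loss on log-parameter-count
# to estimate L(N) = a * N**-b. The data points are synthetic, not the paper's.

N = np.array([1e7, 1e8, 1e9, 1e10])   # parameter counts
L = 2.5 * N ** -0.05 + np.random.default_rng(0).normal(0, 0.001, 4)

slope, log_a = np.polyfit(np.log(N), np.log(L), 1)
print(f"fitted exponent b ~ {-slope:.3f}, fitted a ~ {np.exp(log_a):.3f}")
```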

In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such "few-shot" cases at test time. A common remedy is to perform data augmentation, such as by duplicating underrepresented examples, or heuristically synthesizing new examples. But these remedies often fail to cover the full diversity and complexity of real examples. We propose a data augmentation approach that performs neural Example Extrapolation (Ex2). Given a handful of exemplars sampled...

10.48550/arxiv.2102.01335 preprint EN other-oa arXiv (Cornell University) 2021-01-01
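
The extrapolation setup can be sketched as: concatenate a handful of exemplars from an underrepresented slice and ask a seq2seq model to emit a new, similar example. In the sketch below, `generate` is a hypothetical stand-in for a finetuned seq2seq model, and the separator is made up.

```python
# Hedged sketch of the example-extrapolation setup. `generate` is a
# hypothetical stand-in for a finetuned seq2seq model; the separator and
# exemplars are illustrative.

def extrapolation_input(exemplars, sep=" | "):
    """Format K slice exemplars as one source sequence for the extrapolator."""
    return sep.join(exemplars)

def generate(source):
    raise NotImplementedError("plug in a finetuned seq2seq model here")

slice_exemplars = [
    "book a table for two at 7pm",
    "reserve a table for four tonight",
    "get me a dinner reservation for six",
]
source = extrapolation_input(slice_exemplars)
# new_example = generate(source)  # would yield another reservation request
```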

Recent developments in machine translation and multilingual text generation have led researchers to adopt trained metrics such as COMET or BLEURT, which treat evaluation as a regression problem and use representations from multilingual pre-trained models such as XLM-RoBERTa or mBERT. Yet studies on related tasks suggest that these models are most efficient when they are large, which is costly and impractical for evaluation. We investigate the trade-off between multilinguality and model capacity with RemBERT, a state-of-the-art multilingual language model, using...

10.18653/v1/2021.emnlp-main.58 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01
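
Treating evaluation as regression, as the abstract describes, amounts to pooling a pretrained encoder's token representations and regressing a quality score on top. A hedged sketch with random stand-in representations follows.

```python
import numpy as np

# Hedged sketch of a trained-metric regression head: pool encoder token
# representations, then apply a linear regressor. The encoder output here is
# random; COMET/BLEURT-style metrics use real multilingual encoders.

rng = np.random.default_rng(0)
token_reps = rng.normal(size=(20, 768))   # stand-in for encoder output
w, b = rng.normal(size=768) * 0.01, 0.0   # regression head parameters

pooled = token_reps.mean(axis=0)          # mean-pool over tokens
score = float(pooled @ w + b)             # predicted quality score
print(score)
```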

We evaluate the reasoning abilities of large language models in multilingual settings. We introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating 250 grade-school math problems from the GSM8K dataset (Cobbe et al., 2021) into ten typologically diverse languages. We find that the ability to solve MGSM problems via chain-of-thought prompting emerges with increasing model scale, and that models have strikingly strong multilingual reasoning abilities, even in underrepresented languages such as Bengali and Swahili. Finally, we show...

10.48550/arxiv.2210.03057 preprint EN cc-by arXiv (Cornell University) 2022-01-01
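
Chain-of-thought prompting differs from plain few-shot prompting in that the exemplar includes intermediate reasoning before the answer. A minimal sketch follows; the exemplar is the well-known tennis-ball example from the chain-of-thought literature, not an MGSM item.

```python
# Minimal sketch of chain-of-thought prompting: the exemplar shows
# step-by-step reasoning, so the model imitates it on the new question.
# The exemplar and question are illustrative, not MGSM items.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def cot_prompt(question):
    """Prepend a worked exemplar to elicit step-by-step reasoning."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

print(cot_prompt("A farm has 3 pens of 4 pigs each. How many pigs in total?"))
```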

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams,...

10.48550/arxiv.2212.13138 preprint EN cc-by arXiv (Cornell University) 2022-01-01