Jaemin Cho

ORCID: 0000-0003-1148-5413
Research Areas
  • Multimodal Machine Learning Applications
  • Topic Modeling
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Natural Language Processing Techniques
  • Video Analysis and Summarization
  • Perovskite Materials and Applications
  • Fuel Cells and Related Materials
  • Robotics and Sensor-Based Localization
  • Computer Graphics and Visualization Techniques
  • Chalcogenide Semiconductor Thin Films
  • Human Motion and Animation
  • Advancements in Solid Oxide Fuel Cells
  • Speech and dialogue systems
  • Corporate Finance and Governance
  • Private Equity and Venture Capital
  • Recycling and Waste Management Techniques
  • Conducting polymers and applications
  • Integrated Energy Systems Optimization
  • Advanced Neural Network Applications
  • Reinforcement Learning in Robotics
  • Embedded Systems Design Techniques
  • Advancements in Photolithography Techniques
  • Visual Attention and Saliency Detection
  • Quantum Dots Synthesis And Properties

Korea Advanced Institute of Science and Technology
2021-2025

University of North Carolina at Chapel Hill
2020-2024

University of North Carolina Health Care
2020-2024

Korea University of Technology and Education
2024

Inha University
2022-2024

Korea Institute of Ocean Science and Technology
2024

Seoul National University
2016-2023

Pohang University of Science and Technology
2012-2020

University of Washington
2020

Naver (South Korea)
2019

Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks as well as on pure language tasks. However, fine-tuning the entire parameter set of pre-trained models becomes impractical since the model size is growing rapidly. Hence, in this paper, we introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5. We evaluate our methods in a unified multi-task setup on both image-text and video-text benchmarks. For the image-text tasks, we use four diverse...

10.1109/cvpr52688.2022.00516 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Recently, DALL-E [45], a multimodal transformer language model, and its variants, including diffusion models, have shown high-quality text-to-image generation capabilities. However, despite the realistic image generation results, there has not been a detailed analysis of how to evaluate such models. In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models. First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding. For this,...

10.1109/iccv51070.2023.00283 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Yookoon Park, Jaemin Cho, Gunhee Kim. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

10.18653/v1/n18-1162 article EN cc-by 2018-01-01

Mirroring the success of masked language models, vision-and-language counterparts like VILBERT, LXMERT and UNITER have achieved state-of-the-art performance on a variety of multimodal discriminative tasks like visual question answering and visual grounding. Recent work has also successfully adapted such models towards the generative task of image captioning. This raises the question: can these models go the other way and generate images from pieces of text? Our analysis of a popular representative from this model family, LXMERT, finds that it is unable to generate rich...

10.18653/v1/2020.emnlp-main.707 article EN cc-by 2020-01-01

Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g., only using 2% of parameters) inside a pre-trained backbone network for a new task, they only reduce the training memory requirement by up to 30%. This is because the gradient computation for the trainable parameters still requires backpropagation...

10.48550/arxiv.2206.06522 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1308 article EN cc-by 2019-01-01

Modern image captioning models are usually trained with text similarity objectives. However, since reference captions in public datasets often describe the most salient common objects, models trained with text similarity objectives tend to ignore specific and detailed aspects of an image that distinguish it from others. Towards more descriptive and distinctive caption generation, we propose to use CLIP, a multimodal encoder trained on huge image-text pairs from the web, to calculate multi-modal similarity and use it as a reward function. We also propose a simple finetuning strategy of the CLIP...

10.18653/v1/2022.findings-naacl.39 article EN cc-by Findings of the Association for Computational Linguistics: NAACL 2022 2022-01-01

In order to realize both efficient and stable perovskite solar cells, designing the electron transport layer (ETL) is of crucial importance to withstand constant light illumination and thermal stress while maintaining high charge extractability. Herein, the commonly used SnO2 nanoparticle-based ETL for perovskite solar cells is modified by the ionic salts ammonium chloride (NH4Cl) and tin chloride dihydrate (SnCl2∙2H2O) as additives, which is easily fabricated by simple one-step spin coating of a single precursor solution. With the presence...

10.1002/admi.202202148 article EN cc-by Advanced Materials Interfaces 2023-03-03

Recent studies have shown promising results on utilizing large pre-trained image-language models for video question answering. While these image-language models can efficiently bootstrap the representation learning of video-language models, they typically concatenate uniformly sampled video frames as visual inputs without explicit language-aware, temporal modeling. When only a portion of a video input is relevant to the language query, such uniform frame sampling can often lead to missing important visual cues. Although humans often find a video moment to focus...

10.48550/arxiv.2305.06988 preprint EN other-oa arXiv (Cornell University) 2023-01-01

There is growing interest in searching for information from large video corpora. Prior works have studied relevant tasks, such as text-based video retrieval, moment retrieval, video summarization, and video captioning in isolation, without an end-to-end setup that can jointly search from video corpora and generate summaries. Such an end-to-end setup would allow for many interesting applications, e.g., a text-based search that finds a relevant video from a video corpus, extracts the most relevant moment from that video, and segments the moment into important steps with captions. To address this, we present the HIREST (HIerarchical REtrieval and STep-captioning)...

10.1109/cvpr52729.2023.02208 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Plastic contamination is a pervasive global issue, extending from coastal areas and open oceans to polar regions and even the deep sea. Microplastic (MP) contamination in hydrothermal vents, which are known for their high biodiversity even under extreme conditions, has remained largely unexplored. Here, we present, for the first time, MP pollution in a deep-sea hydrothermal vent at one of the biodiversity hotspots, the Central Indian Ridge. Not only the environment (seawater: 2.08 ± 1.04 MPs/L, surface sediments: 0.57 ± 0.19 MP/g) but also all six major benthic...

10.1021/acs.est.4c02811 article EN Environmental Science & Technology 2024-04-17

High-mobility inorganic CuCrO2 nanoparticles are co-utilized with conventional poly(bis(4-phenyl)(2,5,6-trimethylphenyl)amine) (PTAA) as a hole transport layer (HTL) for perovskite solar cells to improve device performance and long-term stability. Even though CuCrO2 nanoparticles can be readily synthesized by hydrothermal reaction, it is difficult to form a uniform HTL with CuCrO2 alone due to the severe agglomeration of nanoparticles. Herein, both CuCrO2 nanoparticles and PTAA are sequentially deposited on perovskite by a simple spin-coating process, forming excellent...

10.3390/nano10091669 article EN cc-by Nanomaterials 2020-08-26

Seeds contaminated with pathogens are the primary inoculum for plant diseases in many food crops. Conventional treatments for seedborne diseases use hot water, chlorine, or fungicide applications. A novel seed treatment method based on non-thermal plasma generated by an air dielectric barrier discharge (DBD) device was evaluated in this study as an alternative to these conventional treatments. The plasma generated at atmospheric pressure and room temperature consisted of partially ionized gases that chemically...

10.2135/cropsci2013.05.0331 article EN Crop Science 2014-02-27

Because of the facile formation of defects in organometal halide perovskites, defect passivation has become an important prerequisite for stable and efficient perovskite solar cells (PSCs). Given that ionic defects of perovskites play a significant role in the performance and stability of PSCs, we introduce lithium fluorides as effective passivators based on their strong ionic characteristics and small ionic radii. Both Li+ and F– are observed to successfully incorporate within the perovskite layer, improving device performances with the best efficiency...

10.1021/acsami.0c14218 article EN ACS Applied Materials & Interfaces 2020-10-29

A direct carbon fuel cell (DCFC) system directly converts the chemical energy of solid carbonaceous fuel into electrical energy. The electrochemical reaction of this system has an influence on the properties of the fuel, such as crystal structure, element composition, and surface properties. In addition, when using raw coals in a DCFC system, the volatile gases released from the coal at a high temperature affect the performance. The purpose of this study is to investigate the effect of coal characteristics on the resistance of the inner system by electrochemical impedance spectroscopy (EIS) and equivalent...

10.1021/acs.energyfuels.5b02904 article EN Energy & Fuels 2016-03-05

Recently, DALL-E, a multimodal transformer language model, and its variants, including diffusion models, have shown high-quality text-to-image generation capabilities. However, despite the realistic image generation results, there has not been a detailed analysis of how to evaluate such models. In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models. First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding. For this, we propose...

10.48550/arxiv.2202.04053 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Existing methods for vision-and-language learning typically require designing task-specific architectures and objectives for each task. For example, a multi-label answer classifier for visual question answering, a region scorer for referring expression comprehension, and a language decoder for image captioning, etc. To alleviate these hassles, in this work, we propose a unified framework that learns different tasks in a single architecture with the same language modeling objective, i.e., multimodal conditional text generation,...

10.48550/arxiv.2102.02779 preprint EN other-oa arXiv (Cornell University) 2021-01-01

10.5220/0013145500003912 article EN Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 2025-01-01

Recently, there has been an increasing interest in building question answering (QA) models that reason across multiple modalities, such as text and images. However, QA using images is often limited to just picking the answer from a pre-defined set of options. In addition, images in the real world, especially in news, have objects that are co-referential to the text, with complementary information from both modalities. In this paper, we present a new QA evaluation benchmark with 1,384 questions over news articles that require cross-media...

10.1609/aaai.v36i10.21370 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28