- Multimodal Machine Learning Applications
- Topic Modeling
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Natural Language Processing Techniques
- Video Analysis and Summarization
- Perovskite Materials and Applications
- Fuel Cells and Related Materials
- Robotics and Sensor-Based Localization
- Computer Graphics and Visualization Techniques
- Chalcogenide Semiconductor Thin Films
- Human Motion and Animation
- Advancements in Solid Oxide Fuel Cells
- Speech and dialogue systems
- Corporate Finance and Governance
- Private Equity and Venture Capital
- Recycling and Waste Management Techniques
- Conducting polymers and applications
- Integrated Energy Systems Optimization
- Advanced Neural Network Applications
- Reinforcement Learning in Robotics
- Embedded Systems Design Techniques
- Advancements in Photolithography Techniques
- Visual Attention and Saliency Detection
- Quantum Dots Synthesis And Properties
Korea Advanced Institute of Science and Technology
2021-2025
University of North Carolina at Chapel Hill
2020-2024
University of North Carolina Health Care
2020-2024
Korea University of Technology and Education
2024
Inha University
2022-2024
Korea Institute of Ocean Science and Technology
2024
Seoul National University
2016-2023
Pohang University of Science and Technology
2012-2020
University of Washington
2020
Naver (South Korea)
2019
Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks as well as on pure language tasks. However, fine-tuning the entire parameter set of pre-trained models becomes impractical since the model size is growing rapidly. Hence, in this paper, we introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5. We evaluate our methods in a unified multi-task setup on both image-text and video-text benchmarks. For the image-text tasks, we use four diverse...
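A minimal sketch of the idea behind adapter-based transfer, for illustration only. The abstract above does not spell out the adapter design, so this assumes the common bottleneck form (down-projection, nonlinearity, up-projection, residual connection). The class name `BottleneckAdapter`, the dimensions, and the zero-initialized up-projection are assumptions, not details taken from the paper.

```python
import random


class BottleneckAdapter:
    """Hypothetical bottleneck adapter: x + W_up * relu(W_down * x)."""

    def __init__(self, hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        # Down-projection: small random weights, shape (bottleneck_dim, hidden_dim).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_dim)]
                       for _ in range(bottleneck_dim)]
        # Up-projection: zeros, shape (hidden_dim, bottleneck_dim), so the adapter
        # starts out as an identity map (an assumed, commonly used initialization).
        self.w_up = [[0.0] * bottleneck_dim for _ in range(hidden_dim)]

    def num_parameters(self):
        # Count every weight in both projections.
        return sum(len(r) for r in self.w_down) + sum(len(r) for r in self.w_up)

    def forward(self, x):
        # Project down to the bottleneck and apply ReLU.
        h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.w_down]
        # Project back up and add the residual connection.
        return [v + sum(w * a for w, a in zip(row, h)) for v, row in zip(x, self.w_up)]


adapter = BottleneckAdapter(hidden_dim=8, bottleneck_dim=2)
x = [0.5] * 8
y = adapter.forward(x)
```

In this kind of setup only the adapter weights would be updated while the backbone stays frozen; here that is `adapter.num_parameters()` values for an 8-dimensional hidden state.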
Recently, DALL-E [45], a multimodal transformer language model, and its variants including diffusion models have shown high-quality text-to-image generation capabilities. However, despite the realistic image generation results, there has not been a detailed analysis of how to evaluate such models. In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models. First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding. For this,...
Yookoon Park, Jaemin Cho, Gunhee Kim. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
Mirroring the success of masked language models, vision-and-language counterparts like VILBERT, LXMERT and UNITER have achieved state of the art performance on a variety of multimodal discriminative tasks like visual question answering and visual grounding. Recent work has also successfully adapted such models towards the generative task of image captioning. This begs the question: Can these models go the other way and generate images from pieces of text? Our analysis of a popular representative from this model family – LXMERT – finds that it is unable to generate rich...
Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g. only using 2% of parameters) inside a pre-trained backbone network for a new task, they only reduce the training memory requirement by up to 30%. This is because the gradient computation for the trainable parameters still requires backpropagation...
Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Modern image captioning models are usually trained with text similarity objectives. However, since reference captions in public datasets often describe the most salient common objects, models trained with text similarity objectives tend to ignore specific and detailed aspects of an image that distinguish it from others. Towards more descriptive and distinctive caption generation, we propose to use CLIP, a multimodal encoder trained on huge image-text pairs from the web, to calculate multi-modal similarity and use it as a reward function. We also propose a simple finetuning strategy of the CLIP...
In order to realize both efficient and stable perovskite solar cells, designing the electron transport layer (ETL) is of crucial importance to withstand constant light illumination and thermal stress while maintaining high charge extractability. Herein, a commonly used SnO2 nanoparticle‐based ETL for perovskite solar cells is modified by the ionic‐salt ammonium chloride (NH4Cl) and tin chloride dihydrate (SnCl2∙2H2O) as additives, which is easily fabricated by a simple one‐step spin coating of a single precursor solution. With the presence...
Recent studies have shown promising results on utilizing large pre-trained image-language models for video question answering. While these image-language models can efficiently bootstrap the representation learning of video-language models, they typically concatenate uniformly sampled video frames as visual inputs without explicit language-aware, temporal modeling. When only a portion of a video input is relevant to the language query, such uniform frame sampling can often lead to missing important visual cues. Although humans often find a video moment to focus...
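The uniform frame sampling that the abstract above describes as a limitation can be sketched as follows. This is an illustrative assumption about how such sampling is commonly done (taking the center frame of each equal-length segment), not the authors' implementation; the function name `uniform_frame_indices` and the 100-frame clip are hypothetical.

```python
def uniform_frame_indices(num_frames, num_samples):
    """Return the center frame index of each of `num_samples` equal segments."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    # Clamp to the last valid index in case of rounding at the end of the clip.
    return [min(num_frames - 1, int((i + 0.5) * segment)) for i in range(num_samples)]


# Hypothetical 100-frame clip in which only frames 40-50 are relevant to the query.
sampled = uniform_frame_indices(num_frames=100, num_samples=4)
relevant = range(40, 51)
hits = [i for i in sampled if i in relevant]

print(sampled)  # [12, 37, 62, 87]
print(hits)     # []
```

With four samples, none of the selected frames falls inside the relevant span, which is the kind of missed cue that query-independent sampling can produce.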
There is growing interest in searching for information from large video corpora. Prior works have studied relevant tasks, such as text-based video retrieval, moment retrieval, video summarization, and video captioning in isolation, without an end-to-end setup that can jointly search from video corpora and generate summaries. Such an end-to-end setup would allow for many interesting applications, e.g., a text-based search that finds a relevant video from a video corpus, extracts the most relevant moment from that video, and segments the moment into important steps with captions. To address this, we present the HIREST (HIerarchical REtrieval and STep-captioning)...
Plastic contamination is a global pervasive issue, extending from coastal areas and open oceans to polar regions and even the deep sea. Microplastic (MP) contamination in hydrothermal vents, which are known for their high biodiversity even under extreme conditions, has remained largely unexplored. Here, we present, for the first time, MP pollution in a deep-sea hydrothermal vent at one of the hotspots, the Central Indian Ridge. Not only the environment (seawater: 2.08 ± 1.04 MPs/L, surface sediments: 0.57 ± 0.19 MP/g) but also all six major benthic...
High-mobility inorganic CuCrO2 nanoparticles are co-utilized with conventional poly(bis(4-phenyl)(2,5,6-trimethylphenyl)amine) (PTAA) as a hole transport layer (HTL) for perovskite solar cells to improve device performance and long-term stability. Even though CuCrO2 nanoparticles can be readily synthesized by hydrothermal reaction, it is difficult to form a uniform HTL with CuCrO2 alone due to the severe agglomeration of nanoparticles. Herein, both CuCrO2 nanoparticles and PTAA are sequentially deposited on perovskite by a simple spin-coating process, forming excellent...
Seeds contaminated with pathogens are the primary inoculum for plant diseases in many food crops. Conventional treatments for seedborne pathogens use hot water, chlorine or fungicide applications. A novel seed treatment method based on non‐thermal plasma generated by an air dielectric barrier discharge (DBD) device was evaluated in this study as an alternative to these conventional treatments. The plasma generated at atmospheric pressure and room temperature consisted of partially‐ionized gases that chemically...
Because of the facile formation of defects in organometal halide perovskites, defect passivation has become an important prerequisite for stable and efficient perovskite solar cells (PSCs). Given that ionic defects of perovskites play a significant role in the performance and stability of PSCs, we introduce lithium fluorides as effective passivators based on their strong ionic characteristics and small ionic radii. Both Li+ and F– are observed to successfully incorporate within the perovskite layer, improving the device performances with the best efficiency...
A direct carbon fuel cell (DCFC) system directly converts the chemical energy of solid carbonaceous fuel into electrical energy. The electrochemical reaction of this system has an influence on the properties of the fuel, such as crystal structure, element composition, and surface properties. In addition, when using raw coals as the DCFC fuel, volatile gases released from the coal at a high temperature affect the performance. The purpose of this study is to investigate the effect of fuel characteristics on the inner resistance by electrochemical impedance spectroscopy (EIS) and equivalent...
Recently, DALL-E, a multimodal transformer language model, and its variants, including diffusion models, have shown high-quality text-to-image generation capabilities. However, despite the realistic image generation results, there has not been a detailed analysis of how to evaluate such models. In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models. First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding. For this, we propose...
Existing methods for vision-and-language learning typically require designing task-specific architectures and objectives for each task. For example, a multi-label answer classifier for visual question answering, a region scorer for referring expression comprehension, and a language decoder for image captioning, etc. To alleviate these hassles, in this work, we propose a unified framework that learns different tasks in a single architecture with the same language modeling objective, i.e., multimodal conditional text generation,...
Recently, there has been an increasing interest in building question answering (QA) models that reason across multiple modalities, such as text and images. However, QA using images is often limited to just picking the answer from a pre-defined set of options. In addition, images in the real world, especially in news, have objects that are co-referential to the text, with complementary information from both modalities. In this paper, we present a new QA evaluation benchmark with 1,384 questions over news articles that require cross-media...