Jaeyoung Do

ORCID: 0000-0003-1275-1621
Research Areas
  • Advanced Data Storage Technologies
  • Parallel Computing and Optimization Techniques
  • Caching and Content Delivery
  • Cloud Computing and Resource Management
  • Advanced Image and Video Retrieval Techniques
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Distributed Systems and Fault Tolerance
  • Distributed and Parallel Computing Systems
  • Topic Modeling
  • Image Retrieval and Classification Techniques
  • Mathematics, Computing, and Information Processing
  • Data Stream Mining Techniques
  • Algorithms and Data Compression
  • Data Management and Algorithms
  • Health Sciences Research and Education
  • Semantic Web and Ontologies
  • Advanced Neural Network Applications
  • Cellular Automata and Applications
  • Text Readability and Simplification
  • Electronic Health Records Systems
  • Biomedical Text Mining and Ontologies
  • Recommender Systems and Techniques
  • Explainable Artificial Intelligence (XAI)

Seoul National University
2024-2025

Amazon (United States)
2022-2023

Amazon (Germany)
2023

Microsoft Research (United Kingdom)
2020-2022

Microsoft (United States)
2013-2021

University of Wisconsin–Madison
2009-2013

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty...

10.1371/journal.pcbi.1000392 article EN cc-by PLoS Computational Biology 2009-05-28

Data storage devices are getting "smarter." Smart Flash storage devices (a.k.a. "Smart SSDs") are on the horizon and will package CPU processing and DRAM storage inside a Smart SSD, and make that available to run user programs inside the Smart SSD. The focus of this paper is on exploring the opportunities and challenges associated with exploiting this functionality of Smart SSDs for relational analytic query processing. We have implemented an initial prototype of Microsoft SQL Server running on a Samsung Smart SSD. Our results demonstrate that significant performance and energy gains can be achieved by...

10.1145/2463676.2465295 article EN 2013-06-22

Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for...

10.1145/3318464.3389711 preprint EN 2020-05-29
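
The key idea in the abstract above, an index treated as a model that predicts the position of a key in a sorted dataset, can be illustrated with a minimal static sketch. This is not ALEX and not the design of Kraska et al.; it is a single linear model with a bounded local search, and the class name `LinearLearnedIndex` and its methods are hypothetical names chosen for this illustration. It assumes a non-empty list of numeric keys and supports lookups only.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical static learned index: one linear model over a sorted key array."""

    def __init__(self, keys):
        # Assumption: keys is non-empty and numeric.
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Predicted position, clamped to the valid index range.
        guess = int(round(self.slope * key + self.intercept))
        return min(max(guess, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 100])
position = index.lookup(23)
```

The sketch handles only the static, read-only case that the abstract describes as the limitation of the original work; supporting inserts and updates is the part that ALEX addresses and is not modeled here.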

Flash solid-state drives (SSDs) are changing the I/O landscape, which has largely been dominated by traditional hard disk drives (HDDs) for the last 50 years. In this paper we propose and systematically explore designs for using an SSD to improve the performance of a DBMS buffer manager. We propose three alternatives that differ mainly in the way that they deal with the dirty pages evicted from the buffer pool. We implemented these alternatives, as well as another recently proposed algorithm for this task (TAC), in SQL Server, and ran experiments using a variety...

10.1145/1989323.1989442 article EN 2011-06-12

The growing volume of data produced continuously in the Cloud and at the Edge poses significant challenges for large-scale AI applications to extract and learn useful information from the data in a timely and efficient way. The goal of this article is to explore the use of computational storage to address such challenges by distributed near-data processing. We describe Newport, a high-performance and energy-efficient computational storage developed for realizing the full potential of in-storage processing. To the best of our knowledge, Newport is the first commodity SSD that can be configured to run a server-like...

10.1145/3415580 article EN ACM Transactions on Storage 2020-10-12

10.1109/icassp49660.2025.10890531 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

In various academic and professional settings, such as mathematics lectures or research presentations, it is often necessary to convey mathematical expressions orally. However, reading mathematical expressions aloud without accompanying visuals can significantly hinder comprehension, especially for those who are hearing-impaired or rely on subtitles due to language barriers. For instance, when a presenter reads Euler's Formula, current Automatic Speech Recognition (ASR) models often produce a verbose and error-prone textual...

10.1609/aaai.v39i23.34595 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Programmable software-defined solid-state drives can move computing functions closer to storage.

10.1145/3286588 article EN Communications of the ACM 2019-05-21

K-nearest neighbor search is one of the fundamental tasks in various applications, and the hierarchical navigable small world (HNSW) has recently drawn attention in large-scale cloud services, as it easily scales up the database while offering fast search. On the other hand, a computational storage device (CSD) that combines programmable logic and storage modules on a single board has become popular to address the data bandwidth bottleneck of modern computing systems. In this paper, we propose a computational storage platform that can accelerate a large-scale graph-based...

10.1109/tc.2022.3155956 article EN IEEE Transactions on Computers 2022-03-03

Non-volatile memory (NVM) is an emerging technology, which has the persistence characteristics of large capacity storage devices, while providing the low access latency and byte-addressability of traditional DRAM memory. In this paper, we provide extensive performance evaluations on a recently released NVM device, Intel Optane DC Persistent Memory (PMem), under different configurations with several micro-benchmark tools. Further, we evaluate OLTP and OLAP database workloads with Microsoft SQL Server 2019 when...

10.1145/3399666.3399898 article EN 2020-06-04

Flash solid state drives (SSDs) provide an attractive alternative to traditional magnetic hard disk drives (HDDs) for DBMS applications. Naturally there is substantial interest in redesigning critical database internals, such as join algorithms, for flash SSDs. However, we must carefully consider the lessons that we have learnt from over three decades of designing and tuning algorithms for HDD-based systems, so that we continue to reuse techniques that worked for HDDs and also work with

10.1145/1565694.1565696 article EN 2009-06-28

Referring image segmentation aims to localize the object in an image referred to by a natural language expression. Most previous studies learn referring image segmentation with a large-scale dataset containing segmentation labels, but they are costly. We present a weakly supervised learning method for referring image segmentation that only uses readily available image-text pairs. We first train a visual-linguistic model for image-text matching and extract a visual saliency map through Grad-CAM to identify the image regions corresponding to each word. However, we found two major problems with Grad-CAM....

10.1109/iccv51070.2023.01999 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

A promising use of flash SSDs in a DBMS is to extend the main memory buffer pool by caching selected pages that have been evicted from the buffer pool. Such a use has been shown to produce significant gains in the steady state performance of the DBMS. One strategy for using the SSD buffer pool is to throw away the data in the SSD when the system is restarted (either when recovering from a crash or restarting after a shutdown), and consequently a long “ramp-up” period to regain peak performance is needed. One approach to eliminate this limitation is to use a memory-mapped file to store the SSD buffer table in order to be able to restore its contents on...

10.1109/icde.2013.6544903 article EN 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013-04-01

Jointly fine-tuning a Pre-trained Language Model (PLM) on a pre-defined set of tasks with in-context instructions has been proven to improve its generalization performance, allowing us to build a universal language model that can be deployed across task boundaries. In this work, we explore for the first time whether this attractive property of in-context instruction learning can be extended to a scenario in which tasks are fed to the target PLM in a sequential manner. The primary objective of so-called lifelong in-context instruction learning is to improve the target PLM’s instance- and task-level...

10.18653/v1/2023.acl-long.703 article EN cc-by 2023-01-01

Concept-based explanation aims to provide concise and human-understandable explanations of an image classifier. However, existing concept-based explanation methods typically require a significant amount of manually collected concept-annotated images. This is costly and runs the risk of human biases being involved in the explanation. In this paper, we propose Counterfactual explanation with text-driven concepts (CounTEX), where the concepts are defined only from text by leveraging a pretrained multimodal joint embedding space without additional...

10.1109/cvpr52729.2023.01053 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Graph neural networks (GNNs) have achieved remarkable success in recommender systems by representing users and items based on their historical interactions. However, little attention has been paid to GNNs' vulnerability to exposure bias: users are exposed to a limited number of items, so a system only learns a biased view of user preference, resulting in suboptimal recommendation quality. Although inverse propensity weighting is known to recognize and alleviate exposure bias, it usually works on the final objective with the model outputs, whereas...

10.1145/3511808.3557576 article EN Proceedings of the 31st ACM International Conference on Information & Knowledge Management 2022-10-16

Exploiting a storage hierarchy is critical to cost-effective data management. One can achieve great performance when working solely on main memory data. But this comes at a high cost. Systems that use secondary storage as the "home" for data have much lower storage costs, as they not only make the data durable but reduce its storage cost as well. Performance then becomes the challenge, reflected in an increased execution cost. Log structured stores, e.g. Deuteronomy, improve I/O cost/performance by batching writes. However, this incurs the cost of host-based...

10.1145/3329785.3329925 article EN 2019-06-24

In many settings, a database server has to be restarted either in response to a failure event, or in response to an operational decision such as moving a database service from one machine to another. However, such restarts pose a potential performance problem, as the new database server starts off with a cold buffer pool. As a result, the application experiences a dramatic reduction in performance right after the restart, since just before the restart the buffer pool was filled with hot pages and after the restart the buffer pool is empty. To address these issues, traditional database systems use mechanisms such as SQL Server's aggressive page...

10.1109/icdew.2016.7495612 article EN 2016-05-01

Non-volatile memory (NVM) is an emerging technology, which has the persistence characteristics of large capacity storage devices (e.g., HDDs and SSDs), while providing the low access latency and byte-addressability of traditional DRAM memory. This unique combination of features opens up several new design considerations when building database management systems (DBMSs), such as replacing DRAM (as the main working space memory) or block devices (as the persistent storage), or complementing both at the same time for several DBMS components...

10.48550/arxiv.2005.07658 preprint EN other-oa arXiv (Cornell University) 2020-01-01

While clinical notes are essential to the field of healthcare, they pose several challenges for clinicians, since it is difficult to write down medical information, review prior notes, and extract desired information at the same time while examining a patient. Thus, we designed a system that can automatically generate clinical notes from dialogues between clinicians and patients and provide specific information upon clinicians' query using a Large Language Model (LLM), both in real-time. To explore how this system can be used to support clinical practice, we conducted an...

10.1145/3613905.3650784 article EN 2024-05-11

Improving the readability of mathematical expressions in text-based documents, such as video subtitles, is a significant task. To achieve this, mathematical expressions should be converted to compiled formulas. For instance, the spoken expression ``x equals minus b plus or minus the square root of b squared minus four a c, all over two a'' from automatic speech recognition is more readily comprehensible when displayed as the compiled formula $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. To convert spoken mathematical sentences to compiled formulas, two processes are required: spoken sentences are converted into LaTeX formulas, and LaTeX formulas are converted into compiled formulas. The...

10.48550/arxiv.2408.07081 preprint EN arXiv (Cornell University) 2024-08-07

In various academic and professional settings, such as mathematics lectures or research presentations, it is often necessary to convey mathematical expressions orally. However, reading mathematical expressions aloud without accompanying visuals can significantly hinder comprehension, especially for those who are hearing-impaired or rely on subtitles due to language barriers. For instance, when a presenter reads Euler's Formula, current Automatic Speech Recognition (ASR) models often produce a verbose and error-prone textual...

10.48550/arxiv.2412.15655 preprint EN arXiv (Cornell University) 2024-12-20