- Advanced Data Storage Technologies
- Parallel Computing and Optimization Techniques
- Caching and Content Delivery
- Cloud Computing and Resource Management
- Advanced Image and Video Retrieval Techniques
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Distributed Systems and Fault Tolerance
- Distributed and Parallel Computing Systems
- Topic Modeling
- Image Retrieval and Classification Techniques
- Mathematics, Computing, and Information Processing
- Data Stream Mining Techniques
- Algorithms and Data Compression
- Data Management and Algorithms
- Health Sciences Research and Education
- Semantic Web and Ontologies
- Advanced Neural Network Applications
- Cellular Automata and Applications
- Text Readability and Simplification
- Electronic Health Records Systems
- Biomedical Text Mining and Ontologies
- Recommender Systems and Techniques
- Explainable Artificial Intelligence (XAI)
Seoul National University
2024-2025
Amazon (United States)
2022-2023
Amazon (Germany)
2023
Microsoft Research (United Kingdom)
2020-2022
Microsoft (United States)
2013-2021
University of Wisconsin–Madison
2009-2013
We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment (FSA) program is based on pair hidden Markov models, which approximate an insertion/deletion process on a tree, and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments that are accompanied by estimates of the alignment accuracy and uncertainty...
Data storage devices are getting "smarter." Smart Flash storage devices (a.k.a. "Smart SSD") are on the horizon and will package CPU processing and DRAM storage inside a Smart SSD, and make that available to run user programs inside the Smart SSD. The focus of this paper is on exploring the opportunities and challenges associated with exploiting this functionality of Smart SSDs for relational analytic query processing. We have implemented an initial prototype of Microsoft SQL Server running on a Samsung Smart SSD. Our results demonstrate that significant performance and energy gains can be achieved by...
Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for...
Flash solid-state drives (SSDs) are changing the I/O landscape, which has largely been dominated by traditional hard disk drives (HDDs) for the last 50 years. In this paper we propose and systematically explore designs for using an SSD to improve the performance of a DBMS buffer manager. We propose three alternatives that differ mainly in the way that they deal with the dirty pages evicted from the buffer pool. We implemented these alternatives, as well as another recently proposed algorithm for this task (TAC), in SQL Server, and ran experiments using a variety...
The growing volume of data produced continuously in the Cloud and at the Edge poses significant challenges for large-scale AI applications to extract and learn useful information from the data in a timely and efficient way. The goal of this article is to explore the use of computational storage to address such challenges by distributed near-data processing. We describe Newport, a high-performance and energy-efficient computational storage developed for realizing the full potential of in-storage processing. To the best of our knowledge, Newport is the first commodity SSD that can be configured to run a server-like...
In various academic and professional settings, such as mathematics lectures or research presentations, it is often necessary to convey mathematical expressions orally. However, reading mathematical expressions aloud without accompanying visuals can significantly hinder comprehension, especially for those who are hearing-impaired or rely on subtitles due to language barriers. For instance, when a presenter reads Euler's Formula, current Automatic Speech Recognition (ASR) models produce a verbose and error-prone textual...
Programmable software-defined solid-state drives can move computing functions closer to storage.
K-nearest neighbor search is one of the fundamental tasks in various applications, and the hierarchical navigable small world (HNSW) has recently drawn attention in large-scale cloud services, as it easily scales up the database while offering fast search. On the other hand, a computational storage device (CSD) that combines programmable logic and storage modules on a single board has become popular to address the data bandwidth bottleneck of modern computing systems. In this paper, we propose a computational storage platform that can accelerate graph-based...
Non-volatile memory (NVM) is an emerging technology, which has the persistence characteristics of large capacity storage devices, while providing the low access latency and byte-addressability of traditional DRAM memory. In this paper, we provide extensive performance evaluations on a recently released NVM device, Intel Optane DC Persistent Memory (PMem), under different configurations with several micro-benchmark tools. Further, we evaluate OLTP and OLAP database workloads with Microsoft SQL Server 2019 when...
Flash solid state drives (SSDs) provide an attractive alternative to traditional magnetic hard disk drives (HDDs) for DBMS applications. Naturally there is substantial interest in redesigning critical database internals, such as join algorithms, for flash SSDs. However, we must carefully consider the lessons that we have learnt from over three decades of designing and tuning algorithms for HDD-based systems, so that we continue to reuse techniques that worked for magnetic HDDs and also work with flash SSDs.
Referring image segmentation aims to localize the object in an image referred by a natural language expression. Most previous studies learn referring image segmentation with a large-scale dataset containing segmentation labels, but they are costly. We present a weakly supervised learning method for referring image segmentation that only uses readily available image-text pairs. We first train a visual-linguistic model for image-text matching and extract a visual saliency map through Grad-CAM to identify the image regions corresponding to each word. However, we found two major problems with Grad-CAM....
A promising use of flash SSDs in a DBMS is to extend the main memory buffer pool by caching selected pages that have been evicted from the buffer pool. Such caching has been shown to produce significant gains in the steady state performance of the DBMS. One strategy for using the SSD buffer pool is to throw away the data in the SSD when the system is restarted (either when recovering from a crash or restarting after a shutdown), and consequently a long “ramp-up” period to regain peak performance is needed. One approach to eliminate this limitation is to use a memory-mapped file to store the SSD buffer table in order to be able to restore its contents on...
Jointly fine-tuning a Pre-trained Language Model (PLM) on a pre-defined set of tasks with in-context instructions has been proven to improve its generalization performance, allowing us to build a universal language model that can be deployed across task boundaries. In this work, we explore for the first time whether this attractive property of in-context instruction learning can be extended to a scenario in which tasks are fed to the target PLM in a sequential manner. The primary objective of so-called lifelong in-context instruction learning is to improve the target PLM’s instance- and task-level...
Concept-based explanation aims to provide concise and human-understandable explanations of an image classifier. However, existing concept-based explanation methods typically require a significant amount of manually collected concept-annotated images. This is costly and runs the risk of human biases being involved in the explanation. In this paper, we propose Counterfactual explanation with text-driven concepts (CounTEX), where the concepts are defined only from text by leveraging a pretrained multimodal joint embedding space without additional...
Graph neural networks (GNNs) have achieved remarkable success in recommender systems by representing users and items based on their historical interactions. However, little attention was paid to GNN's vulnerability to exposure bias: users are exposed to a limited number of items, so that a system only learns a biased view of user preference, resulting in suboptimal recommendation quality. Although inverse propensity weighting is known to recognize and alleviate exposure bias, it usually works on the final objective with the model outputs, whereas...
Exploiting a storage hierarchy is critical to cost-effective data management. One can achieve great performance when working solely on main memory data. But this comes at a high cost. Systems that use secondary storage as the "home" for data have much lower storage costs, as they not only make the data durable but reduce its storage cost as well. Performance then becomes a challenge, reflected in an increased execution cost. Log structured stores, e.g. Deuteronomy, improve I/O cost/performance by batching writes. However, this incurs the cost of host-based...
In many settings, a database server has to be restarted either in response to a failure event, or in response to an operational decision such as moving a database service from one machine to another. However, such restarts pose a potential performance problem, as the new database server starts off with a cold buffer pool. As a result, the application experiences a dramatic reduction in performance right after the restart, since just before the restart the buffer pool was filled with hot pages and the new buffer pool is empty. To address these issues, traditional database systems use mechanisms such as SQL Server's aggressive page...
Non-volatile memory (NVM) is an emerging technology, which has the persistence characteristics of large capacity storage devices (e.g., HDDs and SSDs), while providing the low access latency and byte-addressability of traditional DRAM memory. This unique combination of features opens up several new design considerations when building database management systems (DBMSs), such as replacing DRAM (as the main working space memory) or block devices (as the persistent storage), or complementing both at the same time for DBMS components...
While clinical notes are essential to the field of healthcare, they pose several challenges for clinicians, since it is difficult to write down medical information, review prior notes, and extract desired information at the same time while examining a patient. Thus, we designed a system that can automatically generate clinical notes from dialogues between clinicians and patients and provide specific information upon clinicians' query using a Large Language Model (LLM), both in real-time. To explore how this system can be used to support clinical practice, we conducted an...
Improving the readability of mathematical expressions in text-based documents, such as video subtitles, is a significant task. To achieve this, mathematical expressions should be converted to compiled formulas. For instance, the spoken expression ``x equals minus b plus or minus square root of b squared minus four a c, all over two a'' from automatic speech recognition is more readily comprehensible when displayed as a compiled formula $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. To convert mathematical spoken sentences to compiled formulas, two processes are required: spoken sentences are converted into LaTeX formulas, and LaTeX formulas are converted into compiled formulas. The...