NFDI4DS | UHH-SEMS - Publication Details

Jinhyuk Lee

ORCID: 0000-0003-4972-239X

About

Contact & Profiles

A5002413587

Research Areas

Topic Modeling
Natural Language Processing Techniques
Multimodal Machine Learning Applications
Protein Structure and Dynamics
Biomedical Text Mining and Ontologies
Computational Drug Discovery Methods
RNA and protein synthesis mechanisms
Lipid Membrane Structure and Behavior
Enzyme Structure and Function
Expert finding and Q&A systems
Information Retrieval and Search Behavior
Force Microscopy Techniques and Applications
Text and Document Classification Technologies
Conferences and Exhibitions Management
Economic theories and models
DNA and Nucleic Acid Chemistry
Domain Adaptation and Few-Shot Learning
melanin and skin pigmentation
Interpreting and Communication in Healthcare
Artificial Intelligence in Healthcare and Education
Advanced Text Analysis Techniques
scientometrics and bibliometrics research
Consumer Market Behavior and Pricing
Molecular spectroscopy and chirality
Machine Learning in Bioinformatics

Korea University
2016-2024

Princeton University
2021-2023

Kyungpook National University
2023

Google (United States)
2023

Korea Research Institute of Bioscience and Biotechnology
2013-2022

Korea University of Science and Technology
2013-2022

Korea Institute of Oriental Medicine
2021

Icahn School of Medicine at Mount Sinai
2021

Bar-Ilan University
2021

University of Washington
2019-2020

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

OPENALEX - Publications

Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, and 2 more

Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how...

10.1093/bioinformatics/btz682 article EN Bioinformatics 2019-09-05

BERN2: an advanced neural biomedical named entity recognition and normalization tool

Mujeen Sung, Minbyul Jeong, Yonghwa Choi, Donghyeon Kim, Jinhyuk Lee, and 1 more

In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g. diseases and drugs) from the ever-growing biomedical literature. In this article, we present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by employing a multi-task NER model and neural network-based NEN models to achieve much faster and more accurate inference. We hope that our tool can help annotate large-scale biomedical texts for...

10.1093/bioinformatics/btac598 article EN Bioinformatics 2022-08-31

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, and 1 more

Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query, which is computationally prohibitive. In this paper, we introduce query-agnostic indexable representations of document phrases that can drastically speed up open-domain QA. In particular, our dense-sparse phrase encoding effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents....

10.18653/v1/p19-1436 preprint EN 2019-01-01

A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining

Donghyeon Kim, Jinhyuk Lee, Chan Ho So, Hwisang Jeon, Minbyul Jeong, and 4 more

The amount of biomedical literature is vast and growing quickly, and accurate text mining techniques could help researchers to efficiently extract useful information from the literature. However, existing named entity recognition models used by text mining tools such as tmTool and ezTag are not effective enough, and cannot accurately discover new entities. Also, the traditional text mining tools do not consider overlapping entities, which are frequently observed in multi-type named entity recognition results. We propose a neural biomedical named entity recognition and multi-type normalization tool called BERN. BERN...

10.1109/access.2019.2920708 article EN cc-by-nc-nd IEEE Access 2019-01-01

Biomedical Entity Representations with Synonym Marginalization

Mujeen Sung, Hwisang Jeon, Jinhyuk Lee, Jaewoo Kang

Biomedical named entities often play important roles in many biomedical text mining tools. However, due to the incompleteness of provided synonyms and numerous variations in their surface forms, normalization of biomedical entities is very challenging. In this paper, we focus on learning representations of biomedical entities solely based on the synonyms of entities. To learn from the incomplete synonyms, we use a model-based candidate selection and maximize the marginal likelihood of the synonyms present in top candidates. Our model-based candidates are iteratively updated to contain more difficult...

10.18653/v1/2020.acl-main.335 article EN cc-by 2020-01-01

CollaboNet: collaboration of deep neural networks for biomedical named entity recognition

Wonjin Yoon, Chan Ho So, Jinhyuk Lee, Jaewoo Kang

Finding biomedical named entities is one of the most essential tasks in biomedical text mining. Recently, deep learning-based approaches have been applied to biomedical named entity recognition (BioNER) and showed promising results. However, as deep learning approaches need an abundant amount of training data, a lack of data can hinder performance. BioNER datasets are scarce resources and each dataset covers only a small subset of entity types. Furthermore, many bio entities are polysemous, which is one of the major obstacles in named entity recognition. To address the entity type misclassification problem,...

10.1186/s12859-019-2813-6 article EN cc-by BMC Bioinformatics 2019-05-01

Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering

Jinhyuk Lee, Seongjun Yun, Hyunjae Kim, Miyoung Ko, Jaewoo Kang

Recently, open-domain question answering (QA) has been combined with machine comprehension models to find answers in a large knowledge source. As open-domain QA requires retrieving relevant documents from text corpora to answer questions, its performance largely depends on the performance of document retrievers. However, since traditional information retrieval systems are not effective in obtaining documents with a high probability of containing answers, they lower the performance of QA systems. Simply extracting more documents increases the number of irrelevant documents, which...

10.18653/v1/d18-1053 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

Learning Dense Representations of Phrases at Scale

Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, Danqi Chen

Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, Danqi Chen. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.518 article EN cc-by 2021-01-01

Simple Entity-Centric Questions Challenge Dense Retrievers

Christopher Sciavolino, Zexuan Zhong, Jinhyuk Lee, Danqi Chen

Open-domain question answering has exploded in popularity recently due to the success of dense retrieval models, which have surpassed sparse models using only a few supervised training examples. However, in this paper, we demonstrate that current dense models are not yet the holy grail of retrieval. We first construct EntityQuestions, a set of simple, entity-rich questions based on facts from Wikidata (e.g., "Where was Arve Furset born?"), and observe that dense retrievers drastically under-perform sparse methods. We investigate this issue...

10.18653/v1/2021.emnlp-main.496 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Can Language Models be Biomedical Knowledge Bases?

Mujeen Sung, Jinhyuk Lee, Sean S. Yi, Minji Jeon, Sungdong Kim, and 1 more

Pre-trained language models (LMs) have become ubiquitous in solving various natural language processing (NLP) tasks. There has been increasing interest in what knowledge these LMs contain and how we can extract that knowledge, treating LMs as knowledge bases (KBs). While there has been much work on probing LMs in the general domain, there has been little attention to whether these powerful LMs can be used as domain-specific KBs. To this end, we create the BioLAMA benchmark, which is comprised of 49K biomedical factual knowledge triples for probing biomedical LMs. We find that biomedical LMs with recently proposed...

10.18653/v1/2021.emnlp-main.388 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

ReSimNet: drug response similarity prediction using Siamese neural networks

Minji Jeon, Donghyeon Park, Jinhyuk Lee, Hwisang Jeon, Miyoung Ko, and 4 more

Motivation: Traditional drug discovery approaches identify a target for a disease and find a compound that binds to the target. In this approach, structures of compounds are considered as the most important features because it is assumed that similar structures will bind to the same target. Therefore, structural analogs of the drugs that bind to the target are selected as drug candidates. However, even though compounds are not structural analogs, they may achieve the desired response. A new drug discovery method based on drug response, which can complement the structure-based methods, is needed. Results: We implemented...

10.1093/bioinformatics/btz411 article EN Bioinformatics 2019-05-16

Answering Questions on COVID-19 in Real-Time

Jinhyuk Lee, Sean S. Yi, Minbyul Jeong, Mujeen Sung, Wonjin Yoon, and 3 more

The recent outbreak of the novel coronavirus is wreaking havoc on the world and researchers are struggling to effectively combat it. One reason why the fight is difficult is due to the lack of information and knowledge. In this work, we outline our effort to contribute to shrinking this knowledge vacuum by creating covidAsk, a question answering (QA) system that combines biomedical text mining and QA techniques to provide answers to questions in real-time. Our system also leverages information retrieval (IR) approaches to provide entity-level answers that are complementary to QA models....

10.18653/v1/2020.nlpcovid19-2.1 article EN cc-by 2020-01-01

Look at the First Sentence: Position Bias in Question Answering

Miyoung Ko, Jinhyuk Lee, Hyunjae Kim, Gangwoo Kim, Jaewoo Kang

Many extractive question answering models are trained to predict start and end positions of answers. The choice of predicting answers as positions is mainly due to its simplicity and effectiveness. In this study, we hypothesize that when the distribution of the answer positions is highly skewed in the training set (e.g., answers lie only in the k-th sentence of each passage), QA models predicting answers as positions can learn spurious positional cues and fail to give answers in different positions. We first illustrate this position bias in popular extractive QA models such as BiDAF and BERT and thoroughly examine how position bias propagates through each layer...

10.18653/v1/2020.emnlp-main.84 article EN cc-by 2020-01-01

Transmembrane Helix Tilting: Insights from Calculating the Potential of Mean Force

Jinhyuk Lee, Wonpil Im

To explore the microscopic forces governing helix tilting in membranes, we have calculated the potential of mean force (PMF) as a function of tilt angle (τ) of WALP19, a transmembrane model peptide, in a dimyristoylphosphatidylcholine membrane. The PMF shows a wide range of thermally accessible tilt angles (5° to 22°) with a minimum at τ = 12.5°. The free energy decomposition reveals that...

10.1103/physrevlett.100.018103 article EN Physical Review Letters 2008-01-08

De novo protein structure prediction by dynamic fragment assembly and conformational space annealing

Juyong Lee, Jinhyuk Lee, Takeshi Sasaki, Masaki Sasai, Chaok Seok, and 1 more

Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA), and the conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA...

10.1002/prot.23059 article EN Proteins Structure Function and Bioinformatics 2011-04-20

Contextualized Sparse Representations for Real-Time Open-Domain Question Answering

Jinhyuk Lee, Minjoon Seo, Hannaneh Hajishirzi, Jaewoo Kang

Open-domain question answering can be formulated as a phrase retrieval problem, in which we can expect huge scalability and speed benefit but often suffer from low accuracy due to the limitation of existing phrase representation models. In this paper, we aim to improve the quality of each phrase embedding by augmenting it with a contextualized sparse representation (Sparc). Unlike previous sparse vectors that are term-frequency-based (e.g., tf-idf) or directly learned (only few thousand dimensions), we leverage rectified self-attention to indirectly...

10.18653/v1/2020.acl-main.85 article EN cc-by 2020-01-01

Pandemics are catalysts of scientific novelty: Evidence from COVID‐19

Meijun Liu, Yi Bu, Chongyan Chen, Jian Xu, Daifeng Li, and 12 more

Scientific novelty drives the efforts to invent new vaccines and solutions during the pandemic. First-time collaboration and international collaboration are two pivotal channels to expand teams' search activities for a broader scope of resources required to address the global challenge, which might facilitate the generation of novel ideas. Our analysis of 98,981 coronavirus papers suggests that scientific novelty, measured by the BioBERT model that is pretrained on 29 million PubMed articles, and first-time collaboration increased after the outbreak of COVID-19, and international collaboration witnessed...

10.1002/asi.24612 article EN Journal of the Association for Information Science and Technology 2021-12-25

DeepNAP: Deep neural anomaly pre-detection in a semiconductor fab

Chunggyeom Kim, Jinhyuk Lee, Raehyun Kim, Youngbin Park, Jaewoo Kang

10.1016/j.ins.2018.05.020 article EN Information Sciences 2018-05-07

Phrase Retrieval Learns Passage Retrieval, Too

Jinhyuk Lee, Alexander Wettig, Danqi Chen

Dense retrieval methods have shown great promise over sparse retrieval methods in a range of NLP problems. Among them, dense phrase retrieval (the most fine-grained retrieval unit) is appealing because phrases can be directly used as the output for question answering and slot filling tasks. In this work, we follow the intuition that retrieving phrases naturally entails retrieving larger text blocks and study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents. We first observe that a dense phrase-retrieval system, without any retraining, already...

10.18653/v1/2021.emnlp-main.297 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Gecko: Versatile Text Embeddings Distilled from Large Language Models

Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, and 15 more

We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each query, and relabeling the positive and hard negative passages using the same LLM. The effectiveness of our approach is demonstrated by the compactness...

10.48550/arxiv.2403.20327 preprint EN arXiv (Cornell University) 2024-03-29

Surface defect detection using distributed features

The Van Le, Jordan Daniel Joshua, Tae-Hwan Kim, Jinhyuk Lee, Seong Han Kim, and 1 more

10.1016/j.engappai.2025.110655 article EN Engineering Applications of Artificial Intelligence 2025-04-08

Name Nationality Classification with Recurrent Neural Networks

Jinhyuk Lee, Hyunjae Kim, Miyoung Ko, Donghee Choi, Jaehoon Choi, and 1 more

Personal names tend to have many variations differing from country to country. Though there exists a large amount of personal names on the Web, nationality prediction solely based on names has not been fully studied due to its difficulties in extracting subtle character level features. We propose a recurrent neural network model which predicts the nationalities of each name using automatic feature extraction. Evaluation on Olympic record data shows that our model achieves greater accuracy than previous approaches in nationality prediction tasks. We also...

10.24963/ijcai.2017/289 article EN 2017-07-28

Role of Hydrogen Bonding and Helix−Lipid Interactions in Transmembrane Helix Association

Jinhyuk Lee, Wonpil Im

To explore the role of hydrogen bonding and helix−lipid interactions in transmembrane helix association, we have calculated the potential of mean force (PMF) as a function of helix−helix distance between two pVNVV peptides, a transmembrane model peptide based on the GCN4 leucine-zipper, in a dimyristoylphosphatidylcholine (DMPC) membrane. The peptide name pVNVV represents the interfacial residues in the heptad repeat of the pVNVV dimer. The free energy decomposition reveals that the total PMF consists of two competing contributions from helix−helix and helix−lipid interactions. The direct, favorable helix−helix interactions arise...

10.1021/ja711239h article EN Journal of the American Chemical Society 2008-04-19

Conformational change of the methionine 20 loop of Escherichia coli dihydrofolate reductase modulates pKa of the bound dihydrofolate

Ilja V. Khavrutskii, Daniel J. Price, Jinhyuk Lee, Charles L. Brooks

We evaluate the pKa of dihydrofolate (H2F) at the N5 position in three ternary complexes with Escherichia coli dihydrofolate reductase (ecDHFR), namely ecDHFR(NADP+:H2F) in the closed form (1), and the Michaelis complex ecDHFR(NADPH:H2F) in the closed (2) and occluded (3) forms, by performing free energy perturbation with molecular dynamics simulations (FEP/MD). Our simulations suggest that the pKa of the Michaelis complex is modulated by the Met20 loop fluctuations, providing the largest pKa shift in substates with a "tightly closed" loop conformation; in the "partially closed/open" substates, the pKa is similar to...

10.1110/ps.062724307 article EN Protein Science 2007-05-02
