- Natural Language Processing Techniques
- Topic Modeling
- Speech and dialogue systems
- Text Readability and Simplification
- Semantic Web and Ontologies
- Biomedical Text Mining and Ontologies
- Multimodal Machine Learning Applications
- Advanced Text Analysis Techniques
- Algorithms and Data Compression
- Video Analysis and Summarization
- Human Pose and Action Recognition
- Web Data Mining and Analysis
- Text and Document Classification Technologies
- Speech Recognition and Synthesis
- Language, Metaphor, and Cognition
- Domain Adaptation and Few-Shot Learning
- Stock Market Forecasting Methods
- Syntax, Semantics, Linguistic Variation
- semigroups and automata theory
- Machine Learning and Algorithms
- Rough Sets and Fuzzy Logic
- Software Engineering Research
- Advanced Database Systems and Queries
- Logic, programming, and type systems
- Mathematics, Computing, and Information Processing
The University of Tokyo
2009-2025
National Institute of Advanced Industrial Science and Technology
2016-2023
Tokyo Institute of Technology
2019-2023
Administration for Community Living
2023
IT University of Copenhagen
2023
American Jewish Committee
2023
Ochanomizu University
2018-2023
Imperial College London
2022
University of Wuppertal
2020
National Institute of Informatics
2010-2019
Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Dan Flickinger, Jan Hajič, Angelina Ivanova, Yi Zhang. Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). 2014.
This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. It is an extension of PCFG in which non-terminal symbols are augmented with latent variables. Fine-grained CFG rules are automatically induced from a parsed corpus by training a PCFG-LA model using the EM algorithm. Because exact parsing with PCFG-LA is NP-hard, several approximations are described and empirically compared. In experiments on the Penn WSJ corpus, our trained model gave a performance of 86.6% (F1, sentences ≤ 40 words), comparable to that...
Probabilistic modeling of lexicalized grammars is difficult because these grammars exploit complicated data structures, such as typed feature structures. This prevents us from applying common methods of probabilistic modeling, in which a complete structure is divided into sub-structures under the assumption of statistical independence among sub-structures. For example, part-of-speech tagging of a sentence is decomposed into tagging of each word, and CFG parsing is split into applications of CFG rules. These methods have relied on the structure of the target problem, namely lattices or...
Motivation: While text mining technologies for biomedical research have gained popularity as a way to take advantage of the explosive growth of information in the form of papers, selecting appropriate natural language processing (NLP) tools is still difficult for researchers who are not familiar with recent advances in NLP. This article provides a comparative evaluation of several state-of-the-art parsers, focusing on the task of extracting protein–protein interaction (PPI) from papers. We measure how each...
Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Silvie Cinková, Dan Flickinger, Jan Hajič, Zdeňka Urešová. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). 2015.
Temporal relation classification is becoming an active research field. Many methods have been proposed, but most of them focus on extracting features from external resources. Less attention has been paid to a significant advance in a closely related task: relation extraction. In this work, we borrow a state-of-the-art method from relation extraction by adopting bidirectional long short-term memory (Bi-LSTM) along dependency paths (DP). We make a "common root" assumption to extend DP representations to cross-sentence links. In the...
Because of the importance of protein-protein interaction (PPI) extraction from text, many corpora have been proposed with slightly differing definitions of proteins and PPI. Since no single corpus is large enough to saturate a machine learning system, it is necessary to learn from multiple different corpora. In this paper, we propose a solution to this challenge. We designed a rich feature vector and applied a support vector machine modified for corpus weighting (SVM-CW) to complete the task of PPI extraction. The rich feature vector, made from useful kernels, is used to express...
Background: Work on pharmacovigilance systems using texts from PubMed and Twitter typically targets different elements and uses different annotation guidelines, resulting in a scenario where there is no comparable set of documents from both sources annotated in the same manner. Objective: This study aimed to provide a corpus that can be used to study drug reports from these two sources of information, allowing researchers in the area of natural language processing (NLP) to perform experiments to better understand the similarities and differences between PubMed....
This paper reports the development of log-linear models for disambiguation in wide-coverage HPSG parsing. The estimation of log-linear models requires high computational cost, especially with wide-coverage grammars. Using techniques to reduce the estimation cost, we trained the models using 20 sections of the Penn Treebank. A series of experiments empirically evaluated the estimation techniques, and also examined the performance of the models on the parsing of real-world sentences.
This paper introduces a novel framework for the accurate retrieval of relational concepts from huge texts. Prior to retrieval, all sentences are annotated with predicate argument structures and ontological identifiers by applying a deep parser and a term recognizer. At run time, user requests are converted into queries of region algebra on these annotations. Structural matching with pre-computed semantic annotations establishes the accurate and efficient retrieval of relational concepts. The framework was applied to a text retrieval system for MEDLINE. Experiments on biomedical...
Soichiro Murakami, Akihiko Watanabe, Akira Miyazawa, Keiichi Goshima, Toshihiko Yanase, Hiroya Takamura, Yusuke Miyao. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017.
Existing dialogue models may encounter scenarios which are not well-represented in the training data, and as a result generate responses that are unnatural, inappropriate, or unhelpful. We propose the "Ask an Expert" framework, in which the model is trained with access to an "expert" which it can consult at each turn. Advice is solicited via a structured dialogue with the expert, and the model is optimized to selectively utilize (or ignore) it given the context and dialogue history. In this work the expert takes the form of an LLM. We evaluate this framework in a mental health support domain, where the structure...
This paper presents techniques to apply semi-CRFs to Named Entity Recognition tasks with a tractable computational cost. Our framework can handle an NER task that has long named entities and many labels, which increase the computational cost. To reduce the cost, we propose two techniques: the first is the use of feature forests, which enables us to pack feature-equivalent states, and the second is the introduction of a filtering process, which significantly reduces the number of candidate states. This framework allows a rich set of features extracted from the chunk-based representation...
We demonstrate a simple and easy-to-use system to produce logical semantic representations of sentences. Our software operates by composing semantic formulas bottom-up given a CCG parse tree. It uses flexible semantic templates to specify semantic patterns. Templates for English and Japanese accompany our software, and they are easy to understand, use, and extend to cover other linguistic phenomena or languages. We also provide scripts to use our semantic representations in a textual entailment task, and a visualization tool to display semantically augmented CCG trees in HTML.