- Topic Modeling
- Natural Language Processing Techniques
- Biomedical Text Mining and Ontologies
- Multimodal Machine Learning Applications
- Advanced Text Analysis Techniques
- Advanced Graph Neural Networks
- Semantic Web and Ontologies
- Human Pose and Action Recognition
- Speech and Dialogue Systems
- Machine Learning in Healthcare
- Market Dynamics and Volatility
- Opinion Dynamics and Social Influence
- Explainable Artificial Intelligence (XAI)
- Genomics and Phylogenetic Studies
- Complex Network Analysis Techniques
- Domain Adaptation and Few-Shot Learning
- Electronic Health Records Systems
- Genomics and Rare Diseases
- Cancer Genomics and Diagnostics
- Sentiment Analysis and Opinion Mining
- Advanced Neural Network Applications
- Advanced Vision and Imaging
- Data Quality and Management
- Grey System Theory Applications
- Energy Load and Power Forecasting
University of Shanghai for Science and Technology
2022
Tsinghua University
2019-2021
University Town of Shenzhen
2020-2021
Tsinghua–Berkeley Shenzhen Institute
2021
Peng Cheng Laboratory
2021
Columbia University
2018-2020
Objective: Cohort definition is a bottleneck for conducting clinical research and depends on subjective decisions by domain experts. Data-driven cohort definition is appealing but requires substantial knowledge of terminologies and data models. Criteria2Query is a natural language interface that facilitates human-computer collaboration for cohort definition and execution using databases. Materials and Methods: Criteria2Query uses a hybrid information extraction pipeline combining machine learning and rule-based methods to systematically parse eligibility...
Chinese relation extraction is conducted using neural networks with either character-based or word-based inputs, and most existing methods typically suffer from segmentation errors and the ambiguity of polysemy. To address these issues, we propose a multi-grained lattice framework (MG lattice) for Chinese relation extraction to take advantage of multi-grained language information and external linguistic knowledge. In this framework, (1) we incorporate word-level information into character sequence inputs so that segmentation errors can be avoided. (2) We also model multiple senses...
We present Doc2Hpo, an interactive web application that enables interactive and efficient phenotype concept curation from clinical text with automated concept normalization using the Human Phenotype Ontology (HPO). Users can edit the HPO concepts automatically extracted by Doc2Hpo in real time, and export the extracted concepts into gene prioritization tools. Our evaluation showed that Doc2Hpo significantly reduced manual effort while achieving high accuracy in phenotype concept curation. Doc2Hpo is freely available at https://impact2.dbmi.columbia.edu/doc2hpo/. The source...
Ning Ding, Ziran Li, Zhiyuan Liu, Haitao Zheng, Zibo Lin. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
We study the problem of infobox-to-text generation, which aims to generate a textual description from a key-value table. Representing the input infobox as a sequence, previous neural methods using end-to-end models without order-planning suffer from the problems of incoherence and inadaptability to disordered input. Recent planning-based models only implement static order-planning to guide the generation, which may cause error propagation between planning and generation. To address these issues, we propose a Tree-like PLanning based Attention Network...
This paper studies the spatio-temporal video grounding task, which aims to localize a spatio-temporal tube in an untrimmed video based on the given text description of an event. Existing one-stage approaches suffer from insufficient space-time interaction in two aspects: i) less precise prediction of event temporal boundaries, and ii) inconsistency in object prediction for the same event across adjacent frames. To address these issues, we propose a framework of Comprehensive Space-Time entAnglement (CoSTA) to densely entangle multi-modal features...
Coordination ellipsis is a linguistic phenomenon that is abundant in medical text and challenging for concept normalization because of the difficulty in accurately recognizing elliptical expressions referencing 2 or more entities. To resolve this bottleneck, we aim to contribute a generalizable method to reconstruct concepts from coordinated elliptical expressions in a variety of biomedical corpora. We proposed a graph-based representation model and built a pipeline (RECEEM). There are 4 modules: (1) identify all possible candidate conjunct pairs...
Referring expression comprehension aims to localize a natural language description in an image. Using location priors to help reduce inaccuracies in cross-modal alignments is the state of the art for CNN-based methods tackling this problem. Recent Transformer-based models cast aside this idea, making the case for steering away from hand-designed components. In this work, we propose LUNA, which uses language as continuing anchors to guide box prediction in a Transformer decoder, and thus show that language-guided location priors can be effectively...
Dialogue Act Recognition (DAR) is a challenging problem in Natural Language Understanding, which aims to attach Dialogue Act (DA) labels to each utterance in a conversation. However, previous studies cannot fully recognize the specific expressions given by users due to the informality and diversity of natural language expressions. To solve this problem, we propose a Heterogeneous User History (HUH) graph convolution network, which utilizes the user’s historical answers grouped by DA as additional clues to label utterances. To handle the noise...
Generating a textual description from a set of RDF triplets is a challenging task in natural language generation. Recent neural methods have become the mainstream for this task, and they often generate sentences from scratch. However, due to the huge gap between the structured input and the unstructured output, the input triples alone are insufficient to decide an expressive and specific description. In this paper, we propose a novel anchor-to-prototype framework to bridge the gap between structured triples and text. The model retrieves a set of prototype descriptions from the training data...
Paraphrase generation aims to rewrite a text with different words while keeping the same meaning. Previous work performs the task based solely on the given dataset while ignoring the availability of external linguistic knowledge. However, it is intuitive that a model can generate more expressive and diverse paraphrases with the help of such knowledge. To fill this gap, we propose the Knowledge-Enhanced Paraphrase Network (KEPN), a transformer-based framework that can leverage external linguistic knowledge to facilitate paraphrase generation. (1) The model integrates synonym information from the external linguistic knowledge into...
We study the commonsense inference task, which aims to reason about and generate the causes and effects of a given event. Existing neural methods focus more on understanding and representing the event itself, but pay little attention to the relations between different dimensions (e.g. causes or effects) of the event, making the generated results logically inconsistent and unreasonable. To alleviate this issue, we propose Chain Transformer, a logic enhanced model that combines both direct and indirect inferences to construct a logical chain so as to reason in a consistent...
Multi-entity collaborative relationship extraction is an important but challenging task, which has been attracting a lot of interest and poses significant issues for systems aimed at natural language understanding. Instead of designing specific models for single tasks, this paper aims to propose a general framework to extract multiple relations among multiple entities in unstructured text by taking advantage of existing models. Based on performing named entity recognition and relation extraction collaboratively, the...
The task of ACE Event Detection (ED) often encounters ambiguous and unseen trigger words. Most conventional ED systems exclusively consider semantic or syntactic patterns as additional evidence to resolve the problem of ambiguous triggers, but rarely take advantage of the structured knowledge of the event itself. In this study, we propose Dynamic Word-Trigger-Argument Graph Neural Networks (DWTA-GNN), a novel framework that leverages the event structure knowledge to address the two issues simultaneously. In our approach, we utilize words,...
Massive progress and fruitful results have been achieved since the implementation of the Belt and Road Initiative (B&R). However, the exchange rate risk between countries along the B&R cannot be ignored. In this paper, we propose a B&R index to evaluate the currency risk of the B&R, with research on the multiscale features of the effective RMB exchange rate using the EMD algorithm. Then, we propose an integrated forecasting approach. We adopt the EMD algorithm and Grey Relational Analysis to decompose and reconstruct it into three subsections with different volatility...
Existing works on Dialogue Act Recognition (DAR) pay little attention to the imbalanced distribution of Dialogue Acts (DAs) and exclusively train their models over very fine-grained DAs in one pass, which leads to limited performance in recognizing low-frequency DAs. To address this issue, we propose a hierarchical label structured network that explicitly introduces coarse-grained DAs in addition to the original fine-grained DAs. A two-pass multi-head mechanism is devised to integrate different levels of DA information into the utterance encoding process,...
Existing approaches in the vision-and-language pre-training (VLP) paradigm mainly deploy either fusion-based encoders or dual-encoders, failing to achieve both effectiveness and efficiency in downstream multimodal tasks. In this paper, we build a flexible VLP model by incorporating cross-modal fusions into a dual-encoder architecture, where the introduced fusion modules can be easily decoupled from the dual encoder so as to switch the model to a fusion-free one. To better absorb cross-modal features from the fusion modules, we design a knowledge transfer...