Lifu Huang

ORCID: 0000-0002-2743-7718
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Advanced Text Analysis Techniques
  • Multimodal Machine Learning Applications
  • Advanced Graph Neural Networks
  • Domain Adaptation and Few-Shot Learning
  • Web Data Mining and Analysis
  • Biomedical Text Mining and Ontologies
  • Explainable Artificial Intelligence (XAI)
  • Text and Document Classification Technologies
  • Semantic Web and Ontologies
  • Privacy-Preserving Technologies in Data
  • Software Engineering Research
  • Data Quality and Management
  • Speech Recognition and Synthesis
  • Anomaly Detection Techniques and Applications
  • Advanced Image and Video Retrieval Techniques
  • Digital Mental Health Interventions
  • Speech and Dialogue Systems
  • Video Analysis and Summarization
  • Text Readability and Simplification
  • Algorithms and Data Compression
  • Machine Learning and ELM
  • Subtitles and Audiovisual Media
  • Mobile Crowdsensing and Crowdsourcing

Virginia Tech
2021-2024

National University
2023

George Mason University
2023

Amazon (United States)
2022-2023

RIKEN Center for Advanced Intelligence Project
2023

Mongolia International University
2023

New York University
2022

China XD Group (China)
2022

Shanghai Jiao Tong University
2021

Rensselaer Polytechnic Institute
2016-2020

Lifu Huang, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1243 article EN cc-by 2019-01-01

Most previous supervised event extraction methods have relied on features derived from manual annotations, and thus cannot be applied to new event types without extra annotation effort. We take a fresh look at event extraction and model it as a generic grounding problem: mapping each event mention to a specific type in a target event ontology. We design a transferable architecture of structural and compositional neural networks to jointly represent and map event mentions and types into a shared semantic space. Based on this framework, we can select, for each event mention, the event type which is...

10.18653/v1/p18-1201 article EN cc-by Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018-01-01

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and discussion...

10.48550/arxiv.2401.05561 preprint EN cc-by-nc-sa arXiv (Cornell University) 2024-01-01

Distant supervision has been widely used in current systems of fine-grained entity typing to automatically assign categories (entity types) to entity mentions. However, the types so obtained from knowledge bases are often incorrect for the mention's local context. This paper proposes a novel embedding method to separately model "clean" and "noisy" mentions, and incorporates the given type hierarchy to induce loss functions. We formulate a joint optimization problem to learn embeddings for mentions and type-paths, and develop an iterative...

10.18653/v1/d16-1144 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

Event detection remains a challenge due to the difficulty of encoding word semantics in various contexts. Previous approaches heavily depend on language-specific knowledge and pre-existing natural language processing (NLP) tools. However, compared to English, not all languages have such resources and tools available. A more promising approach is to automatically learn effective features from data, without relying on language-specific resources. In this paper, we develop a hybrid neural network to capture both sequence and chunk...

10.18653/v1/p16-2011 article EN cc-by 2016-01-01

10.18653/v1/p16-1025 article EN Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016-01-01

Bhavana Dalvi, Lifu Huang, Niket Tandon, Wen-tau Yih, Peter Clark. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

10.18653/v1/n18-1144 article EN cc-by 2018-01-01

Integrating text and knowledge into a unified semantic space has attracted significant research interest recently. However, ambiguity in the common space remains a challenge, namely that the same mention phrase usually refers to various entities. In this paper, to deal with the ambiguity of entity mentions, we propose a novel Multi-Prototype Mention Embedding model, which learns multiple sense embeddings for each mention by jointly modeling words from textual contexts and entities derived from a knowledge base. In addition, we further design an efficient...

10.18653/v1/p17-1149 article EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

Understanding narratives requires reading between the lines, which in turn requires interpreting the likely causes and effects of events, even when they are not mentioned explicitly. In this paper, we introduce Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. In stark contrast to most existing reading comprehension datasets, where the questions focus on factual and literal understanding of the context paragraph, our dataset focuses on reading between the lines over diverse...

10.48550/arxiv.1909.00277 preprint EN cc-by arXiv (Cornell University) 2019-01-01

Instruction tuning, a new learning paradigm that fine-tunes pre-trained language models on tasks specified through instructions, has shown promising zero-shot performance on various natural language processing tasks. However, it has yet to be explored for vision and multimodal tasks. In this work, we introduce MultiInstruct, the first multimodal instruction tuning benchmark dataset, which consists of 62 diverse multimodal tasks in a unified seq-to-seq format covering 10 broad categories. The tasks are derived from 21 existing open-source datasets and each task...

10.18653/v1/2023.acl-long.641 article EN cc-by 2023-01-01

Current image captioning approaches generate descriptions which lack specific information, such as named entities that are involved in the images. In this paper we propose a new task which aims to generate informative image captions, given images and hashtags as input. We propose a simple but effective approach to tackle this problem. We first train a convolutional neural network-long short-term memory (CNN-LSTM) model to generate a template caption based on the input image. Then we use a knowledge graph based collective inference algorithm to fill in the template with retrieved...

10.18653/v1/d18-1435 preprint EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

Event extraction is typically modeled as a multi-class classification problem where event types and argument roles are treated as atomic symbols. These approaches are usually limited to a set of pre-defined types. We propose a novel event extraction framework that uses event types and argument roles as natural language queries to extract candidate triggers and arguments from the input text. With the rich semantics in the queries, our framework benefits from attention mechanisms to better capture the semantic correlation between the event types or argument roles and the input text. Furthermore, the query-and-extract formulation allows our approach...

10.18653/v1/2022.findings-acl.16 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2022 2022-01-01

We propose end-to-end multimodal fact-checking and explanation generation, where the input is a claim and a large collection of web sources, including articles, images, videos, and tweets, and the goal is to assess the truthfulness of the claim by retrieving relevant evidence and predicting a truthfulness label (e.g., support, refute, or not enough information), and to generate a statement to summarize and explain the reasoning and ruling process. To support this research, we construct MOCHEG, a large-scale dataset consisting of 15,601 claims, each annotated with a truthfulness label and a ruling statement,...

10.1145/3539618.3591879 article EN cc-by Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18

Diya Li, Lifu Huang, Heng Ji, Jiawei Han. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1145 article EN 2019-01-01

In mobile crowdsourcing, the accuracy of collected data is usually hard to ensure. Researchers have proposed techniques to identify the truth from noisy data by inferring and utilizing the reliability of users, and to allocate tasks to users with higher reliability. However, they neglect the fact that a user may only have expertise on some problems (in some domains) but not others, hence causing two problems: low estimation accuracy in truth analysis and ineffective task allocation. To address these problems, we propose Expertise-aware Truth Analysis...

10.1109/tmc.2019.2955688 article EN IEEE Transactions on Mobile Computing 2019-11-26

Most previous event extraction studies assume that a set of target event types and corresponding event annotations are given, which could be very expensive. In this paper, we work on a new task of semi-supervised event type induction, aiming to automatically discover a set of unseen types from a given corpus by leveraging annotations available for a few seen types. We design a Semi-Supervised Vector Quantized Variational Autoencoder framework to learn a discrete latent representation for each seen and unseen type and optimize them using the seen type annotations. A variational autoencoder is...

10.18653/v1/2020.emnlp-main.53 article EN cc-by 2020-01-01

We compare various forms of prompts to represent event types and develop a unified framework to incorporate the event type specific prompts for supervised, few-shot, and zero-shot event detection. The experimental results demonstrate that a well-defined and comprehensive event type prompt can significantly improve event detection performance, especially when annotated data is scarce (few-shot event detection) or not available (zero-shot event detection). By leveraging the semantics of event types, our unified framework shows up to a 22.2% F-score gain over the previous state-of-the-art baselines.

10.18653/v1/2023.acl-short.111 article EN cc-by 2023-01-01

In this paper, we focus on improving Event Extraction (EE) by incorporating visual knowledge with words and phrases from text documents. We first discover visual patterns from large-scale text-image pairs in a weakly-supervised manner and then propose a multimodal event extraction algorithm where the event extractor is jointly trained with textual features and visual patterns. Extensive experimental results on benchmark data sets demonstrate that the proposed multimodal EE method can achieve significantly better performance on event extraction: absolute...

10.1145/3123266.3123294 article EN Proceedings of the 25th ACM International Conference on Multimedia 2017-10-19

With the rapid growth of online information services, a sheer volume of news data becomes available. To help people quickly digest the explosive information, we define a new problem, schema-based news event profiling, which profiles events reported in open-domain news corpora with a set of slots and slot-value pairs for each event, where the set of slots forms the schema of an event type. Such profiling not only provides readers with concise views of events, but also facilitates various applications such as information retrieval, knowledge graph construction, and question answering. It is...

10.1145/3269206.3271674 article EN 2018-10-17

Despite recent progress of pre-trained language models on generating fluent text, existing methods still suffer from incoherence problems in long-form text generation tasks that require proper content control and planning to form a coherent high-level logical flow. In this work, we propose PLANET, a novel generation framework leveraging the autoregressive self-attention mechanism to conduct content planning and surface realization dynamically. To guide the generation of output sentences, our framework enriches the Transformer decoder with latent...

10.18653/v1/2022.acl-long.163 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Extracting temporal relations (e.g., before, after, and simultaneous) among events is crucial to natural language understanding. One of the key challenges of this problem is that when the events of interest are far apart in the text, the context in between often becomes complicated, making it challenging to resolve the temporal relationship between them. This paper thus proposes a new Syntax-guided Graph Transformer network (SGT) to mitigate this issue by (1) explicitly exploiting the connection between two events based on their dependency parsing trees, and (2)...

10.18653/v1/2022.findings-naacl.29 article EN cc-by Findings of the Association for Computational Linguistics: NAACL 2022 2022-01-01

Federated learning (FL) can be essential in knowledge representation, reasoning, and data mining applications over multi-source knowledge graphs (KGs). A recent study, FedE, first proposes an FL framework that shares entity embeddings of KGs across all clients. However, entity embedding sharing in FedE would incur severe privacy leakage. Specifically, the known entity embedding can be used to infer whether a specific relation between two entities exists in a private client. In this paper, we introduce a novel attack method that aims to recover...

10.18653/v1/2022.findings-emnlp.43 article EN cc-by 2022-01-01

Qifan Wang, Yuning Mao, Jingang Wang, Hanchao Yu, Shaoliang Nie, Sinong Wang, Fuli Feng, Lifu Huang, Xiaojun Quan, Zenglin Xu, Dongfang Liu. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

10.18653/v1/2023.emnlp-main.567 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01