Huan Sun

ORCID: 0000-0001-6436-4813
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Data Quality and Management
  • Expert finding and Q&A systems
  • Advanced Graph Neural Networks
  • Text and Document Classification Technologies
  • Semantic Web and Ontologies
  • Biomedical Text Mining and Ontologies
  • Software Engineering Research
  • Machine Learning in Healthcare
  • Adversarial Robustness in Machine Learning
  • Sentiment Analysis and Opinion Mining
  • Advanced Text Analysis Techniques
  • Mobile Crowdsensing and Crowdsourcing
  • Speech and dialogue systems
  • Explainable Artificial Intelligence (XAI)
  • Ferroelectric and Negative Capacitance Devices
  • Web Data Mining and Analysis
  • Privacy-Preserving Technologies in Data
  • Domain Adaptation and Few-Shot Learning
  • Recommender Systems and Techniques
  • Graph Theory and Algorithms
  • Bioinformatics and Genomic Networks
  • ECG Monitoring and Analysis

The Ohio State University
2016-2023

Mongolia International University
2023

RIKEN Center for Advanced Intelligence Project
2023

Beijing Jiaotong University
2019-2022

Jiangxi Normal University
2021

Google (United States)
2021

Xi'an Polytechnic University
2019

University of California, Santa Barbara
2013-2018

Harbin Engineering University
2003-2011

Harbin University
2003-2011

Graph embedding learning, which aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most graph embedding methods are evaluated on social and information networks and have not been comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding method) have shown promising results, hence there is a need...

10.1093/bioinformatics/btz718 article EN cc-by-nc Bioinformatics 2019-09-27
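
The remark above that matrix factorization can be seen as a type of graph embedding method can be illustrated with a small sketch. This is a minimal stdlib-only example, not the evaluation setup of the paper: the function names (factorize_adjacency, reconstruction_error), the toy path graph, and the hyperparameters are all assumptions made for illustration. It fits an adjacency matrix as a product of two low-rank factors by plain gradient descent and treats the rows of one factor as node embeddings.

```python
import random


def factorize_adjacency(adj, dim, steps=500, lr=0.05, seed=0):
    """Fit adj ~= U * V^T by gradient descent; rows of U act as node embeddings."""
    rng = random.Random(seed)
    n = len(adj)
    # Small random initialisation (hypothetical choice for this sketch).
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        # Residuals are computed once per step so U and V update from the same state.
        err = [[adj[i][j] - sum(U[i][k] * V[j][k] for k in range(dim))
                for j in range(n)] for i in range(n)]
        new_U = [[U[i][k] + lr * sum(err[i][j] * V[j][k] for j in range(n))
                  for k in range(dim)] for i in range(n)]
        new_V = [[V[j][k] + lr * sum(err[i][j] * U[i][k] for i in range(n))
                  for k in range(dim)] for j in range(n)]
        U, V = new_U, new_V
    return U, V


def reconstruction_error(adj, U, V):
    """Sum of squared differences between adj and U * V^T."""
    n, dim = len(adj), len(U[0])
    return sum((adj[i][j] - sum(U[i][k] * V[j][k] for k in range(dim))) ** 2
               for i in range(n) for j in range(n))


# Toy 4-node path graph: 0 - 1 - 2 - 3.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
embeddings, _ = factorize_adjacency(path_graph, dim=2)
```

Each row of embeddings is a 2-dimensional vector for one node. Practical systems would typically factorize a normalized or higher-order proximity matrix with a linear algebra library rather than the raw adjacency matrix in pure Python.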

Millions of users share their opinions on Twitter, making it a valuable platform for tracking and analyzing public sentiment. Such analysis can provide critical information for decision making in various domains. Therefore it has attracted attention in both academia and industry. Previous research mainly focused on modeling public sentiment. In this work, we move one step further to interpret sentiment variations. We observed that emerging topics (named foreground topics) within the sentiment variation periods are highly related to the genuine...

10.1109/tkde.2013.116 article EN IEEE Transactions on Knowledge and Data Engineering 2013-07-16

Tables are pervasive on the Web. Informative web tables range across a large variety of topics, which can naturally serve as a significant resource to satisfy user information needs. Driven by such observations, in this paper, we investigate an important yet largely under-addressed problem: Given millions of tables, how to precisely retrieve table cells to answer a question. This work proposes a novel table cell search framework to attack this problem. We first formulate the concept of a relational chain which connects two cells and...

10.1145/2872427.2883080 article EN 2016-04-11

Most recent question answering (QA) systems query large-scale knowledge bases (KBs) to answer a question, after parsing and transforming natural language questions to KBs-executable forms (e.g., logical forms). As a well-known fact, KBs are far from complete, so that the information required to answer a question may not always exist in KBs. In this paper, we develop a new QA system that mines answers directly from the Web, and meanwhile employs KBs as a significant auxiliary to further boost the QA performance. Specifically, to the best of our knowledge, we make...

10.1145/2736277.2741651 article EN 2015-05-18

We present a semi-automated framework for constructing factoid question answering (QA) datasets, where an array of question characteristics are formalized, including structure complexity, function, commonness, answer cardinality, and paraphrasing. Instead of collecting questions and manually characterizing them, we employ a reverse procedure, first generating a kind of graph-structured logical forms from a knowledge base, and then converting them into questions. Our work is the first to generate questions with explicitly specified characteristics for QA...

10.18653/v1/d16-1054 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

Learning to capture text-table alignment is essential for tasks like text-to-SQL. A model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (StruG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel prediction tasks: column grounding, value grounding and column-value mapping, and leverage them to pretrain a text-table encoder....

10.18653/v1/2021.naacl-main.105 article EN cc-by Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021-01-01

Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heavily-engineered task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational tables. During pre-training, our framework learns deep contextualized representations on relational tables in a self-supervised manner. Its...

10.1145/3542700.3542709 article EN ACM SIGMOD Record 2022-05-31

Online reviews have been popularly adopted in many applications. Since they can either promote or harm the reputation of a product or a service, buying and selling fake reviews becomes a profitable business and a big threat. In this paper, we introduce a very simple, but powerful review spamming technique that could fail the existing feature-based detection algorithms easily. It uses one truthful review as a template, and replaces its sentences with those from other reviews in a repository. Fake reviews generated by this mechanism are extremely hard to...

10.1145/2487575.2487688 article EN 2013-08-11

Querying complex graph databases such as knowledge graphs is a challenging task for non-professional users. Due to their complex schemas and variational information descriptions, it becomes very hard for users to formulate a query that can be properly processed by the existing systems. We argue that a user-friendly graph query engine must support various kinds of transformations such as synonym, abbreviation, and ontology. Furthermore, the derived query results must be ranked in a principled manner. In this paper, we introduce a novel framework enabling...

10.14778/2732286.2732293 article EN Proceedings of the VLDB Endowment 2014-03-01

Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heavily-engineered task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational tables. During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner. Its...

10.14778/3430915.3430921 article EN Proceedings of the VLDB Endowment 2020-11-01

Ziyu Yao, Yu Su, Huan Sun, Wen-tau Yih. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1547 article EN cc-by 2019-01-01

Large language models (LLMs) such as ChatGPT and GPT-4 have shown impressive performance in complex reasoning tasks. However, it is difficult to know whether the models are reasoning based on deep understandings of truth and logic, or leveraging their memorized patterns in a relatively superficial way. In this work, we explore testing LLMs' reasoning by engaging with them in a debate-like conversation, where given a question, the LLM and the user need to discuss to make the correct decision starting from opposing arguments. Upon mitigating the Clever Hans...

10.18653/v1/2023.findings-emnlp.795 article EN cc-by 2023-01-01

Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are either zero-shot or trained on an automatically synthesized dataset, which contains a high volume of noise. Thus, they still require lots of manual tuning to produce desirable outcomes in practice. To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset...

10.48550/arxiv.2306.10012 preprint EN other-oa arXiv (Cornell University) 2023-01-01

To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks are mostly explored separately. In this work, we investigate a novel perspective of Code annotation for...

10.1145/3308558.3313632 preprint EN 2019-05-13

Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heavily-engineered task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational tables. During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner. Its...

10.48550/arxiv.2006.14806 preprint EN cc-by-nc-nd arXiv (Cornell University) 2020-01-01

Existing text sentiment analysis methods mostly rely on a large number of language knowledge and resources. This paper proposes the Multi-channel convolution bidirectional GRU multi-head attention capsule (AT-MC-BiGRU-Capsule), which uses vector neurons to replace scalar model emotions, capsules characterize emotions. In addition, traditional cannot extract multi-level features sequence well. Multi-head can encode dependencies between words, capture words in text, using Convolutional Neural...

10.1109/access.2021.3073988 article EN cc-by IEEE Access 2021-01-01

A recent focus of large language model (LLM) development, as exemplified by generative search engines, is to incorporate external references to generate and support its claims. However, evaluating the attribution, i.e., verifying whether the generated statement is fully supported by the cited reference, remains an open problem. Although human evaluation is common practice, it is costly and time-consuming. In this paper, we investigate the automatic evaluation of attribution given by LLMs. We begin by defining different types of attribution errors, and then...

10.18653/v1/2023.findings-emnlp.307 article EN cc-by 2023-01-01

Stack Overflow (SO) has been a great source of natural language questions and their code solutions (i.e., question-code pairs), which are critical for many tasks including code retrieval and annotation. In most existing research, question-code pairs were collected heuristically and tend to have low quality. In this paper, we investigate a new problem of systematically mining question-code pairs from Stack Overflow (in contrast to heuristically collecting them). It is formulated as predicting whether or not a code snippet is a standalone solution to a question. We propose a novel Bi-View...

10.1145/3178876.3186081 preprint EN 2018-01-01

This paper investigates a new task named Conversational Question Generation (CQG), which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs). CQG is crucial for developing intelligent agents that can drive question-answering style conversations or test user understanding of a given passage. Towards this end, we propose an approach named Reinforced Dynamic Reasoning network, which is based on the general encoder-decoder framework but incorporates a reasoning procedure in...

10.18653/v1/p19-1203 preprint EN cc-by 2019-01-01

Machine reading comprehension has made great progress in recent years owing to large-scale annotated datasets. In the clinical domain, however, creating such datasets is quite difficult due to the domain expertise required for annotation. Recently, Pampari et al. (EMNLP’18) tackled this issue by using expert-annotated question templates and existing i2b2 annotations to create emrQA, the first large-scale dataset for question answering (QA) based on clinical notes. In this paper, we provide an in-depth analysis of this dataset and the clinical reading comprehension (CliniRC) task. From our...

10.18653/v1/2020.acl-main.410 article EN cc-by 2020-01-01

Differentially private stochastic gradient descent (DP-SGD) adds noise to gradients in back-propagation, safeguarding training data from privacy leakage, particularly membership inference. It fails to cover (inference-time) threats like embedding inversion and sensitive attribute inference. It is also costly in storage and computation when used to fine-tune large pre-trained language models (LMs).

10.1145/3576915.3616592 article EN Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security 2023-11-15
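
The gradient-noising step that the DP-SGD abstract above refers to can be sketched as follows. This is a minimal stdlib-only illustration of the standard clip-then-add-Gaussian-noise recipe, not the method proposed in the paper. The function names (clip_gradient, dp_sgd_step) and the parameters max_norm and noise_multiplier are assumptions chosen for the example, and no privacy accounting is included.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One noisy update: clip each example's gradient, sum, add noise, average."""
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    # Coordinate-wise sum over the batch.
    summed = [sum(column) for column in zip(*clipped)]
    # Noise scale is tied to the clipping bound (hypothetical parameterisation).
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


# Two toy per-example gradients for a 2-parameter model.
updated = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.0, 1.0]],
    lr=1.0,
    max_norm=1.0,
    noise_multiplier=1.0,
    rng=random.Random(0),
)
```

Clipping bounds how much any single training example can move the parameters, which is what lets the added noise mask an individual example's contribution. The sketch also shows where the cost mentioned in the abstract comes from: a separate gradient has to be kept for every example in the batch before clipping.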

The big data era is witnessing a prevalent shift of data from homogeneous to heterogeneous, from isolated to linked. Exemplar outcomes of this shift are a wide range of graph data such as information, social, and knowledge graphs. The unique characteristics of graph data are challenging traditional search techniques like SQL and keyword search. Graph query is emerging as a promising complementary search form. In this paper, we study how to improve graph query by relevance feedback. Specifically, we focus on knowledge graph query, and formulate the graph relevance feedback (GRF) problem. We propose a general GRF framework...

10.1145/2783258.2783320 article EN 2015-08-07

Named Entity Disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a reference knowledge base (e.g. Wikipedia). Such disambiguation can help add semantics to plain text and distinguish homonymous entities. Previous research has tackled this problem by making use of two types of context-aware features derived from the reference knowledge base, namely, the context similarity and the semantic relatedness. Both features heavily rely on the cross-document hyperlinks within the knowledge base:...

10.1145/2872427.2883068 article EN 2016-04-11

Yu Su, Honglei Liu, Semih Yavuz, Izzeddin Gür, Huan Sun, Xifeng Yan. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

10.18653/v1/n18-1075 article EN cc-by 2018-01-01

Given a text description, most existing semantic parsers synthesize a program in one shot. However, it is quite challenging to produce a correct program solely based on the description, which in reality is often ambiguous or incomplete. In this paper, we investigate interactive semantic parsing, where the agent can ask the user clarification questions to resolve ambiguities via a multi-turn dialogue, on an important type of programs called “If-Then recipes.” We develop a hierarchical reinforcement learning (HRL) based agent that significantly improves the parsing...

10.1609/aaai.v33i01.33012547 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17