- Topic Modeling
- Natural Language Processing Techniques
- Machine Learning in Healthcare
- Domain Adaptation and Few-Shot Learning
- Advanced Graph Neural Networks
- Online Learning and Analytics
- Generative Adversarial Networks and Image Synthesis
- AI-based Problem Solving and Planning
- Educational Technology and Assessment
- Privacy-Preserving Technologies in Data
- Adversarial Robustness in Machine Learning
- Biomedical Text Mining and Ontologies
- Speech and Dialogue Systems
- Multimodal Machine Learning Applications
- Spam and Phishing Detection
- Semantic Web and Ontologies
- Speech Recognition and Synthesis
- Advanced Text Analysis Techniques
- Stochastic Gradient Optimization Techniques
- Misinformation and Its Impacts
- Model Reduction and Neural Networks
- Text and Document Classification Technologies
- Internet Traffic Analysis and Secure E-voting
- Artificial Intelligence in Healthcare
- Traffic Prediction and Management Techniques
Purdue University West Lafayette
2021-2024
Google (United States)
2023-2024
Menlo School
2024
Neural sequence labeling is widely adopted for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER) and slot tagging for dialog systems and semantic parsing. Recent advances with large-scale pre-trained language models have shown remarkable success in these tasks when fine-tuned on large amounts of task-specific labeled data. However, obtaining training data is not only costly, but may also be infeasible in sensitive user applications due to access and privacy constraints. This...
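For context, here is a minimal sketch of the standard fine-tuning setup this abstract contrasts with (not the paper's low-resource method): a pre-trained language model with a token-classification head, trained on labeled tags. The model name and tag set are placeholder assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]    # hypothetical tag set
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels))

words = ["Jane", "works", "at", "Acme"]
word_labels = [1, 0, 0, 3]                             # B-PER, O, O, B-ORG
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Align labels to sub-tokens; special tokens get -100 (ignored by the loss).
aligned = [word_labels[i] if i is not None else -100
           for i in enc.word_ids(batch_index=0)]
out = model(**enc, labels=torch.tensor([aligned]))
out.loss.backward()                                    # one fine-tuning step
```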
Federated learning has shown great potential for distributed data utilization and privacy protection. Most existing federated learning approaches focus on the supervised setting, which means that all the data stored in each client has labels. However, in real-world applications, the client data are impossible to be fully labeled. Thus, how to exploit the unlabeled data should be a new challenge for federated learning. Although a few studies have attempted to overcome this challenge, they may suffer from information leakage or misleading usage problems. To tackle...
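A generic sketch of the setting, assuming a simple pseudo-labeling scheme on the clients and FedAvg on the server; this illustrates federated semi-supervised training in general, not the paper's specific leakage-safe method, and the confidence threshold and toy model are assumptions.

```python
import copy
import torch
import torch.nn as nn

def client_update(global_model, x_unlabeled, threshold=0.9, steps=5):
    # Each client trains a local copy on confident pseudo-labels only.
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(steps):
        with torch.no_grad():
            probs = torch.softmax(global_model(x_unlabeled), dim=1)
            conf, pseudo = probs.max(dim=1)
            mask = conf > threshold          # keep only confident predictions
        if mask.sum() == 0:
            break
        loss = nn.functional.cross_entropy(model(x_unlabeled[mask]), pseudo[mask])
        opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict()

def fedavg(states):
    # Server aggregates by parameter-wise averaging of client models.
    avg = copy.deepcopy(states[0])
    for k in avg:
        avg[k] = torch.stack([s[k].float() for s in states]).mean(dim=0)
    return avg

global_model = nn.Linear(16, 4)                        # toy classifier
client_data = [torch.randn(32, 16) for _ in range(3)]  # 3 unlabeled clients
states = [client_update(global_model, x) for x in client_data]
global_model.load_state_dict(fedavg(states))
```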
The broad adoption of electronic health record (EHR) systems and the advances in deep learning technology have motivated the development of risk prediction models, which mainly depend on the expressiveness and temporal modeling capacity of deep neural networks (DNNs) to improve performance. Some further augment prediction by using external knowledge; however, a great deal of EHR information is inevitably lost during knowledge mapping. In addition, the prediction made by existing models usually lacks reliable interpretation, which undermines their...
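A minimal sketch of the DNN-based risk prediction setup described here: a recurrent network encodes a patient's sequence of visit vectors and a linear head outputs a risk score. The multi-hot visit encoding and all dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RiskPredictor(nn.Module):
    def __init__(self, n_codes=1000, hidden=64):
        super().__init__()
        self.embed = nn.Linear(n_codes, hidden)    # multi-hot visit -> dense
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, visits):                     # visits: (batch, T, n_codes)
        h, _ = self.rnn(torch.relu(self.embed(visits)))
        return torch.sigmoid(self.head(h[:, -1]))  # risk from last visit state

model = RiskPredictor()
patient = torch.zeros(1, 4, 1000)                  # one patient, 4 visits
patient[0, 0, [12, 87]] = 1.0                      # codes observed at visit 1
print(model(patient))                              # predicted risk in (0, 1)
```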
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters, and storing a copy of the PLM weights for every task, resulting in increased cost for storing, sharing and serving the models. To address this, parameter-efficient fine-tuning (PEFT) techniques were introduced, where small trainable components are injected into the PLM and updated during fine-tuning. We propose AdaMix as a general PEFT method that tunes a mixture of adaptation modules -- given the underlying...
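A sketch of a "mixture of adaptation modules" in the spirit of AdaMix: several small bottleneck adapters share one insertion point, one is sampled per forward pass during training, and their weights can be averaged for inference. The routing and dimensions are simplified assumptions, not the exact published method.

```python
import random
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model=768, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual adapter

class AdapterMixture(nn.Module):
    def __init__(self, n_experts=4, d_model=768):
        super().__init__()
        self.experts = nn.ModuleList(Adapter(d_model) for _ in range(n_experts))

    def forward(self, x):
        if self.training:                 # stochastic routing during training
            return random.choice(self.experts)(x)
        # At inference, merge the experts by averaging their weights.
        avg = Adapter(x.shape[-1], self.experts[0].down.out_features)
        with torch.no_grad():
            for name, p in avg.named_parameters():
                p.copy_(torch.stack(
                    [dict(e.named_parameters())[name] for e in self.experts]
                ).mean(0))
        return avg(x)

mix = AdapterMixture()
hidden = torch.randn(2, 10, 768)   # (batch, seq, d_model) from a PLM layer
print(mix(hidden).shape)
```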
Synthesizing electronic health record (EHR) data has become a preferred strategy to address data scarcity, improve data quality, and support model fairness in healthcare. However, existing approaches for EHR data generation predominantly rely on state-of-the-art generative techniques like generative adversarial networks, variational autoencoders, and language models. These methods typically replicate input visits, resulting in inadequate modeling of the temporal dependencies between visits and overlooking the time information, a crucial...
Fake news travels at unprecedented speeds, reaches global audiences, and puts users and communities at great risk via social media platforms. Deep learning based models show good performance when trained on large amounts of labeled data on events of interest, whereas the performance tends to degrade on other events due to domain shift. Therefore, significant challenges are posed for existing detection approaches to detect fake news on emergent events, where large-scale datasets are difficult to obtain. Moreover, adding knowledge from newly...
Personalized text generation is an emerging research area that has attracted much attention in recent years. Most studies in this direction focus on a particular domain by designing bespoke features or models. In this work, we propose a general approach for personalized text generation using large language models (LLMs). Inspired by the practice of writing education, we develop a multistage and multitask framework to teach LLMs for personalized generation. In writing instruction, the task of writing from sources is often decomposed into multiple steps that involve finding,...
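A toy sketch of a multistage pipeline of the kind described: retrieve the user's past documents, summarize them, then condition generation on the summary. The `llm` callable, the naive keyword retrieval, and the stage prompts are all hypothetical illustrations, not the paper's framework.

```python
def generate_personalized(llm, query, user_docs):
    # Stage 1: find relevant past writing (naive keyword retrieval here).
    retrieved = [d for d in user_docs if any(w in d for w in query.split())]
    # Stage 2: summarize the retrieved personal context.
    summary = llm(f"Summarize this author's style and topics:\n{retrieved}")
    # Stage 3: generate, conditioned on the personal summary.
    return llm(f"Author profile: {summary}\nWrite, in this style: {query}")
```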
Meta-learning enables learning systems to adapt quickly to new tasks, similar to humans. To emulate this human-like rapid learning and enhance alignment and discrimination abilities, we propose ConML, a universal meta-learning framework that can be applied to various algorithms without relying on specific model architectures or target models. The core of ConML is task-level contrastive learning, which extends contrastive learning from the representation space in unsupervised learning to the model space in meta-learning. By leveraging task identity as an...
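A toy sketch of the task-level contrastive idea: models adapted to two subsets of the same task should end up closer (here, in parameter space) than models adapted to different tasks. The flattened-parameter distance and margin loss are illustrative assumptions, not ConML's exact objective.

```python
import torch

def flat(params):
    # Flatten a model's parameter tensors into one vector for comparison.
    return torch.cat([p.reshape(-1) for p in params])

def task_contrastive_loss(anchor, positive, negative, margin=1.0):
    # anchor/positive: models adapted on two splits of one task;
    # negative: a model adapted on a different task (each a list of tensors).
    d_pos = (flat(anchor) - flat(positive)).norm()
    d_neg = (flat(anchor) - flat(negative)).norm()
    return torch.relu(d_pos - d_neg + margin)  # pull same-task models together

a = [torch.randn(8, 8), torch.randn(8)]
p = [t + 0.01 * torch.randn_like(t) for t in a]   # same task, slight variation
n = [torch.randn(8, 8), torch.randn(8)]           # different task
print(task_contrastive_loss(a, p, n))
```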
In-context learning (ICL) enables large language models (LLMs) to generalize to new tasks by incorporating a few in-context examples (ICEs) directly in the input, without updating parameters. However, the effectiveness of ICL heavily relies on the selection of ICEs, and conventional text-based embedding methods are often inadequate for tasks that require multi-step reasoning, such as mathematical and logical problem solving. This is due to the bias introduced by shallow semantic similarities that fail to capture the deeper reasoning...
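For reference, a minimal sketch of the text-similarity ICE selection baseline this abstract critiques: pick the k candidate examples whose embeddings are most similar to the query. Bag-of-words counts stand in for a real embedding model purely for illustration.

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": bag-of-words counts (a real system would use a model).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_ices(query, candidates, k=2):
    q = embed(query)
    return sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

pool = ["Solve 3x + 1 = 10 for x.",
        "What is the capital of France?",
        "Solve 2x - 4 = 8 for x."]
print(select_ices("Solve 5x + 2 = 17 for x.", pool))
```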
Cross-lingual natural language understanding (NLU) aims to train NLU models on a source language and apply them to tasks in target languages, and is a fundamental task for many cross-language applications. Most existing cross-lingual methods assume the existence of parallel corpora so that words and sentences across languages can be aligned. However, the construction of such corpora is expensive and sometimes infeasible. Motivated by this challenge, recent works propose data augmentation or adversarial training methods to reduce the reliance on external corpora....
Cross-lingual natural language understanding (NLU) is one of the fundamental tasks in NLP. The goal is to learn a model which can generalize well on both high-resource and low-resource language data. Recent pre-trained multilingual models, e.g., multilingual BERT and XLM, have shown impressive performance on cross-lingual NLU tasks. However, such promising results request the use of sufficient training data, a condition that is difficult to satisfy for every language. When the data is limited in those low-resource languages, the accuracy of existing models will...
Knowledge concept prerequisites, which describe the dependencies between concepts, are critical for fundamental tasks such as learning material recommendation, and there is a huge number of concepts in Massive Open Online Courses (MOOCs). Thus it is necessary to develop automatic prerequisite relation annotation methods. Recently, a few methods have shown their effectiveness in discovering prerequisite relations in MOOCs automatically. However, they suffer from two common issues, i.e., concept information is not thoroughly learnt and informative supervision sources are ignored. To...
In this work, we study the problem of named entity recognition (NER) in a low-resource scenario, focusing on few-shot and zero-shot settings. Built upon large-scale pre-trained language models, we propose a novel NER framework, namely SpanNER, which learns from natural language supervision and enables the identification of never-seen entity classes without using in-domain labeled data. We perform extensive experiments on 5 benchmark datasets and evaluate the proposed method in the few-shot learning, domain transfer and zero-shot learning settings. The experimental results...
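A conceptual sketch of span-based zero-shot NER in the spirit of SpanNER: each candidate span is scored against natural-language class descriptions by embedding similarity, so a never-seen class only needs a description. The `encode` stand-in, descriptions, and threshold are all assumptions, not the paper's components.

```python
import torch

def encode(text, dim=32):
    # Stand-in for a real sentence encoder: a deterministic (within one run)
    # random unit vector per string, used only to show the matching structure.
    torch.manual_seed(abs(hash(text)) % (2**31))
    return torch.nn.functional.normalize(torch.randn(dim), dim=0)

class_descriptions = {
    "person": "a person's name",
    "location": "a geographic place such as a city or country",
}

def classify_span(span_text, threshold=0.0):
    # Score the span against every class description; abstain below threshold.
    span_vec = encode(span_text)
    scores = {c: float(span_vec @ encode(d))
              for c, d in class_descriptions.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "O"

print(classify_span("New York"))
```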