- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Speech and Dialogue Systems
- Domain Adaptation and Few-Shot Learning
- Adversarial Robustness in Machine Learning
- Sentiment Analysis and Opinion Mining
- Multi-Agent Systems and Negotiation
- Advanced Graph Neural Networks
- Privacy-Preserving Technologies in Data
- Advanced Image and Video Retrieval Techniques
- Advanced Text Analysis Techniques
- Web Data Mining and Analysis
- Artificial Intelligence in Law
- Human Pose and Action Recognition
- Text Readability and Simplification
- Advanced Malware Detection Techniques
- Speech Recognition and Synthesis
- Software Engineering Research
- Handwritten Text Recognition Techniques
- Privacy, Security, and Data Protection
- Semantic Web and Ontologies
- Advanced Neural Network Applications
- Music and Audio Processing
- Expert Finding and Q&A Systems
Monash University
2019-2025
Australian Regenerative Medicine Institute
2023-2025
Google (United States)
2023
Peng Cheng Laboratory
2022
Harbin Institute of Technology
2022
The University of Melbourne
2022
Data61
2015-2020
Commonwealth Scientific and Industrial Research Organisation
2016-2019
The Dialogue
2019
Australian National University
2015-2017
We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided we know the probability of each class being corrupted into another. We further show how one can estimate these probabilities, adapting a recent technique for noise estimation to the multi-class setting...
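The "matrix inversion and multiplication" corrections mentioned above can be sketched as follows (an illustrative sketch, not the paper's implementation; `T[i, j]` is assumed to hold the probability that true class `i` is observed as class `j`):

```python
import numpy as np

def forward_corrected_ce(p_clean, noisy_label, T):
    # Forward correction: push the model's clean-class posterior
    # through the noise matrix, then take cross-entropy against
    # the observed (noisy) label.
    p_noisy = T.T @ p_clean
    return -np.log(p_noisy[noisy_label])

def backward_corrected_ce(p_clean, noisy_label, T):
    # Backward correction: multiply the per-class loss vector by
    # T^{-1}, which in expectation recovers the loss on clean labels.
    losses = -np.log(p_clean)
    return (np.linalg.inv(T) @ losses)[noisy_label]

# With no noise (T = I), both reduce to standard cross-entropy.
T = np.eye(3)
p = np.array([0.7, 0.2, 0.1])
assert np.isclose(forward_corrected_ce(p, 0, T), -np.log(0.7))
assert np.isclose(backward_corrected_ce(p, 0, T), -np.log(0.7))
```

In practice the matrix T is unknown, which is what the noise-estimation step in the abstract addresses.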
Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
Recent progress in information extraction has shown how to automatically build large ontologies from high-quality sources like Wikipedia. But knowledge evolves over time; facts have associated validity intervals. Therefore, ontologies should include time as a first-class dimension. In this paper, we introduce Timely YAGO, which extends our previously built knowledge base YAGO with temporal aspects. This prototype system extracts temporal facts from Wikipedia infoboxes, categories, and lists in articles, and integrates these into the knowledge base...
In this paper, we propose the first model able to generate visually grounded questions with diverse types for a single image. Visual question generation is an emerging topic which aims to ask questions in natural language based on visual input. To the best of our knowledge, the field lacks automatic methods for generating meaningful questions of various types for the same visual input. To circumvent this problem, we propose a model that automatically generates visually grounded questions with varying types. Our model takes as input both images and captions generated by a dense caption model, samples the most probable question types, and generates the questions in sequel. The...
A major cause of security incidents such as cyber attacks is rooted in software vulnerabilities. These vulnerabilities should ideally be found and fixed before the code gets deployed. Machine learning-based approaches achieve state-of-the-art performance in capturing vulnerabilities, but such methods are predominantly supervised. Their prediction models are trained on a set of ground truth data, where the training and test data are assumed to be drawn from the same probability distribution. However, in practice, the training data often differs from the test data in terms of distribution because...
With increasing concerns about data privacy, there is an urgent necessity of fine-tuning pre-trained language models (PLMs) for adapting to downstream tasks located on end-user devices or local clients without transmitting data to the central server. This urgent necessity therefore calls for research investigating federated learning (FL) for PLMs. However, large PLMs bring the curse of prohibitive communication overhead and model adaptation costs to the FL system. To this end, we investigate parameter-efficient tuning (PETuning) and develop...
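Parameter-efficient tuning can be illustrated with a LoRA-style adapter (a hedged sketch: PETuning covers a family of methods, and this is not the paper's specific design). Only the small matrices `A` and `B` are trained, so a federated client would communicate only a tiny fraction of the model's parameters:

```python
import numpy as np

class LoRALinear:
    """LoRA-style low-rank adapter over a frozen linear layer.
    Illustrative sketch; `rank` and the init scale are arbitrary."""
    def __init__(self, W, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                             # frozen pre-trained weight
        self.A = 0.01 * rng.normal(size=(rank, W.shape[1]))
        self.B = np.zeros((W.shape[0], rank))  # zero-init: no change at start

    def __call__(self, x):
        # The trainable update B @ A adds only rank * (in + out)
        # parameters, which is all that needs to leave the client.
        return self.W @ x + self.B @ (self.A @ x)

# Before any training, the adapted layer matches the frozen layer.
W = np.arange(6.0).reshape(2, 3)
layer = LoRALinear(W)
x = np.ones(3)
assert np.allclose(layer(x), W @ x)
```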
There have been major advances on automatically constructing large knowledge bases by extracting relational facts from Web and text sources. However, the world is dynamic: periodic events like sports competitions need to be interpreted with their respective timepoints, and facts such as coaching a team, holding political or business positions, and even marriages do not hold forever and should be augmented with timespans. This paper addresses the problem of harvesting temporal facts with such extended time-awareness. We employ...
Knowledge bases are useful resources for many natural language processing tasks; however, they are far from complete. In this paper, we define a novel entity representation as a mixture of its neighborhood in the knowledge base, and apply this technique to TransE, a well-known embedding model for knowledge base completion. Experimental results show that neighborhood information significantly helps to improve TransE, leading to better performance than that obtained by other state-of-the-art models on three benchmark datasets for triple classification,...
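TransE's translation-based score and the neighborhood-mixture idea can be sketched in miniature (illustrative only; `alpha` is a made-up fixed mixing weight, not the paper's learned weighting):

```python
import numpy as np

def transe_score(h, r, t):
    # TransE plausibility: a triple (h, r, t) is plausible when the
    # relation vector translates head to tail, i.e. h + r is close to t.
    return np.linalg.norm(h + r - t, ord=1)

def mixture_entity(e_vec, neighbor_vecs, alpha=0.5):
    # Neighborhood mixture in miniature: blend an entity's own
    # embedding with the mean of its neighbors' embeddings.
    if len(neighbor_vecs) == 0:
        return e_vec
    return alpha * e_vec + (1 - alpha) * np.mean(neighbor_vecs, axis=0)

# A perfectly translated triple scores 0 (most plausible).
h, r = np.array([1.0, 2.0]), np.array([0.5, -1.0])
assert transe_score(h, r, h + r) == 0.0
```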
In social media, demographic inference is a critical task in order to gain a better understanding of a cohort and to facilitate interacting with one's audience. Most previous work has made independence assumptions over topological, textual, and label information on social networks. In this work, we employ recursive neural networks to break down these independence assumptions and obtain inferences about demographic characteristics on Twitter. We show that our model performs better than existing models, including the state-of-the-art.
Application Programming Interfaces (APIs) have been widely discussed on social-technical platforms (e.g., Stack Overflow). Extracting API mentions from such informal software texts is the prerequisite for API-centric search and summarization of programming knowledge. Machine learning based API extraction has demonstrated superior performance over rule-based methods in informal texts that lack consistent writing forms and annotations. However, machine learning based methods carry a significant overhead of preparing training data and effective...
Existing object detection methods are bounded in a fixed-set vocabulary by costly labeled data. When dealing with novel categories, the model has to be retrained with more bounding box annotations. Natural language supervision is an attractive alternative for its annotation-free attributes and broader concepts. However, learning open-vocabulary object detection from image-text pairs is challenging since the pairs do not contain fine-grained object-language alignments. Previous solutions rely on either expensive grounding...
Large language models (LLMs) have made significant strides in various natural language processing (NLP) tasks. Recent research shows that moderately-sized LLMs often outperform their larger counterparts after task-specific fine-tuning. In this work, we delve into the process of adapting LLMs to specialize in document-level machine translation (DocMT) for a specific language pair. Firstly, we explore how prompt strategies affect downstream translation performance. Then, we conduct extensive experiments with two fine-tuning methods,...
In named entity recognition, we often don't have a large in-domain training corpus or a knowledge base with adequate coverage to train a model directly. In this paper, we propose a method where, given training data in a related domain with similar (but not identical) named entity (NE) types and a small amount of in-domain training data, we use transfer learning to learn a domain-specific NE model. That is, the novelty in the task setup is that we assume not just a domain mismatch, but also a label mismatch.
In this paper, we investigate the diversity aspect of paraphrase generation. Prior deep learning models employ either decoding methods or add random input noise to vary the outputs. We propose a simple method, Diverse Paraphrase Generation (D-PAGE), which extends neural machine translation (NMT) models to support the generation of diverse paraphrases with implicit rewriting patterns. Our experimental results on two real-world benchmark datasets demonstrate that our model generates at least one order of magnitude...
Natural Language Generation (NLG) supports the creation of personalized, contextualized, and targeted content. However, the algorithms underpinning NLG have come under scrutiny for reinforcing gender, racial, and other problematic biases. Recent research in the field seeks to remove these biases through principles of fairness and privacy. Drawing on gender and queer theories from sociology and Science and Technology Studies, we consider how NLG can contribute towards the advancement of equity in society. We propose a conceptual framework...
Randomised controlled trials (RCTs) are the cornerstone of evidence-based medicine. Unfortunately, not all RCTs are based on real data. This serious breach of research integrity compromises the reliability of systematic reviews and meta-analyses, leading to misinformed clinical guidelines and posing a risk to both individual and public health. While methods to detect problematic RCTs have been proposed, they are time-consuming and labour-intensive. The use of artificial intelligence and large language models (LLMs) has the potential to accelerate...
Teacher-forcing training for audio captioning usually leads to exposure bias due to the mismatch between training and inference. Prior works propose the contrastive method to deal with caption degeneration. However, the contrastive method ignores temporal information when measuring similarity across acoustic and linguistic modalities, leading to inferior performance. In this work, we develop a temporal-similarity score by introducing an unbiased sliced Wasserstein RBF (USW-RBF) kernel equipped with rotary positional embedding to account for temporal information across modalities. In contrast...
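A plain sliced Wasserstein distance, the building block that the USW-RBF kernel refines, can be sketched as follows (a hedged sketch: function and parameter names are illustrative, and the kernel's unbiased estimator and rotary positional embeddings are not reproduced here):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=64, seed=0):
    """Monte-Carlo sliced 1-Wasserstein distance between equal-sized
    point clouds X, Y in R^d: project onto random unit directions,
    sort, and average the resulting 1-D transport costs."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)          # random unit direction
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean(np.abs(px - py))       # 1-D Wasserstein-1 cost
    return total / n_proj

# Identical clouds are at distance zero.
X = np.random.default_rng(1).normal(size=(10, 4))
assert np.isclose(sliced_wasserstein(X, X), 0.0)
```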
Alignment tuning is crucial for ensuring large language models (LLMs) behave ethically and helpfully. Current alignment approaches require high-quality annotations and significant training resources. This paper proposes a low-cost, tuning-free method using in-context learning (ICL) to enhance LLM alignment. Through an analysis of ICL demos, we identified style as a key factor influencing alignment capabilities, and we explicitly restyled exemplars based on this stylistic framework. Additionally, we combined the demos...
Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in industry and in the field of security. Two significant issues in SVD arise when using machine learning, namely: i) how to learn automatic features that can help improve predictive performance, and ii) how to overcome the scarcity of labeled vulnerabilities in projects that require laborious labeling of code by security experts. In this paper, we address these two concerns by proposing a novel architecture which leverages deep domain...
Content Warning: this paper may contain content that is offensive or upsetting.
The rapid success of Large Language Models (LLMs) has unlocked vast potential for AI applications in privacy-sensitive domains. However, the traditional centralized training of LLMs poses significant challenges due to privacy concerns regarding collecting sensitive data from diverse sources. This paper offers a promising and privacy-enhancing solution for LLMs: collaboratively training them via Federated Learning (FL) across multiple clients, eliminating the need for raw data transmission. To this end, we present F4LLM, a new...
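The aggregation step underlying such federated training can be reduced to a FedAvg-style update (a generic sketch, not F4LLM's actual protocol; names are illustrative):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg aggregation round: the server averages client
    parameter vectors weighted by local dataset size, so raw data
    never leaves a client."""
    sizes = np.asarray(client_sizes, dtype=float)
    coef = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coef, client_weights))

# Two equally sized clients: the global model is the plain average.
w = fedavg([np.array([1.0, 3.0]), np.array([3.0, 5.0])], [1, 1])
assert np.allclose(w, [2.0, 4.0])
```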