- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Explainable Artificial Intelligence (XAI)
- Domain Adaptation and Few-Shot Learning
- Software Engineering Research
- Complex Systems and Time Series Analysis
- Machine Learning and Data Classification
- Chaos control and synchronization
- Neural dynamics and brain function
- Fractal and DNA sequence analysis
- Software Testing and Debugging Techniques
- Speech and dialogue systems
- EEG and Brain-Computer Interfaces
- Medical Image Segmentation Techniques
- Advanced Neural Network Applications
- Heart Rate Variability and Autonomic Control
- Brain Tumor Detection and Classification
- E-commerce and Technology Innovations
- Text and Document Classification Technologies
- Neural Networks and Applications
- Functional Brain Connectivity Studies
- Advanced Text Analysis Techniques
- Data Management and Algorithms
- Information Retrieval and Search Behavior
Zhejiang Chinese Medical University
2023-2025
Jacobs Institute
2020-2024
Amazon (United States)
2022-2023
China Astronaut Research and Training Center
2023
University of Illinois Urbana-Champaign
2023
Seattle University
2022
Harbin Medical University
2022
Fourth Affiliated Hospital of Harbin Medical University
2022
Amazon (Germany)
2019-2021
Tianjin University
2020
Zhiguo Wang, Patrick Ng, Xiaofei Ma, Ramesh Nallapati, Bing Xiang. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
The performance of deep neural models can deteriorate substantially when there is a domain shift between training and test data. For example, the pre-trained BERT model can be easily fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks. However, it suffers considerably at zero-shot when applied to a different domain. In this paper, we present a novel two-step domain adaptation framework based on curriculum learning and domain-discriminative data selection. The adaptation is conducted in a mostly...
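The data-selection step can be made concrete with a small sketch. The snippet below is an illustration under assumptions, not the paper's implementation: a discriminator is trained to tell source text from target text, and source examples are then ordered by how target-like they appear, which yields a simple curriculum.

```python
# A minimal sketch (not the paper's implementation) of domain-discriminative
# data selection: train a classifier to separate source from target text,
# then rank source examples by how target-like they look and fine-tune on
# them in that order (a simple curriculum).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def rank_source_by_target_likeness(source_texts, target_texts):
    """Return source_texts sorted from most to least target-domain-like."""
    vec = TfidfVectorizer(min_df=1)
    X = vec.fit_transform(source_texts + target_texts)
    y = [0] * len(source_texts) + [1] * len(target_texts)  # 1 = target domain
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    # P(target | x) for each source example acts as the curriculum score.
    scores = clf.predict_proba(vec.transform(source_texts))[:, 1]
    order = sorted(range(len(source_texts)), key=lambda i: scores[i], reverse=True)
    return [source_texts[i] for i in order]

# Usage: feed the returned list to fine-tuning in order, most target-like first.
```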
We present a systematic investigation of layer-wise BERT activations for general-purpose text representations to understand what linguistic information they capture and how transferable they are across different tasks. Sentence-level embeddings are evaluated against two state-of-the-art models on downstream and probing tasks from SentEval, while passage-level embeddings are evaluated on four question-answering (QA) datasets under a learning-to-rank problem setting. Embeddings from the pre-trained model perform poorly in semantic...
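For readers who want to reproduce this layer-wise setup in spirit, the sketch below extracts one mean-pooled sentence embedding per BERT layer with the Hugging Face transformers API; the pooling choice is an assumption, not necessarily the paper's.

```python
# A minimal sketch of extracting layer-wise BERT activations as sentence
# embeddings (mean-pooled per layer). Pooling strategy is illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layerwise_embeddings(sentence):
    """Return one mean-pooled vector per layer (embedding layer + 12 blocks)."""
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden_states = model(**inputs).hidden_states  # tuple: 13 x (1, seq, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)      # ignore padding positions
    return [(h * mask).sum(1) / mask.sum(1) for h in hidden_states]

vectors = layerwise_embeddings("BERT layers encode different linguistic signals.")
print(len(vectors), vectors[0].shape)  # 13 torch.Size([1, 768])
```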
Testing plays a pivotal role in ensuring software quality, yet conventional Search Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage. Recent work using large language models (LLMs) for test generation has focused on improving generation quality through optimizing the test context and correcting errors in model outputs, but uses fixed prompting strategies that prompt the model to generate tests without additional guidance. As a result, LLM-generated test suites still suffer from...
Introduction Glioma segmentation is vital for diagnostic decision-making, monitoring disease progression, and surgical planning. However, this task is hindered by substantial heterogeneity within gliomas and imbalanced region distributions, posing challenges to existing segmentation methods. Methods To address these challenges, we propose the DeepGlioSeg network, a U-shaped architecture with skip connections for continuous contextual feature integration. The model includes two primary components. First, a CTPC...
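As a point of reference for the architecture family, here is a minimal U-shaped network with skip connections in PyTorch. This is a generic sketch only; it does not reproduce DeepGlioSeg's CTPC component, and the channel sizes and input/output shapes are illustrative.

```python
# A minimal U-shaped segmentation network with skip connections; a generic
# sketch of the architecture family, not DeepGlioSeg itself.
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=4, n_classes=3):  # e.g. 4 MRI modalities in, 3 regions out
        super().__init__()
        self.enc1, self.enc2 = block(in_ch, 32), block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = block(128, 64)   # 64 upsampled + 64 from skip
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = block(64, 32)    # 32 upsampled + 32 from skip
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

print(TinyUNet()(torch.randn(1, 4, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```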
Generative models for Information Retrieval, where the ranking of documents is viewed as the task of generating a query from a document's language model, were very successful in various IR tasks in the past. However, with the advent of modern deep neural networks, attention has shifted to discriminative ranking functions that model the semantic similarity of documents and queries instead. Recently, generative models such as GPT2 and BART have been shown to be excellent text generators, but their effectiveness as rankers has not been demonstrated yet. In this work, we...
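The query-likelihood view of generative ranking can be sketched directly: score each document by the log-probability a causal LM assigns to the query conditioned on the document. The prompt format and length-normalization below are assumptions for illustration, not the paper's setup.

```python
# A minimal sketch of query-likelihood ranking with a generative LM: each
# document is scored by log P(query | document). Prompt format is assumed.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def query_log_likelihood(document, query):
    """Mean log-probability of the query tokens given the document prefix."""
    prefix_ids = tok.encode(document + " Question:", return_tensors="pt")
    query_ids = tok.encode(" " + query, return_tensors="pt")
    input_ids = torch.cat([prefix_ids, query_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probs of each token given everything before it (shifted by one).
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = input_ids[0, 1:]
    query_positions = range(prefix_ids.size(1) - 1, input_ids.size(1) - 1)
    return sum(log_probs[i, targets[i]].item() for i in query_positions) / query_ids.size(1)

docs = ["Paris is the capital of France.", "Transformers use self-attention."]
scores = [query_log_likelihood(d, "what is the capital of france") for d in docs]
print(sorted(zip(scores, docs), reverse=True)[0][1])  # highest-scoring document
```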
Danilo Neves Ribeiro, Shen Wang, Xiaofei Ma, Rui Dong, Xiaokai Wei, Henghui Zhu, Xinchi Chen, Peng Xu, Zhiheng Huang, Andrew Arnold, Dan Roth. Findings of the Association for Computational Linguistics: NAACL 2022.
Glioblastoma multiforme (GBM) is the most common and deadly primary malignant brain tumor. As GBM is aggressive and shows high biological heterogeneity, the overall survival (OS) time is extremely low even with treatment. If OS can be predicted before surgery, developing personalized treatment plans for patients will be beneficial. Magnetic resonance imaging (MRI) is a commonly used diagnostic tool for brain tumors owing to its high resolution. However, in clinical practice, doctors mainly rely on...
Despite profound successes, contrastive representation learning relies on carefully designed data augmentations using domain-specific knowledge. This challenge is magnified in natural language processing, where no general rules exist for data augmentation due to the discrete nature of language. We tackle this challenge by presenting a Virtual augmentation Supported Contrastive Learning of sentence representations (VaSCL). Originating from the interpretation that data augmentation essentially constructs the neighborhoods of each training instance, we...
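A heavily simplified sketch of the "virtual augmentation" idea follows: instead of editing discrete text, a positive view is built by perturbing each instance directly in the embedding space. VaSCL itself constructs harder, neighborhood-aware perturbations; the Gaussian noise here is only a stand-in.

```python
# A simplified sketch of virtual augmentation for contrastive learning:
# the positive view is a perturbation in embedding space, not a text edit.
import torch
import torch.nn.functional as F

def virtual_contrastive_loss(z, sigma=0.05, tau=0.07):
    """NT-Xent-style loss between embeddings and their perturbed (virtual) views."""
    z = F.normalize(z, dim=-1)
    z_aug = F.normalize(z + sigma * torch.randn_like(z), dim=-1)  # virtual positive
    logits = (z @ z_aug.t()) / tau        # (batch, batch) similarity matrix
    labels = torch.arange(z.size(0))      # each row's positive sits on the diagonal
    return F.cross_entropy(logits, labels)

z = torch.randn(8, 128, requires_grad=True)   # stand-in for encoder outputs
loss = virtual_contrastive_loss(z)
loss.backward()
```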
Electromagnetic signals emitted by satellite communication (satcom) transmitters can be used to identify the specific individual uplink satcom terminals sharing a common transponder in a real environment. This task, known as specific emitter identification (SEI), allows for early indications and warning (I&W) of targets carrying the equipment and supports real-time electromagnetic situation awareness in military operations. In this paper, the authors first propose using probabilistic neural networks (PNN)...
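As background, a probabilistic neural network in the sense of Specht (1990) can be written in a few lines: one Gaussian kernel per training pattern, with class scores given by summed kernel responses. The signal feature extraction is omitted here; the inputs below are generic feature vectors, not satcom measurements.

```python
# A minimal probabilistic neural network (PNN): Gaussian kernel per training
# example, class score = averaged kernel response, argmax over classes.
import numpy as np

class PNN:
    def __init__(self, sigma=0.5):
        self.sigma = sigma  # kernel width (smoothing parameter)

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        # Squared distances between each query and every training pattern.
        d2 = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)
        k = np.exp(-d2 / (2 * self.sigma ** 2))            # pattern (kernel) layer
        scores = np.stack([k[:, self.y == c].mean(1) for c in self.classes], axis=1)
        return self.classes[scores.argmax(1)]              # competitive output layer

# Toy usage: two "emitters" with slightly different feature statistics.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(2, 1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
print(PNN().fit(X, y).predict(X[:3]))
```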
In this paper, we propose a weak supervision framework for neural ranking tasks based on the data programming paradigm (Ratner et al., 2016), which enables us to leverage multiple weak supervision signals from different sources. Empirically, we consider two sources of signals: unsupervised ranking functions and semantic feature similarities. We train a BERT-based passage-ranking model (which achieves new state-of-the-art performances on benchmark datasets with full supervision) in our framework...
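The data-programming idea can be illustrated with toy labeling functions that vote on (query, passage) relevance and are aggregated into weak training labels. The functions below are stand-ins for the unsupervised rankers and semantic-similarity signals mentioned above, not the paper's actual sources.

```python
# A minimal data-programming sketch: weak labeling functions vote on
# relevance; votes are aggregated into a (noisy) training label.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_term_overlap(query, passage):
    """Unsupervised-ranker stand-in: lexical overlap heuristic."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    overlap = len(q & p) / max(len(q), 1)
    return POS if overlap > 0.5 else NEG if overlap < 0.1 else ABSTAIN

def lf_length_prior(query, passage):
    """Weak prior: very short passages are rarely good answers."""
    return NEG if len(passage.split()) < 5 else ABSTAIN

def weak_label(query, passage, lfs=(lf_term_overlap, lf_length_prior)):
    votes = [v for v in (lf(query, passage) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN                      # no signal: drop from training
    return POS if sum(votes) >= len(votes) / 2 else NEG

print(weak_label("capital of france", "Paris is the capital of France"))  # 1
```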
The BERT model has been successfully applied to open-domain QA tasks. However, previous work trains BERT by viewing passages corresponding to the same question as independent training instances, which may cause incomparable scores for answers from different passages. To tackle this issue, we propose a multi-passage BERT model to globally normalize answer scores across all passages of the same question, and this change enables our QA model to find better answers by utilizing more passages. In addition, we find that splitting articles into passages with a length of 100 words by sliding window improves...
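The global-normalization trick reduces to applying one softmax over the answer scores of all passages for a question, rather than one softmax per passage. A minimal sketch, with stand-in logits in place of BERT span outputs:

```python
# One softmax across all passages makes answer scores directly comparable.
import torch

def globally_normalized_best(span_logits_per_passage):
    """span_logits_per_passage: list of 1-D tensors, one per passage."""
    flat = torch.cat(span_logits_per_passage)   # all candidate spans, all passages
    probs = torch.softmax(flat, dim=0)          # single softmax over everything
    best = int(probs.argmax())
    # Map the flat index back to (passage index, span index within passage).
    for p, logits in enumerate(span_logits_per_passage):
        if best < len(logits):
            return p, best, float(probs.max())
        best -= len(logits)

print(globally_normalized_best([torch.tensor([1.0, 0.2]), torch.tensor([2.5])]))
```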
The performance of state-of-the-art neural rankers can deteriorate substantially when exposed to noisy inputs or applied to a new domain. In this paper, we present a novel method for fine-tuning neural rankers that significantly improves their robustness to out-of-domain data and query perturbations. Specifically, a contrastive loss that compares data points in the representation space is combined with the standard ranking loss during fine-tuning. We use relevance labels to denote similar/dissimilar pairs, which allows the model to learn the underlying...
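One plausible reading of the combined objective is sketched below: a standard pairwise ranking loss plus a supervised contrastive term in which same-relevance points count as similar pairs. The loss weighting, temperature, and batch construction are assumptions, not the paper's settings.

```python
# Ranking loss + supervised contrastive loss over the representation space.
import torch
import torch.nn.functional as F

def combined_loss(emb, rel_labels, pos_scores, neg_scores, alpha=0.5, tau=0.1):
    # Pairwise ranking loss: relevant passages should outscore irrelevant ones.
    rank_loss = F.margin_ranking_loss(
        pos_scores, neg_scores, torch.ones_like(pos_scores), margin=1.0)
    # Supervised contrastive term: same relevance label = "similar" pair.
    z = F.normalize(emb, dim=-1)
    sim = (z @ z.t()) / tau
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))   # exclude self-similarity
    same = (rel_labels[:, None] == rel_labels[None, :]) & ~self_mask
    log_p = F.log_softmax(sim, dim=1)
    contrastive = -(log_p.masked_fill(~same, 0.0)).sum(1) / same.sum(1).clamp(min=1)
    return rank_loss + alpha * contrastive.mean()

emb = torch.randn(6, 32, requires_grad=True)      # stand-in for ranker embeddings
rel = torch.tensor([1, 1, 0, 0, 1, 0])            # relevance labels
loss = combined_loss(emb, rel, torch.randn(3), torch.randn(3))
loss.backward()
```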
Zhihan Zhou, Dejiao Zhang, Wei Xiao, Nicholas Dingwall, Xiaofei Ma, Andrew Arnold, Bing Xiang. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.
We introduce STREET, a unified multi-task and multi-domain natural language reasoning and explanation benchmark. Unlike most existing question-answering (QA) datasets, we expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer. We perform an extensive evaluation with popular models such as few-shot prompted GPT-3 and fine-tuned T5. We find that these models still lag...
Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Parminder Bhatia, Xiaofei Ma, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal, Bing Xiang. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2023.
Missing information is a common issue in dialogue summarization, where some information in the reference summaries is not covered by the generated summaries. To address this issue, we propose to utilize natural language inference (NLI) models to improve coverage while avoiding the introduction of factual inconsistencies. Specifically, we use NLI to compute fine-grained training signals that encourage the model to generate content that has not been covered, as well as to distinguish between factually consistent and inconsistent sentences. Experiments...
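The coverage check can be illustrated with an off-the-shelf NLI model: each reference-summary sentence is tested for entailment against the generated summary, and non-entailed sentences are flagged as missing. The model choice and threshold below are assumptions for illustration, not the paper's training procedure.

```python
# NLI-based coverage check: premise = generated summary,
# hypothesis = a reference-summary sentence. Model choice is illustrative.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def missing_sentences(generated_summary, reference_sentences, threshold=0.5):
    missing = []
    for sent in reference_sentences:
        out = nli({"text": generated_summary, "text_pair": sent}, top_k=None)
        p_entail = next(o["score"] for o in out if o["label"] == "ENTAILMENT")
        if p_entail < threshold:
            missing.append(sent)   # content the generated summary failed to cover
    return missing
```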
Brain connectivity analysis plays an essential role in research on working memory, which involves the complex coordination of various brain regions. In this research, we present a comprehensive view of trans-state variation based on continuous scalp EEG, extending beyond traditional stimulus-locked averaging or the restriction to short time scales of hundreds of milliseconds after stimulus onset. The EEG was collected under three conditions: quiet, memory, and control. The only difference between the memory and control conditions...
Timely detection of dynamical complexity changes in natural and man-made systems has deep scientific and practical meaning. We introduce a measure for time series: the base-scale entropy. The definition directly applies to arbitrary real-world data. We illustrate our method on a speech signal and a theoretical chaotic system. The results show that this simple and easily calculated entropy can be effectively used to detect qualitative and quantitative changes.
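One common formulation of base-scale entropy is sketched below; the thresholding scheme and the defaults m = 4, a = 0.2 are assumptions. Each m-point window is symbolized against its own mean plus or minus a times its "base scale" (the RMS of successive differences), and the Shannon entropy of the resulting symbol words is reported.

```python
# A sketch of one common base-scale entropy formulation (parameters assumed).
from collections import Counter
import numpy as np

def base_scale_entropy(x, m=4, a=0.2):
    x = np.asarray(x, float)
    words = Counter()
    for i in range(len(x) - m + 1):
        w = x[i:i + m]
        bs = np.sqrt(np.mean(np.diff(w) ** 2))   # base scale of this window
        mu = w.mean()
        # 4-level symbolization against mu +/- a*bs thresholds.
        sym = np.where(w > mu + a * bs, 0,
              np.where(w > mu, 1,
              np.where(w > mu - a * bs, 2, 3)))
        words[tuple(sym)] += 1
    p = np.array(list(words.values()), float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

# A periodic signal should yield lower entropy than white noise.
t = np.linspace(0, 10, 1000)
print(base_scale_entropy(np.sin(2 * np.pi * t)),
      base_scale_entropy(np.random.randn(1000)))
```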
Recent progress in pretrained Transformer-based language models has shown great success in learning contextual representations of text. However, due to the quadratic complexity of self-attention, most Transformer models can only handle relatively short text. It is still a challenge when it comes to modeling very long documents. In this work, we propose to use a graph attention network on top of an available pretrained Transformer model to learn document embeddings. This allows us to leverage the high-level semantic structure of the document. In addition, based on our...
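The chunk-graph idea can be sketched as follows: embed document chunks with a pretrained encoder, let a graph-attention layer mix the chunk embeddings over a chunk graph, and mean-pool into one document vector. The fully connected graph and single-head attention below are simplifying assumptions, not the paper's exact architecture.

```python
# A single-head graph-attention layer over chunk embeddings of a long document.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGATLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.a = nn.Linear(2 * dim, 1, bias=False)   # attention scorer

    def forward(self, h, adj):
        hw = self.W(h)                                # (n_chunks, dim)
        n = hw.size(0)
        pairs = torch.cat([hw.unsqueeze(1).expand(n, n, -1),
                           hw.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))   # (n, n) raw attention scores
        e = e.masked_fill(adj == 0, float("-inf"))    # keep graph edges only
        att = torch.softmax(e, dim=-1)
        return F.elu(att @ hw)                        # attention-mixed chunk embeddings

chunks = torch.randn(5, 768)     # stand-in for pretrained-encoder chunk embeddings
adj = torch.ones(5, 5)           # fully connected chunk graph (an assumption)
doc_embedding = TinyGATLayer(768)(chunks, adj).mean(0)
print(doc_embedding.shape)       # torch.Size([768])
```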