Xiaoxi Li

ORCID: 0009-0003-0708-418X
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Speech Recognition and Synthesis
  • Higher Education and Teaching Methods
  • Information and Cyber Security
  • Numerical methods for differential equations
  • Differential Equations and Numerical Methods
  • Web and Library Services
  • Data Mining Algorithms and Applications
  • Multimodal Machine Learning Applications
  • Vibration and Dynamic Analysis
  • Bayesian Methods and Mixture Models
  • Ultrasonics and Acoustic Wave Propagation
  • Statistical Methods and Bayesian Inference
  • Speech and dialogue systems
  • Library Science and Information Literacy
  • Thermography and Photoacoustic Techniques
  • Advanced Manufacturing and Logistics Optimization
  • Robotics and Sensor-Based Localization
  • AI and Big Data Applications
  • Information Retrieval and Search Behavior
  • Zebrafish Biomedical Research Applications
  • Evaluation and Optimization Models
  • Globalization, Economics, and Policies
  • Human Pose and Action Recognition

Renmin University of China
2024-2025

Harbin Engineering University
2024

Huazhong University of Science and Technology
2020-2023

Harbin Institute of Technology
2022

Nankai University
2022

China Academy of Space Technology
2022

Shenzhen Institutes of Advanced Technology
2021

Chinese Academy of Sciences
2021

Beijing Normal University
2008-2020

University of Electronic Science and Technology of China
2016-2020

Information Retrieval (IR) systems are crucial tools for users to access information, and have long been dominated by traditional methods relying on similarity matching. With the advancement of pre-trained language models, generative information retrieval (GenIR) has emerged as a novel paradigm, attracting increasing attention. Based on the form of information provided to users, current research in GenIR can be categorized into two aspects: (1) Generative Document Retrieval (GR) leverages the generative model’s parameters for memorizing...

10.1145/3722552 article EN ACM transactions on office information systems 2025-03-11

Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents....

10.48550/arxiv.2501.05366 preprint EN arXiv (Cornell University) 2025-01-09

Generative information retrieval, encompassing the two major tasks of Generative Document Retrieval (GDR) and Grounded Answer Generation (GAR), has gained significant attention in natural language processing. Existing methods for GDR and GAR rely on separate retrieval and reader modules, which hinder simultaneous optimization. To overcome this, we present UniGen, a Unified framework for question answering that integrates both tasks into a single generative model, leveraging the capabilities of large language models. UniGen employs shared...

10.1609/aaai.v38i8.28714 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

This paper proposes a novel architecture, Cross Attention Augmented Transducer (CAAT), for simultaneous translation. The framework aims to jointly optimize the policy and translation models. To effectively consider all possible READ-WRITE action paths, we adapt the online automatic speech recognition (ASR) model, RNN-T, but remove the strong monotonic constraint, which is critical for the translation task to consider reordering. To make CAAT work, we introduce a latency loss whose expectation can be optimized by forward-backward...

10.18653/v1/2021.emnlp-main.4 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

10.1109/icassp49660.2025.10888294 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10889325 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Generative document retrieval is a novel retrieval framework, which represents documents as identifiers (DocIDs) and retrieves documents by generating DocIDs. It has the advantage of end-to-end optimization over traditional retrieval methods and has attracted much research interest. Nonetheless, the development of efficient and precise DocIDs for document representation remains a pertinent issue within the field. Existing methods for designing DocIDs tend to consider only the relevance of DocIDs to the corresponding documents, while neglecting the ability of DocIDs to distinguish documents from similar ones, which is crucial...

10.1609/aaai.v39i11.33253 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Weitai Zhang, Zhongyi Ye, Haitao Tang, Xiaoxi Li, Xinyuan Zhou, Jing Yang, Jianwei Cui, Pan Deng, Mohan Shi, Yifan Song, Dan Liu, Junhua Liu, Lirong Dai. Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022). 2022.

10.18653/v1/2022.iwslt-1.15 article EN cc-by 2022-01-01

10.1145/3626772.3657778 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10

Abstract. Systematic sampling is frequently used in surveys, because of its ease of implementation and design efficiency. An important drawback of systematic sampling, however, is that no direct estimator of the variance is available. We describe a new estimator of the model‐based expectation of the variance, under a non‐parametric model for the population. The model is sufficiently flexible that it can be expected to hold at least approximately in many situations with continuous auxiliary variables observed at the population level. We prove the consistency of both...

10.1111/j.1467-9469.2011.00773.x article EN Scandinavian Journal of Statistics 2012-03-23

This paper describes USTC-NELSLIP’s submissions to the IWSLT2021 Simultaneous Speech Translation task. We proposed a novel simultaneous translation model, Cross-Attention Augmented Transducer (CAAT), which extends the conventional RNN-T to sequence-to-sequence tasks without monotonic constraints, e.g., simultaneous translation. Experiments on speech-to-text (S2T) and text-to-text (T2T) simultaneous translation tasks show that CAAT achieves better quality-latency trade-offs compared to wait-k, one of the previous state-of-the-art approaches. Based...

10.18653/v1/2021.iwslt-1.2 article EN cc-by 2021-01-01

Multilayer materials with a metal-metal bonded structure have been widely applied in the aviation, aerospace, and nuclear industries. Disbonds are prone to exist in lead-steel bonded structures, which degrades the load capacity and mechanical behavior. Thermography nondestructive testing is a potential candidate for sub-layer defect detection. However, over-heating of the instantaneous temperature is unbearable during testing and will lead to subsequent damage or the generation of more unpredictable disbonds. In addition, detection...

10.1109/jsen.2018.2822290 article EN IEEE Sensors Journal 2018-04-02

Information Retrieval (IR) systems are crucial tools for users to access information, widely applied in scenarios like search engines, question answering, and recommendation systems. Traditional IR methods, based on similarity matching to return ranked lists of documents, have been reliable means of information acquisition, dominating the IR field for years. With the advancement of pre-trained language models, generative information retrieval (GenIR) has emerged as a novel paradigm, gaining increasing attention in recent...

10.48550/arxiv.2404.14851 preprint EN arXiv (Cornell University) 2024-04-23

Large language models (LLMs) exhibit remarkable generative capabilities but often suffer from hallucinations. Retrieval-augmented generation (RAG) offers an effective solution by incorporating external knowledge, but existing methods still face several limitations: additional deployment costs of separate retrievers, redundant input tokens from retrieved text chunks, and the lack of joint optimization of retrieval and generation. To address these issues, we propose RetroLLM, a unified framework that...

10.48550/arxiv.2412.11919 preprint EN arXiv (Cornell University) 2024-12-16

Developing a BIM-Based Integrated Model for CAD to CAM Production Automation. Xiaoxi Li, Ahmed Qureshi and Mohamed Al-Hussein. Pages 51-58 (2017 Proceedings of the 34th ISARC, Taipei, Taiwan, ISBN 978-80-263-1371-7, ISSN 2413-5844). Abstract: Modular construction has gained momentum in North America as an emerging construction paradigm in recent years. Modular buildings are assembled from components that are prefabricated in manufacturing plants and transported to the site for assembly. The current manual-based approach to modular...

10.22260/isarc2017/0007 article EN Proceedings of the ... ISARC 2017-07-01

Multi-layer metal-metal bonding structures are widely applied in the aviation, aerospace, and nuclear industrial fields. Debonding defects retain high attention in the nondestructive testing and evaluation society. This paper proposes a feasibility study for inner debonding defect detection of lead-steel bonded structures by using eddy current pulsed thermography. A numerical study has been conducted, and validation studies on detectability and the sensitivity curve versus the effect of excitation and heating time are reported. According to numerical...

10.1109/fendt.2016.7992027 article EN 2016-06-01

The advent of large language models (LLMs) has showcased their efficacy across various domains, yet they often hallucinate, especially in knowledge-intensive tasks that require external knowledge sources. To improve the factual accuracy of language models, retrieval-augmented generation (RAG) has emerged as a popular solution. However, traditional retrieval modules often rely on large-scale document indexes, which can be disconnected from generative tasks. Through the generative retrieval (GR) approach, language models can achieve superior retrieval performance by...

10.48550/arxiv.2402.01176 preprint EN arXiv (Cornell University) 2024-02-02

Abstract: Liquid metal reactors (LMR), with significant advantages in high safety and good economic benefits, have obvious and broad application prospects in fourth generation nuclear power systems. The spiral tube steam generator is one of the most important pieces of equipment in an LMR, and is composed of spiral heat transfer tubes, inner and outer cylinders, feedwater headers, and other structures. Due to their compact structure and high heat transfer efficiency, they are conducive to miniaturization and have been widely used in different countries....

10.1115/icone31-134953 article EN Volume 15: Student Paper Competition 2024-08-04

In obtaining a Digital Elevation Model (DEM), in most methods the tie points are generated automatically by software and then manually screened, which is time-consuming and labor-intensive, and the accuracy cannot be guaranteed. Therefore, this paper proposes an automatic stereo matching method combining Speeded Up Robust Features (SURF) and the Rational Function Model (RFM) to reconstruct the 3D model of remote sensing generalized stereo image pairs. There are two main tasks: first, apply the SURF algorithm to the images, screen at the same...

10.1109/igarss39084.2020.9324216 article EN IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium 2020-09-26

In order to meet the requirements of data link cooperative operation, a data link terminal based on Tianlian relay satellite and Beidou short message dual-mode communications is proposed. In Tianlian mode, the methods of random insertion of empty frames and fast antenna switching are adopted to ensure the integrity of image data, and the antenna switching time is no more than 10 μs at a high speed rate. Beidou mode adds the function of broadcast transmission on the basis of traditional point-to-point communication, and can automatically identify the type of user machine, which is not only compatible...

10.1109/isncc55209.2022.9851765 article EN 2022 International Symposium on Networks, Computers and Communications (ISNCC) 2022-07-19