- Natural Language Processing Techniques
- Topic Modeling
- Multimodal Machine Learning Applications
- Advanced Text Analysis Techniques
- Advanced Wireless Network Optimization
- Advanced MIMO Systems Optimization
- Wireless Networks and Protocols
- Complex Network Analysis Techniques
- Text and Document Classification Technologies
- Sensor Technology and Measurement Systems
- Advanced Neural Network Applications
- Advanced Electrical Measurement Techniques
- Domain Adaptation and Few-Shot Learning
- Neural Networks and Applications
- Ferroelectric and Negative Capacitance Devices
- Video Analysis and Summarization
- Seismic Waves and Analysis
- Advanced Graph Neural Networks
- E-commerce and Technology Innovations
- Higher Education and Teaching Methods
- Geophysical Methods and Applications
- Traffic Prediction and Management Techniques
- Web Data Mining and Analysis
- Caching and Content Delivery
- Time Series Analysis and Forecasting
Tianjin University
2020-2025
William & Mary
2025
Microsoft Research Asia (China)
2024
Peking University
2022
Microsoft Research (India)
2022
University of Edinburgh
2016-2019
Peng Cheng Laboratory
2019
Beijing Information Science & Technology University
2010-2013
Dali University
2012
China Electronic Product Reliability and Environmental Test Institute
2010
Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking...
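The two-stage decomposition above (content plan first, then realization) can be caricatured in a few lines of Python. This is a minimal, hypothetical sketch: the record schema, the rule-based planner, and the template realizer are stand-ins for the learned components described in the abstract.

```python
# Toy data records, loosely in the style of sports box-score datasets.
records = [
    {"entity": "Heat", "type": "TEAM-PTS", "value": 102},
    {"entity": "Hawks", "type": "TEAM-PTS", "value": 95},
    {"entity": "Heat", "type": "TEAM-FG_PCT", "value": 48},
]

def content_plan(records):
    # Stand-in for the learned planner: select which records to mention
    # (here: team scores) and in which order (here: winner first).
    pts = [r for r in records if r["type"] == "TEAM-PTS"]
    return sorted(pts, key=lambda r: -r["value"])

def realize(plan):
    # Stand-in for the learned generator: simple template realization,
    # conditioned only on the content plan.
    parts = [f'the {r["entity"]} scored {r["value"]} points' for r in plan]
    return "In the game, " + " and ".join(parts) + "."

text = realize(content_plan(records))
# -> "In the game, the Heat scored 102 points and the Hawks scored 95 points."
```

The key design point is that the planner decides *what to say and in what order* before any surface text is produced, which is what the end-to-end architecture learns jointly.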
In this paper we address the question of how to render sequence-level networks better at handling structured input. We propose a machine reading simulator which processes text incrementally from left to right and performs shallow reasoning with memory and attention. The reader extends the Long Short-Term Memory architecture with a memory network in place of a single memory cell. This enables adaptive memory usage during recurrence with neural attention, offering a way to weakly induce relations among tokens. The system is initially designed to process...
Performing cellular long term evolution (LTE) communications in unlicensed spectrum using licensed assisted access LTE (LTE-LAA) is a promising approach to overcome wireless spectrum scarcity. However, to reap the benefits of LTE-LAA, a fair coexistence mechanism with other incumbent WiFi deployments is required. In this paper, a novel deep learning approach is proposed for modeling the resource allocation problem of LTE-LAA small base stations (SBSs). The proposed approach enables multiple SBSs to proactively perform dynamic channel selection,...
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables $O(1)$...
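The equivalence between the parallel and recurrent retention paradigms can be verified in a short NumPy sketch. This is a single-head toy version, omitting the paper's gating, normalization, and multi-scale decay; `gamma` is the per-head decay factor.

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    # Parallel form: O = (Q K^T * D) V, where D is a causal decay mask
    # with D[n, m] = gamma^(n-m) for n >= m and 0 otherwise.
    n = Q.shape[0]
    idx = np.arange(n)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    # Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n ;  O_n = Q_n S_n.
    # The state S is a fixed-size matrix, hence O(1) cost per step.
    d, dv = Q.shape[1], V.shape[1]
    S = np.zeros((d, dv))
    out = np.zeros((Q.shape[0], dv))
    for t in range(Q.shape[0]):
        S = gamma * S + np.outer(K[t], V[t])
        out[t] = Q[t] @ S
    return out

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 4))
K = rng.standard_normal((5, 4))
V = rng.standard_normal((5, 4))
gamma = 0.9
assert np.allclose(retention_parallel(Q, K, V, gamma),
                   retention_recurrent(Q, K, V, gamma))
```

Unrolling the recurrence gives O_n = sum over m <= n of gamma^(n-m) (Q_n K_m^T) V_m, which is exactly the masked parallel form, so training can use the parallel path while inference uses the constant-memory recurrent path.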
In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify the residual connection in Transformer, accompanied by a theoretically derived initialization. In-depth theoretical analysis shows that model updates can be bounded in a stable way. The proposed method combines the best of two worlds, i.e.,...
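A minimal sketch of the DeepNorm residual connection, assuming the encoder-only setting where the residual is up-weighted by alpha = (2N)^(1/4) and selected sublayer weights are initialized with gain beta = (8N)^(-1/4) for an N-layer model. The linear sublayer below is a hypothetical stand-in for the attention/FFN blocks.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Plain LayerNorm over the last axis (no learned scale/bias, for brevity).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def deepnorm_residual(x, sublayer, alpha):
    # DeepNorm: scale the residual branch by alpha before post-LayerNorm,
    # i.e. x_{l+1} = LN(alpha * x_l + G_l(x_l)).
    return layer_norm(alpha * x + sublayer(x))

N = 100                      # model depth
alpha = (2 * N) ** 0.25      # residual scale (encoder-only setting)
beta = (8 * N) ** -0.25      # init gain for selected sublayer weights

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 16))
W = beta * rng.standard_normal((16, 16))   # beta-scaled initialization
y = deepnorm_residual(x, lambda h: h @ W, alpha)
```

Intuitively, the large alpha keeps the residual stream dominant while the small beta keeps each sublayer's contribution (and hence the model update) bounded, which is what stabilizes very deep stacks.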
Semantic parsing aims at mapping natural language to machine-interpretable meaning representations. Traditional approaches rely on high-quality lexicons, manually-built templates, and linguistic features which are either domain- or representation-specific. In this paper we present a general method based on an attention-enhanced encoder-decoder model. We encode input utterances into vector representations, and generate their logical forms by conditioning the output sequences or trees on the encoding vectors....
Machine reading comprehension with unanswerable questions is a challenging task. In this work, we propose a data augmentation technique that automatically generates relevant unanswerable questions according to an answerable question paired with its corresponding paragraph that contains the answer. We introduce a pair-to-sequence model for unanswerable question generation, which effectively captures interactions between the question and the paragraph. We also present a way to construct training data for our question generation models by leveraging an existing dataset. Experimental results show...
Semantic parsing aims at mapping natural language utterances into structured meaning representations. In this work, we propose a structure-aware neural architecture which decomposes the semantic parsing process into two stages. Given an input utterance, we first generate a rough sketch of its meaning, where low-level information (such as variable names and arguments) is glossed over. Then, we fill in the missing details by taking into account the utterance and the sketch itself. Experimental results on four datasets characteristic of different domains...
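The coarse-to-fine idea above can be illustrated with a toy two-pass decoder. Both "decoders" here are canned, hypothetical stand-ins (the real ones are neural); the point is the interface: stage 1 emits a sketch with placeholders, stage 2 fills in low-level details conditioned on the utterance and the sketch.

```python
def coarse_decode(utterance):
    # Stand-in coarse decoder: produce a meaning sketch in which
    # low-level details (argument values) are glossed over as VAR slots.
    return "answer(flight(from_airport(VAR0), to_airport(VAR1)))"

def fine_decode(utterance, sketch):
    # Stand-in fine decoder: fill each placeholder using the utterance.
    tokens = utterance.split()
    slots = {
        "VAR0": tokens[tokens.index("from") + 1],
        "VAR1": tokens[tokens.index("to") + 1],
    }
    for var, value in slots.items():
        sketch = sketch.replace(var, value)
    return sketch

utt = "flights from boston to denver"
lf = fine_decode(utt, coarse_decode(utt))
# -> "answer(flight(from_airport(boston), to_airport(denver)))"
```

Separating the two passes lets the sketch capture high-level compositional structure before any rare, low-level tokens have to be predicted.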
The Mixture-of-Experts (MoE) technique can scale up the model size of Transformers with an affordable computational overhead. We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i.e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference. The fluctuation tends to harm sample efficiency because the same input updates different experts while only one is finally used. In this paper, we propose StableMoE with two training stages to address the routing fluctuation problem. In the first...
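The two-stage idea can be caricatured as follows: in stage 1 the router is learnable, so a token's expert assignment may drift as training updates the router; in stage 2 the (distilled) router is frozen, so the token-to-expert mapping is fixed. Names and shapes below are illustrative only; this is a top-1 toy, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n_experts, d = 4, 8
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router_W = rng.standard_normal((d, n_experts))   # stage-1 learnable router

def route_stage1(x):
    # Learnable routing: as router_W is updated during training,
    # argmax assignments for the same x can fluctuate.
    return int(np.argmax(x @ router_W))

# Stage 2: freeze the routing strategy (here simply a snapshot of the
# stage-1 router), so the same input always activates the same expert.
frozen_W = router_W.copy()

def route_stage2(x):
    return int(np.argmax(x @ frozen_W))

x = rng.standard_normal(d)
e = route_stage2(x)
y = x @ experts[e]   # top-1 MoE: only the selected expert is applied
```

Because `frozen_W` never changes in stage 2, every gradient step for a given input goes to the expert that will actually serve it at inference, which is exactly the sample-efficiency argument in the abstract.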
Diversion tunnels play a critical role in water conservancy and hydropower projects. However, due to complex geological conditions, especially the influence of buried fault structures that are difficult to observe directly below the surface, construction processes often face significant challenges such as rock mass instability, seepage, and abrupt geological changes. Audio magnetotelluric (AMT) technology, a high-resolution electromagnetic exploration method, demonstrates remarkable advantages in detecting...
Question answering (QA) systems are sensitive to the many different ways natural language expresses the same information need. In this paper we turn to paraphrases as a means of capturing this knowledge and present a general framework which learns felicitous paraphrases for various QA tasks. Our method is trained end-to-end using question-answer pairs as a supervision signal. A question and its paraphrases serve as input to a neural scoring model which assigns higher weights to linguistic expressions most likely to yield correct answers. We evaluate our...
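One way to read the framework is as marginalizing the answer distribution over weighted paraphrases. The sketch below is a hypothetical toy: the paraphrase scores and per-paraphrase answer probabilities are fabricated numbers standing in for the neural scoring model and the base QA model.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# A question plus paraphrases, with toy scores from a scoring model.
paraphrases = ["who wrote hamlet", "hamlet author", "who is hamlet"]
scores = [2.0, 1.5, -1.0]        # the last paraphrase is misleading
weights = softmax(scores)        # higher weight = more felicitous

# Toy answer distributions produced by the base QA model per paraphrase.
answers = ["Shakespeare", "Marlowe"]
p_answer = np.array([[0.9, 0.1],
                     [0.8, 0.2],
                     [0.3, 0.7]])

combined = weights @ p_answer    # marginalize over paraphrases
best = answers[int(np.argmax(combined))]
```

Training end-to-end from question-answer pairs pushes the scoring model to down-weight paraphrases (like the third one) whose answer distributions disagree with the supervision signal.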
Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks. However, it is unclear why the pre-training-then-fine-tuning paradigm can improve performance and generalization capability across different tasks. In this paper, we propose to visualize the loss landscapes and optimization trajectories of fine-tuning BERT on specific datasets. First, we find that pre-training reaches a good initial point for downstream tasks, which leads to wider optima and easier optimization compared with training from...
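Mechanically, this kind of visualization reduces to evaluating the loss along a line (or plane) between two parameter points, e.g. the initialization and the fine-tuned solution. Below is a minimal sketch with a toy least-squares loss standing in for the fine-tuning loss; the model and data are hypothetical.

```python
import numpy as np

def loss(theta, X, y):
    # Toy stand-in for the fine-tuning loss surface.
    return float(np.mean((X @ theta - y) ** 2))

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)

theta_init = rng.standard_normal(5)                  # "initialization"
theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)   # "fine-tuned" optimum

# 1-D slice of the loss landscape: L(theta_init + t * (theta_star - theta_init))
ts = np.linspace(0.0, 1.0, 11)
curve = [loss(theta_init + t * (theta_star - theta_init), X, y) for t in ts]
```

Plotting `curve` against `ts` shows how "easy" the path from the starting point to the optimum is; a 2-D version adds a second (e.g. random) direction to render the full landscape around the optimum.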
The Chinese character riddle is a game in which the solution is a single character. It is closely connected with the shape, pronunciation, or meaning of characters. The riddle description (a sentence) is usually composed of phrases with rich linguistic phenomena (such as pun, simile, and metaphor), which are associated with different parts (namely radicals) of the character. In this paper, we propose a statistical framework to solve and generate riddles. Specifically, we learn alignment rules and identify metaphors between phrases in riddles and radicals in characters. Then,...
Service descriptions via the Web Services Description Language (WSDL) are necessary but not sufficient for service selection based on trust. We need a means to collect nonfunctional information about services and use it to assign dynamic trust levels to the providers and their implementations. In this paper, we briefly discuss the problem of trust-based service selection, give some correlation definitions, and propose an evaluation model for resolving the above question.
To transfer the representation capacity of large pre-trained models to lightweight models, knowledge distillation has been widely explored. However, conventional single-stage distillation methods are prone to getting stuck in task-specific knowledge, making it difficult to retain the task-agnostic knowledge which is crucial for model generalization. In this study, we propose generic-to-specific distillation (G2SD) to boost small models under the assistance of large models pretrained by masked image modeling. In generic distillation, the decoder of the small model is encouraged to align its feature predictions...
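The generic-distillation stage can be sketched as a feature-alignment objective: an MSE between the student's predicted features and the frozen teacher's hidden representations, minimized before any task-specific stage. Everything below (shapes, the linear prediction head, the data) is an illustrative stand-in, with one hand-derived gradient step to show the loss actually decreases.

```python
import numpy as np

rng = np.random.default_rng(4)
d_t, d_s, n = 16, 8, 32
teacher_feats = rng.standard_normal((n, d_t))   # frozen MIM-teacher features
student_feats = rng.standard_normal((n, d_s))   # student encoder outputs
W = 0.1 * rng.standard_normal((d_s, d_t))       # student's prediction head

def generic_distill_loss(W):
    # Stage 1 (generic): align student feature predictions with the
    # teacher's hidden representations via mean squared error.
    pred = student_feats @ W
    return float(np.mean((pred - teacher_feats) ** 2))

# One gradient step on the alignment loss (closed-form MSE gradient):
# dL/dW = 2/(n * d_t) * S^T (S W - T)
grad = 2.0 / (n * d_t) * student_feats.T @ (student_feats @ W - teacher_feats)
W_next = W - 0.1 * grad
```

The task-specific stage would then fine-tune the aligned student on downstream labels; keeping the generic stage separate is what preserves the task-agnostic knowledge.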