Li Dong

ORCID: 0000-0003-3083-7170
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Advanced Text Analysis Techniques
  • Advanced Wireless Network Optimization
  • Advanced MIMO Systems Optimization
  • Wireless Networks and Protocols
  • Complex Network Analysis Techniques
  • Text and Document Classification Technologies
  • Sensor Technology and Measurement Systems
  • Advanced Neural Network Applications
  • Advanced Electrical Measurement Techniques
  • Domain Adaptation and Few-Shot Learning
  • Neural Networks and Applications
  • Ferroelectric and Negative Capacitance Devices
  • Video Analysis and Summarization
  • Seismic Waves and Analysis
  • Advanced Graph Neural Networks
  • E-commerce and Technology Innovations
  • Higher Education and Teaching Methods
  • Geophysical Methods and Applications
  • Traffic Prediction and Management Techniques
  • Web Data Mining and Analysis
  • Caching and Content Delivery
  • Time Series Analysis and Forecasting

Tianjin University
2020-2025

William & Mary
2025

Microsoft Research Asia (China)
2024

Peking University
2022

Microsoft Research (India)
2022

University of Edinburgh
2016-2019

Peng Cheng Laboratory
2019

Beijing Information Science & Technology University
2010-2013

Dali University
2012

China Electronic Product Reliability and Environmental Test Institute
2010

Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking...

10.1609/aaai.v33i01.33016908 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17
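
A minimal sketch of the two-stage decomposition described above: a content-selection-and-planning step followed by surface realization. The `Record` type, the salience heuristic, and the template-based generator are illustrative stand-ins, not the paper's learned pointer-network planner and neural decoder.

```python
from dataclasses import dataclass

@dataclass
class Record:
    entity: str
    field: str
    value: str

def content_plan(records, salience, k=3):
    """Stage 1: select the k most salient records, in the order to mention them."""
    ranked = sorted(records, key=lambda r: salience.get(r.field, 0.0), reverse=True)
    return ranked[:k]

def generate(plan):
    """Stage 2: realize the plan as text (a learned attention decoder in the paper)."""
    return " ".join(f"{r.entity}'s {r.field} is {r.value}." for r in plan)

records = [
    Record("Team A", "points", "102"),
    Record("Team A", "rebounds", "40"),
    Record("Team B", "points", "98"),
]
salience = {"points": 1.0, "rebounds": 0.3}   # invented salience scores
print(generate(content_plan(records, salience)))
```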

In this paper we address the question of how to render sequence-level networks better at handling structured input. We propose a machine reading simulator which processes text incrementally from left to right and performs shallow reasoning with memory and attention. The reader extends the Long Short-Term Memory architecture with a memory network in place of a single memory cell. This enables adaptive memory usage during recurrence with neural attention, offering a way to weakly induce relations among tokens. The system is initially designed to process...

10.48550/arxiv.1601.06733 preprint EN other-oa arXiv (Cornell University) 2016-01-01
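
A toy sketch of the intra-attention idea above: keep every previous hidden state in memory and attend over them at each step, instead of carrying a single cell state. The scoring function and dimensions are simplified assumptions, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def lstmn_step(x_t, H, W):
    """Attend over past hidden states H (a list of (d,) vectors) to build an
    adaptive memory summary used in the recurrence."""
    if len(H) == 0:
        return np.tanh(W @ x_t)
    M = np.stack(H)                  # (t-1, d) memory of previous states
    scores = M @ (W @ x_t)           # one score per past token
    alpha = softmax(scores)          # attention weights: weakly induced relations
    h_tilde = alpha @ M              # adaptive summary of the memory
    return np.tanh(W @ x_t + h_tilde)

d = 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d)) * 0.1
H = []
for x_t in rng.normal(size=(5, d)):  # a 5-token "sentence"
    H.append(lstmn_step(x_t, H, W))
```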

Performing cellular long term evolution (LTE) communications in unlicensed spectrum using licensed assisted access LTE (LTE-LAA) is a promising approach to overcome wireless spectrum scarcity. However, to reap the benefits of LTE-LAA, a fair coexistence mechanism with other incumbent WiFi deployments is required. In this paper, a novel deep learning approach is proposed for modeling the resource allocation problem of LTE-LAA small base stations (SBSs). The proposed approach enables multiple SBSs to proactively perform dynamic channel selection,...

10.1109/twc.2018.2829773 article EN publisher-specific-oa IEEE Transactions on Wireless Communications 2018-05-15
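
As a loose illustration of learned channel selection, here is a tabular bandit-style sketch for a single SBS; the paper itself trains a deep (LSTM-based) network over traffic histories, and the WiFi occupancy probabilities and reward below are invented assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_channels = 4
wifi_busy_prob = np.array([0.8, 0.5, 0.3, 0.6])   # assumed WiFi occupancy per channel
Q = np.zeros(n_channels)                           # value estimate per channel
eps, lr = 0.1, 0.05

for step in range(5000):
    # Epsilon-greedy channel selection.
    a = rng.integers(n_channels) if rng.random() < eps else int(Q.argmax())
    # Reward 1 if the channel is free of WiFi traffic (fair coexistence), else 0.
    r = float(rng.random() > wifi_busy_prob[a])
    Q[a] += lr * (r - Q[a])

print("Estimated idle rates:", Q.round(2))   # should track 1 - wifi_busy_prob
```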

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. We then propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables $O(1)$...

10.48550/arxiv.2307.08621 preprint EN other-oa arXiv (Cornell University) 2023-01-01
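
The parallel/recurrent duality mentioned above can be checked numerically. Below is a simplified single-head retention, omitting the paper's rotation (xpos) and scaling terms: the parallel form O = (QKᵀ ⊙ D)V with decay matrix D[n, m] = γ^(n−m) for n ≥ m, and the equivalent O(1)-per-token recurrent form Sₙ = γ·Sₙ₋₁ + KₙᵀVₙ, Oₙ = QₙSₙ.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, gamma = 6, 4, 0.9
Q, K, V = rng.normal(size=(3, T, d))

# Parallel representation (training-time parallelism).
n, m = np.arange(T)[:, None], np.arange(T)[None, :]
D = np.where(n >= m, gamma ** (n - m), 0.0)       # causal decay mask
O_parallel = (Q @ K.T * D) @ V

# Recurrent representation (low-cost inference).
S = np.zeros((d, d))
O_recurrent = np.zeros((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])          # constant-size state update
    O_recurrent[t] = Q[t] @ S

assert np.allclose(O_parallel, O_recurrent)       # the two forms agree
```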

In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify the residual connection in the Transformer, accompanied by a theoretically derived initialization. In-depth theoretical analysis shows that model updates can be bounded in a stable way. The proposed method combines the best of two worlds, i.e.,...

10.1109/tpami.2024.3386927 article EN cc-by IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-04-10
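
A sketch of the DeepNorm residual, x_{l+1} = LN(α·x + G(x)), with certain sublayer weights down-scaled by β at initialization. The constants below follow the paper's decoder-only setting (α = (2N)^{1/4}, β = (8N)^{−1/4} for N layers); the feed-forward sublayer stands in for any attention or FFN sublayer.

```python
import torch
from torch import nn

class DeepNormBlock(nn.Module):
    def __init__(self, d_model: int, n_layers: int):
        super().__init__()
        self.alpha = (2 * n_layers) ** 0.25        # residual up-weighting
        beta = (8 * n_layers) ** -0.25             # init down-scaling
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm = nn.LayerNorm(d_model)
        for lin in (self.ffn[0], self.ffn[2]):     # scale init by beta
            nn.init.xavier_normal_(lin.weight, gain=beta)

    def forward(self, x):
        # DeepNorm: x_{l+1} = LN(alpha * x + G(x))
        return self.norm(self.alpha * x + self.ffn(x))

block = DeepNormBlock(d_model=64, n_layers=1000)   # "extremely deep" regime
y = block(torch.randn(2, 16, 64))
```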

Semantic parsing aims at mapping natural language to machine-interpretable meaning representations. Traditional approaches rely on high-quality lexicons, manually-built templates, and linguistic features which are either domain- or representation-specific. In this paper we present a general method based on an attention-enhanced encoder-decoder model. We encode input utterances into vector representations, and generate their logical forms by conditioning the output sequences or trees on the encoding vectors...

10.48550/arxiv.1601.01280 preprint EN other-oa arXiv (Cornell University) 2016-01-01
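
A single attention-conditioned decoding step, sketching how a logical-form token distribution can be produced from the encoded utterance vectors. The dot-product scoring and single output projection are simplifications, not the paper's exact Seq2Seq/Seq2Tree decoders.

```python
import torch
import torch.nn.functional as F

def attend_and_predict(dec_h, enc_H, out_proj):
    """dec_h: (d,) decoder state; enc_H: (T, d) encoded utterance tokens."""
    alpha = F.softmax(enc_H @ dec_h, dim=0)        # attention over the utterance
    context = alpha @ enc_H                        # (d,) weighted summary
    logits = out_proj(torch.cat([dec_h, context])) # scores over logical-form tokens
    return F.log_softmax(logits, dim=-1)

d, vocab = 32, 100
out_proj = torch.nn.Linear(2 * d, vocab)
log_probs = attend_and_predict(torch.randn(d), torch.randn(7, d), out_proj)
```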

Machine reading comprehension with unanswerable questions is a challenging task. In this work, we propose a data augmentation technique that automatically generates relevant unanswerable questions according to an answerable question paired with its corresponding paragraph that contains the answer. We introduce a pair-to-sequence model for question generation, which effectively captures the interactions between the question and the paragraph. We also present a way to construct training data for our question generation models by leveraging an existing dataset. Experimental results show...

10.18653/v1/p19-1415 article EN cc-by Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019-01-01

Semantic parsing aims at mapping natural language utterances into structured meaning representations. In this work, we propose a structure-aware neural architecture which decomposes the semantic parsing process into two stages. Given an input utterance, we first generate a rough sketch of its meaning, where low-level information (such as variable names and arguments) is glossed over. Then, we fill in the missing details by conditioning on the natural language input and the sketch itself. Experimental results on four datasets characteristic of different domains...

10.48550/arxiv.1805.04793 preprint EN other-oa arXiv (Cornell University) 2018-01-01
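
A worked toy example of the two stages described above; the λ-calculus form and the placeholder notation (@1, @2) are invented here for illustration, not taken from the paper's grammars.

```python
utterance = "flights from Dallas to Boston"

# Stage 1: a coarse meaning sketch with low-level details glossed over.
sketch = "(lambda $0 (and (flight $0) (from $0 @1) (to $0 @2)))"

# Stage 2: fill in the details, conditioned on the utterance and the sketch.
fillers = {"@1": "dallas:ci", "@2": "boston:ci"}
logical_form = sketch
for slot, value in fillers.items():
    logical_form = logical_form.replace(slot, value)

print(logical_form)
# (lambda $0 (and (flight $0) (from $0 dallas:ci) (to $0 boston:ci)))
```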

The Mixture-of-Experts (MoE) technique can scale up the model size of Transformers with an affordable computational overhead. We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i.e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference. The routing fluctuation tends to harm sample efficiency because the same input updates different experts but only one is finally used. In this paper, we propose StableMoE with two training stages to address the routing fluctuation problem. In the first...

10.18653/v1/2022.acl-long.489 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01
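
A sketch of the second stage's key property: once the lightweight router learned and distilled in stage one is frozen, each token's expert assignment can no longer fluctuate while the experts continue training. The router and expert shapes below are illustrative assumptions.

```python
import torch
from torch import nn

d_model, n_experts = 32, 4
router = nn.Linear(d_model, n_experts, bias=False)
router.requires_grad_(False)          # stage 2: routing strategy is frozen
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

def moe_forward(x):                   # x: (tokens, d_model)
    expert_ids = router(x).argmax(dim=-1)   # stable top-1 assignment per token
    y = torch.empty_like(x)
    for e, expert in enumerate(experts):
        mask = expert_ids == e
        if mask.any():
            y[mask] = expert(x[mask])       # only the assigned expert is updated
    return y

y = moe_forward(torch.randn(10, d_model))
```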

10.1109/icassp49660.2025.10890879 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Diversion tunnels play a critical role in water conservancy and hydropower projects. However, due to complex geological conditions, especially the influence of buried fault structures that are difficult to observe directly below the surface, construction processes often face significant challenges such as rock mass instability, seepage, and abrupt geological changes. Audio magnetotelluric (AMT) technology, a high-resolution electromagnetic exploration method, demonstrates remarkable advantages in detecting...

10.1088/1742-6596/2990/1/012003 article EN Journal of Physics Conference Series 2025-04-01

Question answering (QA) systems are sensitive to the many different ways natural language expresses the same information need. In this paper we turn to paraphrases as a means of capturing this knowledge and present a general framework which learns felicitous paraphrases for various QA tasks. Our method is trained end-to-end using question-answer pairs as a supervision signal. A question and its paraphrases serve as input to a neural scoring model which assigns higher weights to linguistic expressions most likely to yield correct answers. We evaluate our...

10.48550/arxiv.1708.06022 preprint EN other-oa arXiv (Cornell University) 2017-01-01
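
One concrete way to read the framework is as a mixture, p(a|q) = Σ_{q′} p(q′|q) · p(a|q′): a softmax over paraphrase scores weights each paraphrase's answer distribution, so only question-answer pairs are needed as supervision. The tensors below are random stand-ins for the neural scoring and QA models.

```python
import torch
import torch.nn.functional as F

def answer_distribution(para_scores, answer_logits):
    """para_scores: (P,) one score per paraphrase of the question;
    answer_logits: (P, A) per-paraphrase scores over candidate answers."""
    weights = F.softmax(para_scores, dim=0)            # felicity of each paraphrase
    return weights @ F.softmax(answer_logits, dim=-1)  # marginal p(a|q)

P, A = 3, 5
p_answer = answer_distribution(torch.randn(P), torch.randn(P, A))
# Training maximizes log p_answer[gold], so paraphrase weights are learned
# end-to-end without any paraphrase-level labels.
```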

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks. However, it is unclear why the pre-training-then-fine-tuning paradigm can improve performance and generalization capability across different tasks. In this paper, we propose to visualize the loss landscapes and optimization trajectories of fine-tuning BERT on specific datasets. First, we find that pre-training reaches a good initial point across downstream tasks, which leads to wider optima and easier optimization compared with training from...

10.48550/arxiv.1908.05620 preprint EN other-oa arXiv (Cornell University) 2019-01-01
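
A common way to realize such a visualization is to evaluate the loss along the line segment between the pre-trained initialization and the fine-tuned solution; the quadratic toy objective below stands in for a model's downstream training loss.

```python
import numpy as np

def loss(theta):
    return float(np.sum((theta - 3.0) ** 2))   # toy stand-in for a training loss

theta_0 = np.zeros(10)          # pre-trained initialization
theta_1 = np.full(10, 2.9)      # fine-tuned parameters
for t in np.linspace(-0.25, 1.25, 7):          # extend slightly past both ends
    theta = (1 - t) * theta_0 + t * theta_1    # 1D slice of the landscape
    print(f"t={t:+.2f}  loss={loss(theta):.2f}")
```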

A Chinese character riddle is a game in which the solution is a single character. It is closely connected with the shape, pronunciation, or meaning of Chinese characters. The riddle description (a sentence) is usually composed of phrases with rich linguistic phenomena (such as pun, simile, and metaphor), which are associated with different parts (namely radicals) of the solution character. In this paper, we propose a statistical framework to solve and generate character riddles. Specifically, we learn the alignment rules and identify metaphors between the phrases in riddles and the radicals of characters. Then,...

10.18653/v1/d16-1081 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

Service descriptions via the Web Services Description Language (WSDL) are necessary but not sufficient for service selection based on trust: we also need a means to collect nonfunctional information about services and use it to assign dynamic trust levels to service providers and implementations. In this paper, we briefly discuss the problem of service selection, give some correlation definitions, and propose an evaluation model for resolving the above question.

10.1109/iitsi.2010.175 article EN 2010-04-01

To transfer the representation capacity of large pre-trained models to lightweight models, knowledge distillation has been widely explored. However, conventional single-stage distillation methods are prone to getting stuck in task-specific knowledge, making it difficult to retain the task-agnostic knowledge which is crucial for model generalization. In this study, we propose generic-to-specific distillation (G2SD) to boost lightweight models under the assistance of large models pre-trained by masked image modeling. In generic distillation, the decoder of a small model is encouraged to align its feature predictions...

10.1109/tcsvt.2024.3393474 article EN IEEE Transactions on Circuits and Systems for Video Technology 2024-04-25
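
In spirit, the two stages translate into two losses: a generic one aligning student features with a masked-image-modeling teacher's predictions on masked patches, followed by a conventional task-logit distillation. The shapes and loss forms below are simplified assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def generic_distill_loss(student_feats, teacher_feats, mask):
    """Align features on masked patches. feats: (B, N, D); mask: (B, N) bool."""
    return F.mse_loss(student_feats[mask], teacher_feats[mask])

def specific_distill_loss(student_logits, teacher_logits, T=2.0):
    """Conventional logit distillation for the downstream task."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T

B, N, D, C = 2, 16, 32, 10
mask = torch.rand(B, N) < 0.75          # MAE-style high mask ratio (assumed)
l_generic = generic_distill_loss(torch.randn(B, N, D), torch.randn(B, N, D), mask)
l_specific = specific_distill_loss(torch.randn(B, C), torch.randn(B, C))
```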