Xingxing Zhang

ORCID: 0000-0003-4012-3796
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Domain Adaptation and Few-Shot Learning
  • Multimodal Machine Learning Applications
  • Advanced Text Analysis Techniques
  • Adversarial Robustness in Machine Learning
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Text Readability and Simplification
  • Video Surveillance and Tracking Methods
  • COVID-19 diagnosis using AI
  • Machine Learning and ELM
  • Anomaly Detection Techniques and Applications
  • Face and Expression Recognition
  • Advanced Neural Network Applications
  • Machine Learning and Data Classification
  • Advanced Graph Neural Networks
  • Speech Recognition and Synthesis
  • Advanced Vision and Imaging
  • Advanced Image Processing Techniques
  • Image Retrieval and Classification Techniques
  • Digital Media Forensic Detection
  • Data Stream Mining Techniques
  • Human Mobility and Location-Based Analysis
  • Image Enhancement Techniques

Tsinghua University
2021-2025

Sichuan University of Science and Engineering
2024

Guizhou University
2024

American Jewish Committee
2023

IT University of Copenhagen
2023

Tokyo Institute of Technology
2023

Administration for Community Living
2023

Microsoft Research Asia (China)
2019-2023

Nanjing University of Aeronautics and Astronautics
2022

Civil Aviation University of China
2022

Neural extractive summarization models usually employ a hierarchical encoder for document encoding, and they are trained using sentence-level labels, which are created heuristically by rule-based methods. Training the hierarchical encoder with these inaccurate labels is challenging. Inspired by recent work on pre-training transformer sentence encoders (Devlin et al., 2018), we propose HIBERT (as shorthand for HIerachical Bidirectional Encoder Representations from Transformers) for document encoding and a method to pre-train it using unlabeled data. We apply...

10.18653/v1/p19-1499 preprint EN cc-by 2019-01-01
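
The entry above pre-trains a hierarchical encoder on unlabeled documents by masking whole sentences and asking the model to reconstruct them. Below is a minimal sketch of how such pre-training examples could be constructed; the MASK_SENT token, the 15% masking ratio, and the helper name are illustrative assumptions, not the paper's exact recipe.

import random

MASK_SENT = "[SENT_MASK]"  # hypothetical placeholder for a masked sentence

def make_sentence_mask_example(doc_sentences, mask_ratio=0.15, seed=0):
    """Mask whole sentences; the pre-training target is to regenerate them.

    `doc_sentences` is a list of tokenized sentences (lists of strings).
    Returns the corrupted document plus (index, sentence) reconstruction targets.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(doc_sentences) * mask_ratio))
    masked_ids = set(rng.sample(range(len(doc_sentences)), n_mask))
    corrupted, targets = [], []
    for i, sent in enumerate(doc_sentences):
        if i in masked_ids:
            corrupted.append([MASK_SENT])   # replace the whole sentence
            targets.append((i, sent))       # the model must reconstruct it
        else:
            corrupted.append(sent)
    return corrupted, targets

doc = [["the", "cat", "sat", "."], ["it", "was", "tired", "."],
       ["the", "dog", "barked", "."], ["then", "it", "slept", "."]]
print(make_sentence_mask_example(doc))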

We propose a model for Chinese poem generation based on recurrent neural networks which we argue is ideally suited to capturing poetic content and form. Our generator jointly performs content selection ("what to say") and surface realization ("how to say") by learning representations of individual characters, their combinations into one or more lines, as well as how these mutually reinforce and constrain each other. Poem lines are generated incrementally by taking into account the entire history of what has been generated so far rather than the limited...

10.3115/v1/d14-1074 article EN cc-by 2014-01-01

Sentence simplification aims to make sentences easier to read and understand. Most recent approaches draw on insights from machine translation to learn simplification rewrites from monolingual corpora of complex and simple sentences. We address the simplification problem with an encoder-decoder model coupled with a deep reinforcement learning framework. Our model, which we call DRESS (as shorthand for Deep REinforcement Sentence Simplification), explores the space of possible simplifications while learning to optimize a reward function that encourages outputs which are...

10.18653/v1/d17-1062 article EN cc-by Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing 2017-01-01
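
The DRESS entry couples an encoder-decoder with reinforcement learning that optimizes a simplification reward. A minimal sketch of the policy-gradient (REINFORCE) idea is shown below, assuming PyTorch and an already computed scalar reward; a practical system would also subtract a baseline and mix in a cross-entropy term, and the reward composition (simplicity, relevance, fluency) is only summarized in a comment.

import torch
import torch.nn.functional as F

def reinforce_loss(logits, sampled_ids, reward):
    """REINFORCE-style loss: scale the sequence log-likelihood by a scalar reward.

    logits:      (seq_len, vocab_size) decoder scores for one sampled simplification
    sampled_ids: (seq_len,) token ids sampled from the decoder
    reward:      scalar, e.g. a weighted mix of simplicity, relevance and
                 fluency scores (the weighting is an assumption here)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    seq_log_prob = log_probs.gather(1, sampled_ids.unsqueeze(1)).sum()
    return -reward * seq_log_prob  # minimizing this raises the probability of high-reward outputs

# toy usage
logits = torch.randn(5, 100, requires_grad=True)
ids = torch.randint(0, 100, (5,))
loss = reinforce_loss(logits, ids, reward=0.7)
loss.backward()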

To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual learning, provides a foundation for AI systems to develop themselves adaptively. In a general sense, continual learning is explicitly limited by catastrophic forgetting, where learning a new task usually results in a dramatic performance drop on the old tasks. Beyond this, increasingly numerous advances have emerged in recent years that...

10.1109/tpami.2024.3367329 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-02-26

Recently, there has been an increasing interest in end-to-end speech recognition using neural networks, with no reliance on hidden Markov models (HMMs) for sequence modelling as in the standard hybrid framework. The recurrent neural network (RNN) encoder-decoder is such a model, performing sequence-to-sequence mapping without any predefined alignment. This model first transforms the input sequence into a fixed-length vector representation, from which the decoder recovers the output sequence. In this paper, we extend our previous work to large...

10.1109/icassp.2016.7472641 article EN 2016-03-01

Conventional graph-based dependency parsers guarantee a tree structure both during training and inference. Instead, we formalize dependency parsing as the problem of independently selecting the head of each word in a sentence. Our model, which we call DENSE (as shorthand for Dependency Neural Selection), produces a distribution over possible heads for each word using features obtained from a bidirectional recurrent neural network. Without enforcing structural constraints during training, DeNSe generates (at inference time) trees...

10.18653/v1/e17-1063 article EN cc-by 2017-01-01
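
The head-selection formulation above reduces parsing to an independent argmax per word. A minimal numpy sketch of that decision rule follows; the score matrix is assumed to come from a trained scorer (not shown), and the paper's inference-time repair of non-tree outputs (e.g. via a maximum spanning tree algorithm) is omitted.

import numpy as np

def select_heads(scores):
    """Independently pick the most likely head for each word.

    scores[i, j] is a (hypothetical) model score for word i+1 choosing column j
    as its head, with column 0 standing for the artificial ROOT token.
    """
    return scores.argmax(axis=1)

# toy example for a 3-word sentence (rows: words 1..3, cols: ROOT, word1, word2, word3)
scores = np.array([[2.0, 0.1, 0.5, 0.3],
                   [0.2, 1.7, 0.1, 0.4],
                   [0.1, 0.3, 1.2, 0.2]])
print(select_heads(scores))  # [0 1 2]: word1 <- ROOT, word2 <- word1, word3 <- word2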

Deep neural networks have advanced the state of the art in automatic speech recognition when combined with hidden Markov models (HMMs). Recently there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words. We...

10.21437/interspeech.2015-654 article EN Interspeech 2015 2015-09-06

Contrastive learning models have achieved great success in unsupervised visual representation learning, where they maximize the similarities between feature representations of different views of the same image while minimizing the similarities between views of different images. In text summarization, the output summary is a shorter form of the input document and they have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive summarization, where we view a document, its gold summary and its model-generated summaries as different views of the same meaning and maximize the similarities between them during training. We improve over a strong...

10.1609/aaai.v36i10.21409 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
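
The contrastive objective above pulls together representations of a document and its summaries. A minimal sketch of that similarity term follows, assuming PyTorch and pre-pooled sequence representations; how the real model pools decoder states and combines this term with the generation loss is not reproduced here.

import torch
import torch.nn.functional as F

def view_similarity_loss(doc_repr, summ_repr):
    """Maximize cosine similarity between document and summary representations.

    Both inputs are (batch, dim) pooled sequence representations; treating the
    document and its (gold or generated) summary as two views with similar
    meaning, the loss is 1 minus their average cosine similarity.
    """
    return 1.0 - F.cosine_similarity(doc_repr, summ_repr, dim=-1).mean()

doc = torch.randn(4, 256)    # batch of 4 pooled document representations
summ = torch.randn(4, 256)   # matching summary representations
print(view_similarity_loss(doc, summ))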

To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual learning, provides a foundation for AI systems develop themselves adaptively. In general sense, learning is explicitly limited by catastrophic forgetting, where new task usually results in dramatic performance degradation of the old tasks. Beyond this, increasingly numerous advances have emerged recent...

10.48550/arxiv.2302.00487 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. To address this issue, we introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences. Specifically, we propose dilated attention, which expands the attentive field exponentially as the distance grows. LongNet...

10.48550/arxiv.2307.02486 preprint EN other-oa arXiv (Cornell University) 2023-01-01
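
Dilated attention restricts each token to a sparse, segment-local set of positions. The numpy sketch below builds a boolean mask for a single (segment length, dilation) pattern to illustrate the idea; LongNet itself mixes several such patterns with exponentially growing segment lengths and dilations, which this sketch does not reproduce.

import numpy as np

def dilated_attention_mask(seq_len, segment_len, dilation):
    """Boolean mask for one dilated-attention pattern.

    Tokens are split into segments of `segment_len`; inside each segment only
    every `dilation`-th position participates, so attention stays sparse while
    still covering the whole segment.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for start in range(0, seq_len, segment_len):
        idx = np.arange(start, min(start + segment_len, seq_len), dilation)
        mask[np.ix_(idx, idx)] = True  # sparse all-to-all within the segment
    return mask

print(dilated_attention_mask(seq_len=16, segment_len=8, dilation=2).astype(int))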

Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have been successfully applied to a variety of sequence modeling tasks. In this paper we develop Tree LSTM (TreeLSTM), a model based on LSTM, which is designed to predict a tree rather than a linear sequence. TreeLSTM defines the probability of a sentence by estimating the generation probability of its dependency tree. At each time step, a node is generated based on the representation of the generated subtree. We further enhance the modeling power of TreeLSTM by explicitly...

10.18653/v1/n16-1035 preprint EN cc-by Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2016-01-01

Different from the Visual Question Answering task, which requires answering only one question about an image, Visual Dialogue involves multiple questions that cover a broad range of visual content and could be related to any objects, relationships or semantics. The key challenge in Visual Dialogue is thus to learn a more comprehensive and semantic-rich image representation which may have adaptive attentions on the image for variant questions. In this research, we propose a novel model to depict an image from both visual and semantic perspectives. Specifically, the visual view helps...

10.1609/aaai.v34i07.6769 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Abstractive document summarization is usually modeled as a sequence-to-sequence (SEQ2SEQ) learning problem. Unfortunately, training large SEQ2SEQ based summarization models on limited supervised data is challenging. This paper presents three pre-training (in shorthand, STEP) objectives which allow us to pre-train an abstractive summarization model on unlabeled text. The main idea is that, given an input text artificially constructed from a document, the model is pre-trained to reinstate the original document. These objectives include sentence reordering, next...

10.18653/v1/2020.emnlp-main.297 article EN cc-by 2020-01-01

Continual learning needs to overcome catastrophic forgetting of the past. Memory replay of representative old training samples has been shown to be an effective solution and achieves state-of-the-art (SOTA) performance. However, existing work is mainly built on a small memory buffer containing a few original data samples, which cannot fully characterize the old data distribution. In this work, we propose memory replay with data compression (MRDC) to reduce the storage cost of old training samples and thus increase the amount that can be stored in the memory buffer. Observing...

10.48550/arxiv.2202.06592 preprint EN other-oa arXiv (Cornell University) 2022-01-01
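
The idea above is that compressing stored exemplars lets a fixed-byte replay buffer hold more of them. A minimal sketch using JPEG compression via Pillow follows; the quality values and helper name are illustrative assumptions, and the paper's analysis of how the compression rate trades per-sample fidelity against buffer capacity is not reproduced.

import io
from PIL import Image

def compress_exemplar(img, quality=50):
    """JPEG-compress an exemplar so more old samples fit in a fixed-byte buffer.

    Returns the compressed bytes and their size in bytes.
    """
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    data = buf.getvalue()
    return data, len(data)

img = Image.new("RGB", (224, 224), color=(120, 60, 200))
_, size_hi = compress_exemplar(img, quality=95)
_, size_lo = compress_exemplar(img, quality=30)
print(size_hi, size_lo)  # lower quality -> fewer bytes -> more exemplars per buffer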

Graph representation learning aims to encode all nodes of a graph into low-dimensional vectors that will serve as input to many computer vision tasks. However, most existing algorithms ignore the existence of the inherent data distribution and even noises. This may significantly increase the phenomenon of over-fitting and deteriorate testing accuracy. In this paper, we propose a Distribution-induced Bidirectional Generative Adversarial Network (named DBGAN) for graph representation learning. Instead of the widely used Gaussian assumption,...

10.1109/cvpr42600.2020.00725 preprint EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

For many machine learning algorithms, their success heavily depends on data representation. In this paper, we present an ℓ2,1-norm constrained canonical correlation analysis (CCA) model, that is, ℓ2,1-CCA, toward discovering a compact and discriminative representation for the data associated with multiple views. To well exploit the complementary and coherent information across views, the ℓ2,1-norm is employed to...

10.1109/tcyb.2019.2904753 article EN IEEE Transactions on Cybernetics 2019-04-04
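
For readers unfamiliar with the ℓ2,1-norm used in the entry above, a generic form of an ℓ2,1-regularized two-view CCA objective is sketched below in LaTeX. This is an illustrative textbook-style formulation, not necessarily the exact model or constraints used in the paper.

% ℓ2,1 norm of a projection matrix W in R^{d x k}: sum of row-wise l2 norms,
% which drives whole rows (features) toward zero and yields a compact projection.
\[
  \|W\|_{2,1} \;=\; \sum_{i=1}^{d} \Big( \sum_{j=1}^{k} W_{ij}^{2} \Big)^{1/2}
\]
% A generic regularized two-view CCA objective with data matrices X, Y:
\[
  \max_{W_x, W_y}\; \operatorname{tr}\!\big(W_x^{\top} X Y^{\top} W_y\big)
  \;-\; \lambda_x \|W_x\|_{2,1} \;-\; \lambda_y \|W_y\|_{2,1}
  \quad \text{s.t. } W_x^{\top} X X^{\top} W_x = I,\; W_y^{\top} Y Y^{\top} W_y = I
\]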

Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training. Existing methods are mostly graph-based, with sentences as nodes and edge weights measured by sentence similarities. In this work, we find that transformer attentions can be used to rank sentences for unsupervised extractive summarization. Specifically, we first pre-train a hierarchical transformer model using unlabeled documents only. Then we propose a method to rank sentences using sentence-level self-attentions and pre-training objectives...

10.18653/v1/2020.findings-emnlp.161 preprint EN cc-by 2020-01-01
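
The entry above ranks sentences by the attention they attract in a pre-trained hierarchical transformer. A minimal numpy sketch of one such ranking heuristic follows; the assumption that importance equals total incoming attention (minus self-attention) is a simplification, and the paper's combination of attentions with its pre-training objectives is not reproduced.

import numpy as np

def rank_sentences_by_attention(attn, top_k=3):
    """Rank sentences by how much attention they receive from other sentences.

    attn[i, j] is assumed to be the sentence-level self-attention weight that
    sentence i pays to sentence j (rows sum to 1).
    """
    scores = attn.sum(axis=0) - np.diag(attn)  # total incoming attention, excluding self
    return np.argsort(-scores)[:top_k]

attn = np.array([[0.5, 0.3, 0.2],
                 [0.6, 0.2, 0.2],
                 [0.7, 0.2, 0.1]])
print(rank_sentences_by_attention(attn, top_k=2))  # sentence 0 attracts the most attention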

While encryption technology safeguards the security of network communications, malicious traffic also uses encryption protocols to obscure its behavior. To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted traffic, we propose an encrypted traffic classification method that integrates global semantic features with local spatiotemporal features, called BERT-based Spatio-Temporal Features Network (BSTFNet). At the packet level...

10.32604/cmc.2024.047918 article EN Computers, Materials & Continua 2024-01-01

Logit-based knowledge distillation (KD) is commonly used to mitigate catastrophic forgetting in class-incremental learning (CIL) caused by data distribution shifts. However, the strict match of logit values between student and teacher models conflicts with the cross-entropy (CE) loss objective of learning new classes, leading to significant recency bias (i.e., unfairness). To address this issue, we rethink the overlooked limitations of KD-based methods through empirical analysis. Inspired by our findings, we introduce a...

10.1609/aaai.v39i16.33842 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11
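
For context on the conflict described above, the sketch below shows the standard logit-distillation baseline that such CIL methods build on: cross-entropy over all classes plus a temperature-scaled KD term on the old classes. This is the conventional recipe whose limitations the entry analyzes, not the paper's proposed remedy; the alpha and temperature values are illustrative.

import torch
import torch.nn.functional as F

def cil_distill_loss(student_logits, teacher_logits, labels, n_old, T=2.0, alpha=0.5):
    """CE on all classes plus temperature-scaled KD on the old-class logits.

    student_logits: (batch, n_old + n_new) current-model scores
    teacher_logits: (batch, n_old) scores from the frozen previous-task model
    """
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits[:, :n_old] / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd

s = torch.randn(8, 12)            # 10 old + 2 new classes
t = torch.randn(8, 10)
y = torch.randint(0, 12, (8,))
print(cil_distill_loss(s, t, y, n_old=10))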

The deployment of pre-trained models (PTMs) has greatly advanced the field of continual learning (CL), enabling positive knowledge transfer and resilience to catastrophic forgetting. To sustain these advantages for sequentially arriving tasks, a promising direction involves keeping the pre-trained backbone frozen while employing parameter-efficient tuning (PET) techniques to instruct representation learning. Despite the popularity of Prompt-based PET for CL, its empirical design often leads to sub-optimal performance in our...

10.1109/tpami.2025.3562534 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2025-01-01

Zero-Shot Learning (ZSL) has received extensive attention and achieved successes in recent years, especially in areas of fine-grained object recognition, retrieval, and image captioning. The key to ZSL is to transfer knowledge from the seen to the unseen classes via auxiliary semantic prototypes (e.g., word or attribute vectors). However, the popularly learned projection functions in previous works cannot generalize well due to non-visual components included in the semantic prototypes. Besides, the incompleteness of the provided prototypes and captured images has less been...

10.1109/tmm.2019.2959433 article EN IEEE Transactions on Multimedia 2019-12-12

Neural extractive summarization models usually employ a hierarchical encoder for document encoding, and they are trained using sentence-level labels, which are created heuristically by rule-based methods. Training the hierarchical encoder with these inaccurate labels is challenging. Inspired by recent work on pre-training transformer sentence encoders (Devlin et al., 2018), we propose HIBERT (as shorthand for HIerachical Bidirectional Encoder Representations from Transformers) for document encoding and a method to pre-train...

10.48550/arxiv.1905.06566 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Contrastive learning models have achieved great success in unsupervised visual representation learning, where they maximize the similarities between feature representations of different views of the same image while minimizing the similarities between views of different images. In text summarization, the output summary is a shorter form of the input document and they have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive summarization, where we view a document, its gold summary and its model-generated summaries as different views of the same meaning and maximize the similarities between them during training. We improve over a strong...

10.48550/arxiv.2109.03481 preprint EN other-oa arXiv (Cornell University) 2021-01-01

As an important perceptual characteristic of the Human Visual System (HVS), the Just Noticeable Difference (JND) has been studied for decades in image and video processing (e.g., visual signal compression). However, there is little exploration of the existence of JND for Deep Machine Vision (DMV), although DMV has made great strides in many machine vision tasks. In this paper, we take an initial attempt and demonstrate that there does exist JND for DMV, termed DMV-JND. We then propose a JND model for the image classification task in DMV. It is discovered that DMV can...

10.1109/tcsvt.2021.3113572 article EN IEEE Transactions on Circuits and Systems for Video Technology 2021-09-16

Prompt-based continual learning is an emerging direction in leveraging pre-trained knowledge for downstream continual learning, and has almost reached the performance pinnacle under supervised pre-training. However, our empirical research reveals that the current strategies fall short of their full potential under the more realistic self-supervised pre-training, which is essential for handling vast quantities of unlabeled data in practice. This is largely due to the difficulty of task-specific knowledge being incorporated into instructed...

10.48550/arxiv.2310.07234 preprint EN other-oa arXiv (Cornell University) 2023-01-01