Xingxing Zhang

ORCID: 0000-0003-4012-3796
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Domain Adaptation and Few-Shot Learning
  • Multimodal Machine Learning Applications
  • Advanced Text Analysis Techniques
  • Adversarial Robustness in Machine Learning
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Text Readability and Simplification
  • Video Surveillance and Tracking Methods
  • COVID-19 diagnosis using AI
  • Machine Learning and ELM
  • Anomaly Detection Techniques and Applications
  • Face and Expression Recognition
  • Advanced Neural Network Applications
  • Machine Learning and Data Classification
  • Advanced Graph Neural Networks
  • Speech Recognition and Synthesis
  • Advanced Vision and Imaging
  • Advanced Image Processing Techniques
  • Image Retrieval and Classification Techniques
  • Digital Media Forensic Detection
  • Data Stream Mining Techniques
  • Human Mobility and Location-Based Analysis
  • Image Enhancement Techniques

Tsinghua University
2021-2025

Sichuan University of Science and Engineering
2024

Guizhou University
2024

American Jewish Committee
2023

IT University of Copenhagen
2023

Tokyo Institute of Technology
2023

Administration for Community Living
2023

Microsoft Research Asia (China)
2019-2023

Nanjing University of Aeronautics and Astronautics
2022

Civil Aviation University of China
2022

Neural extractive summarization models usually employ a hierarchical encoder for document encoding, and they are trained using sentence-level labels, which are created heuristically by rule-based methods. Training the hierarchical encoder with these inaccurate labels is challenging. Inspired by recent work on pre-training transformer sentence encoders (Devlin et al., 2018), we propose HIBERT (as shorthand for HIerachical Bidirectional Encoder Representations from Transformers) for document encoding and a method to pre-train it using unlabeled data. We apply...

10.18653/v1/p19-1499 preprint EN cc-by 2019-01-01
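
The entry above pre-trains a hierarchical encoder on unlabeled documents by masking whole sentences and asking the model to reconstruct them. Below is a minimal sketch of how such pre-training examples could be constructed; the MASK_SENT token, the 15% masking ratio, and the helper name are illustrative assumptions, not the paper's exact recipe.

import random

MASK_SENT = "[SENT_MASK]"  # hypothetical placeholder for a masked sentence

def make_sentence_mask_example(doc_sentences, mask_ratio=0.15, seed=0):
    """Mask whole sentences; the pre-training target is to regenerate them.

    `doc_sentences` is a list of tokenized sentences (lists of strings).
    Returns the corrupted document plus (index, sentence) reconstruction targets.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(doc_sentences) * mask_ratio))
    masked_ids = set(rng.sample(range(len(doc_sentences)), n_mask))
    corrupted, targets = [], []
    for i, sent in enumerate(doc_sentences):
        if i in masked_ids:
            corrupted.append([MASK_SENT])   # replace the whole sentence
            targets.append((i, sent))       # the model must reconstruct it
        else:
            corrupted.append(sent)
    return corrupted, targets

doc = [["the", "cat", "sat", "."], ["it", "was", "tired", "."],
       ["the", "dog", "barked", "."], ["then", "it", "slept", "."]]
print(make_sentence_mask_example(doc))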

We propose a model for Chinese poem generation based on recurrent neural networks which we argue is ideally suited to capturing poetic content and form. Our generator jointly performs content selection ("what to say") and surface realization ("how to say") by learning representations of individual characters, their combinations into one or more lines, as well as how these mutually reinforce and constrain each other. Poem lines are generated incrementally by taking into account the entire history of what has been generated so far rather than the limited...

10.3115/v1/d14-1074 article EN cc-by 2014-01-01

Sentence simplification aims to make sentences easier to read and understand. Most recent approaches draw on insights from machine translation to learn simplification rewrites from monolingual corpora of complex and simple sentences. We address the simplification problem with an encoder-decoder model coupled with a deep reinforcement learning framework. Our model, which we call DRESS (as shorthand for Deep REinforcement Sentence Simplification), explores the space of possible simplifications while learning to optimize a reward function that encourages outputs which are...

10.18653/v1/d17-1062 article EN cc-by Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing 2017-01-01
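
The DRESS entry couples an encoder-decoder with reinforcement learning that optimizes a simplification reward. A minimal sketch of the policy-gradient (REINFORCE) idea is shown below, assuming PyTorch and an already computed scalar reward; a practical system would also subtract a baseline and mix in a cross-entropy term, and the reward composition (simplicity, relevance, fluency) is only summarized in a comment.

import torch
import torch.nn.functional as F

def reinforce_loss(logits, sampled_ids, reward):
    """REINFORCE-style loss: scale the sequence log-likelihood by a scalar reward.

    logits:      (seq_len, vocab_size) decoder scores for one sampled simplification
    sampled_ids: (seq_len,) token ids sampled from the decoder
    reward:      scalar, e.g. a weighted mix of simplicity, relevance and
                 fluency scores (the weighting is an assumption here)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    seq_log_prob = log_probs.gather(1, sampled_ids.unsqueeze(1)).sum()
    return -reward * seq_log_prob  # minimizing this raises the probability of high-reward outputs

# toy usage
logits = torch.randn(5, 100, requires_grad=True)
ids = torch.randint(0, 100, (5,))
loss = reinforce_loss(logits, ids, reward=0.7)
loss.backward()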

To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual learning, provides a foundation for AI systems to develop themselves adaptively. In a general sense, continual learning is explicitly limited by catastrophic forgetting, where learning a new task usually results in a dramatic performance drop on the old tasks. Beyond this, increasingly numerous advances have emerged in recent years that...

10.1109/tpami.2024.3367329 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-02-26

Recently, there has been an increasing interest in end-to-end speech recognition using neural networks, with no reliance on hidden Markov models (HMMs) for sequence modelling as in the standard hybrid framework. The recurrent neural network (RNN) encoder-decoder is such a model, performing sequence-to-sequence mapping without any predefined alignment. This model first transforms the input sequence into a fixed-length vector representation, from which the decoder recovers the output sequence. In this paper, we extend our previous work to large...

10.1109/icassp.2016.7472641 article EN 2016-03-01

Conventional graph-based dependency parsers guarantee a tree structure both during training and inference. Instead, we formalize dependency parsing as the problem of independently selecting the head of each word in a sentence. Our model, which we call DENSE (as shorthand for Dependency Neural Selection), produces a distribution over possible heads for each word using features obtained from a bidirectional recurrent neural network. Without enforcing structural constraints during training, DeNSe generates (at inference time) trees...

10.18653/v1/e17-1063 article EN cc-by 2017-01-01
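
The head-selection formulation above reduces parsing to an independent argmax per word. A minimal numpy sketch of that decision rule follows; the score matrix is assumed to come from a trained scorer (not shown), and the paper's inference-time repair of non-tree outputs (e.g. via a maximum spanning tree algorithm) is omitted.

import numpy as np

def select_heads(scores):
    """Independently pick the most likely head for each word.

    scores[i, j] is a (hypothetical) model score for word i+1 choosing column j
    as its head, with column 0 standing for the artificial ROOT token.
    """
    return scores.argmax(axis=1)

# toy example for a 3-word sentence (rows: words 1..3, cols: ROOT, word1, word2, word3)
scores = np.array([[2.0, 0.1, 0.5, 0.3],
                   [0.2, 1.7, 0.1, 0.4],
                   [0.1, 0.3, 1.2, 0.2]])
print(select_heads(scores))  # [0 1 2]: word1 <- ROOT, word2 <- word1, word3 <- word2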

Deep neural networks have advanced the state of the art in automatic speech recognition when combined with hidden Markov models (HMMs). Recently there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words. We...

10.21437/interspeech.2015-654 article EN Interspeech 2015 2015-09-06

Contrastive learning models have achieved great success in unsupervised visual representation learning, where they maximize the similarities between feature representations of different views of the same image while minimizing the similarities between views of different images. In text summarization, the output summary is a shorter form of the input document and they have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive summarization, where we view a document, its gold summary and its model-generated summaries as different views of the same meaning and maximize the similarities between them during training. We improve over a strong...

10.1609/aaai.v36i10.21409 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
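
The contrastive objective above pulls together representations of a document and its summaries. A minimal sketch of that similarity term follows, assuming PyTorch and pre-pooled sequence representations; how the real model pools decoder states and combines this term with the generation loss is not reproduced here.

import torch
import torch.nn.functional as F

def view_similarity_loss(doc_repr, summ_repr):
    """Maximize cosine similarity between document and summary representations.

    Both inputs are (batch, dim) pooled sequence representations; treating the
    document and its (gold or generated) summary as two views with similar
    meaning, the loss is 1 minus their average cosine similarity.
    """
    return 1.0 - F.cosine_similarity(doc_repr, summ_repr, dim=-1).mean()

doc = torch.randn(4, 256)    # batch of 4 pooled document representations
summ = torch.randn(4, 256)   # matching summary representations
print(view_similarity_loss(doc, summ))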

To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual learning, provides a foundation for AI systems develop themselves adaptively. In general sense, learning is explicitly limited by catastrophic forgetting, where new task usually results in dramatic performance degradation of the old tasks. Beyond this, increasingly numerous advances have emerged recent...

10.48550/arxiv.2302.00487 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. To address this issue, we introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences. Specifically, we propose dilated attention, which expands the attentive field exponentially as the distance grows. LongNet...

10.48550/arxiv.2307.02486 preprint EN other-oa arXiv (Cornell University) 2023-01-01
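
Dilated attention restricts each token to a sparse, segment-local set of positions. The numpy sketch below builds a boolean mask for a single (segment length, dilation) pattern to illustrate the idea; LongNet itself mixes several such patterns with exponentially growing segment lengths and dilations, which this sketch does not reproduce.

import numpy as np

def dilated_attention_mask(seq_len, segment_len, dilation):
    """Boolean mask for one dilated-attention pattern.

    Tokens are split into segments of `segment_len`; inside each segment only
    every `dilation`-th position participates, so attention stays sparse while
    still covering the whole segment.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for start in range(0, seq_len, segment_len):
        idx = np.arange(start, min(start + segment_len, seq_len), dilation)
        mask[np.ix_(idx, idx)] = True  # sparse all-to-all within the segment
    return mask

print(dilated_attention_mask(seq_len=16, segment_len=8, dilation=2).astype(int))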

Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have been successfully applied to a variety of sequence modeling tasks. In this paper we develop Tree LSTM (TreeLSTM), a model based on LSTM, which is designed to predict a tree rather than a linear sequence. TreeLSTM defines the probability of a sentence by estimating the generation probability of its dependency tree. At each time step, a node is generated based on the representation of the generated subtree. We further enhance the modeling power of TreeLSTM by explicitly...

10.18653/v1/n16-1035 preprint EN cc-by Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2016-01-01

Different from the Visual Question Answering task, which requires answering only one question about an image, Visual Dialogue involves multiple questions that cover a broad range of visual content and could be related to any objects, relationships or semantics. The key challenge in Visual Dialogue is thus to learn a more comprehensive and semantic-rich image representation which may have adaptive attentions on the image for variant questions. In this research, we propose a novel model to depict an image from both visual and semantic perspectives. Specifically, the visual view helps...

10.1609/aaai.v34i07.6769 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Abstractive document summarization is usually modeled as a sequence-to-sequence (SEQ2SEQ) learning problem. Unfortunately, training large SEQ2SEQ based summarization models on limited supervised data is challenging. This paper presents three pre-training (in shorthand, STEP) objectives which allow us to pre-train an abstractive summarization model on unlabeled text. The main idea is that, given an input text artificially constructed from a document, the model is pre-trained to reinstate the original document. These objectives include sentence reordering, next...

10.18653/v1/2020.emnlp-main.297 article EN cc-by 2020-01-01

Continual learning needs to overcome catastrophic forgetting of the past. Memory replay of representative old training samples has been shown to be an effective solution and achieves state-of-the-art (SOTA) performance. However, existing work is mainly built on a small memory buffer containing a few original data samples, which cannot fully characterize the old data distribution. In this work, we propose memory replay with data compression (MRDC) to reduce the storage cost of old training samples and thus increase the amount that can be stored in the memory buffer. Observing...

10.48550/arxiv.2202.06592 preprint EN other-oa arXiv (Cornell University) 2022-01-01
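
The idea above is that compressing stored exemplars lets a fixed-byte replay buffer hold more of them. A minimal sketch using JPEG compression via Pillow follows; the quality values and helper name are illustrative assumptions, and the paper's analysis of how the compression rate trades per-sample fidelity against buffer capacity is not reproduced.

import io
from PIL import Image

def compress_exemplar(img, quality=50):
    """JPEG-compress an exemplar so more old samples fit in a fixed-byte buffer.

    Returns the compressed bytes and their size in bytes.
    """
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    data = buf.getvalue()
    return data, len(data)

img = Image.new("RGB", (224, 224), color=(120, 60, 200))
_, size_hi = compress_exemplar(img, quality=95)
_, size_lo = compress_exemplar(img, quality=30)
print(size_hi, size_lo)  # lower quality -> fewer bytes -> more exemplars per buffer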

Graph representation learning aims to encode all nodes of a graph into low-dimensional vectors that will serve as input to many computer vision tasks. However, most existing algorithms ignore the existence of the inherent data distribution and even noises. This may significantly increase the phenomenon of over-fitting and deteriorate testing accuracy. In this paper, we propose a Distribution-induced Bidirectional Generative Adversarial Network (named DBGAN) for graph representation learning. Instead of the widely used Gaussian assumption,...

10.1109/cvpr42600.2020.00725 preprint EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

For many machine learning algorithms, their success heavily depends on data representation. In this paper, we present an ℓ2,1-norm constrained canonical correlation analysis (CCA) model, that is, ℓ2,1-CCA, toward discovering a compact and discriminative representation for the data associated with multiple views. To well exploit the complementary and coherent information across views, the ℓ2,1-norm is employed to...

10.1109/tcyb.2019.2904753 article EN IEEE Transactions on Cybernetics 2019-04-04
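
For readers unfamiliar with the ℓ2,1-norm used in the entry above, a generic form of an ℓ2,1-regularized two-view CCA objective is sketched below in LaTeX. This is an illustrative textbook-style formulation, not necessarily the exact model or constraints used in the paper.

% ℓ2,1 norm of a projection matrix W in R^{d x k}: sum of row-wise l2 norms,
% which drives whole rows (features) toward zero and yields a compact projection.
\[
  \|W\|_{2,1} \;=\; \sum_{i=1}^{d} \Big( \sum_{j=1}^{k} W_{ij}^{2} \Big)^{1/2}
\]
% A generic regularized two-view CCA objective with data matrices X, Y:
\[
  \max_{W_x, W_y}\; \operatorname{tr}\!\big(W_x^{\top} X Y^{\top} W_y\big)
  \;-\; \lambda_x \|W_x\|_{2,1} \;-\; \lambda_y \|W_y\|_{2,1}
  \quad \text{s.t. } W_x^{\top} X X^{\top} W_x = I,\; W_y^{\top} Y Y^{\top} W_y = I
\]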

Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training. Existing methods are mostly graph-based, with sentences as nodes and edge weights measured by sentence similarities. In this work, we find that transformer attentions can be used to rank sentences for unsupervised extractive summarization. Specifically, we first pre-train a hierarchical transformer model using unlabeled documents only. Then we propose a method to rank sentences using sentence-level self-attentions and pre-training objectives...

10.18653/v1/2020.findings-emnlp.161 preprint EN cc-by 2020-01-01
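
The entry above ranks sentences by the attention they attract in a pre-trained hierarchical transformer. A minimal numpy sketch of one such ranking heuristic follows; the assumption that importance equals total incoming attention (minus self-attention) is a simplification, and the paper's combination of attentions with its pre-training objectives is not reproduced.

import numpy as np

def rank_sentences_by_attention(attn, top_k=3):
    """Rank sentences by how much attention they receive from other sentences.

    attn[i, j] is assumed to be the sentence-level self-attention weight that
    sentence i pays to sentence j (rows sum to 1).
    """
    scores = attn.sum(axis=0) - np.diag(attn)  # total incoming attention, excluding self
    return np.argsort(-scores)[:top_k]

attn = np.array([[0.5, 0.3, 0.2],
                 [0.6, 0.2, 0.2],
                 [0.7, 0.2, 0.1]])
print(rank_sentences_by_attention(attn, top_k=2))  # sentence 0 attracts the most attention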

While encryption technology safeguards the security of network communications, malicious traffic also uses encryption protocols to obscure its behavior. To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted traffic, we propose an encrypted traffic classification method that integrates global semantic features with local spatiotemporal features, called BERT-based Spatio-Temporal Features Network (BSTFNet). At the packet level...

10.32604/cmc.2024.047918 article EN Computers, Materials & Continua 2024-01-01

Logit-based knowledge distillation (KD) is commonly used to mitigate catastrophic forgetting in class-incremental learning (CIL) caused by data distribution shifts. However, the strict match of logit values between student and teacher models conflicts with the cross-entropy (CE) loss objective of learning new classes, leading to significant recency bias (i.e., unfairness). To address this issue, we rethink the overlooked limitations of KD-based methods through empirical analysis. Inspired by our findings, we introduce a...

10.1609/aaai.v39i16.33842 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11
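
For context on the conflict described above, the sketch below shows the standard logit-distillation baseline that such CIL methods build on: cross-entropy over all classes plus a temperature-scaled KD term on the old classes. This is the conventional recipe whose limitations the entry analyzes, not the paper's proposed remedy; the alpha and temperature values are illustrative.

import torch
import torch.nn.functional as F

def cil_distill_loss(student_logits, teacher_logits, labels, n_old, T=2.0, alpha=0.5):
    """CE on all classes plus temperature-scaled KD on the old-class logits.

    student_logits: (batch, n_old + n_new) current-model scores
    teacher_logits: (batch, n_old) scores from the frozen previous-task model
    """
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits[:, :n_old] / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd

s = torch.randn(8, 12)            # 10 old + 2 new classes
t = torch.randn(8, 10)
y = torch.randint(0, 12, (8,))
print(cil_distill_loss(s, t, y, n_old=10))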

The deployment of pre-trained models (PTMs) has greatly advanced the field of continual learning (CL), enabling positive knowledge transfer and resilience to catastrophic forgetting. To sustain these advantages for sequentially arriving tasks, a promising direction involves keeping the pre-trained backbone frozen while employing parameter-efficient tuning (PET) techniques to instruct representation learning. Despite the popularity of Prompt-based PET for CL, its empirical design often leads to sub-optimal performance in our...

10.1109/tpami.2025.3562534 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2025-01-01

Zero-Shot Learning (ZSL) has received extensive attention and achieved successes in recent years, especially in areas of fine-grained object recognition, retrieval, and image captioning. The key to ZSL is to transfer knowledge from the seen to the unseen classes via auxiliary semantic prototypes (e.g., word or attribute vectors). However, the popularly learned projection functions in previous works cannot generalize well due to non-visual components included in the semantic prototypes. Besides, the incompleteness of the provided prototypes and captured images has less been...

10.1109/tmm.2019.2959433 article EN IEEE Transactions on Multimedia 2019-12-12

Neural extractive summarization models usually employ a hierarchical encoder for document encoding, and they are trained using sentence-level labels, which are created heuristically by rule-based methods. Training the hierarchical encoder with these inaccurate labels is challenging. Inspired by recent work on pre-training transformer sentence encoders (Devlin et al., 2018), we propose HIBERT (as shorthand for HIerachical Bidirectional Encoder Representations from Transformers) for document encoding and a method to pre-train...

10.48550/arxiv.1905.06566 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Contrastive learning models have achieved great success in unsupervised visual representation learning, where they maximize the similarities between feature representations of different views of the same image while minimizing the similarities between views of different images. In text summarization, the output summary is a shorter form of the input document and they have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive summarization, where we view a document, its gold summary and its model-generated summaries as different views of the same meaning and maximize the similarities between them during training. We improve over a strong...

10.48550/arxiv.2109.03481 preprint EN other-oa arXiv (Cornell University) 2021-01-01

As an important perceptual characteristic of the Human Visual System (HVS), the Just Noticeable Difference (JND) has been studied for decades in image and video processing (e.g., visual signal compression). However, there is little exploration of the existence of JND for Deep Machine Vision (DMV), although DMV has made great strides in many machine vision tasks. In this paper, we take an initial attempt and demonstrate that there does exist JND for DMV, termed DMV-JND. We then propose a JND model for the image classification task in DMV. It is discovered that DMV can...

10.1109/tcsvt.2021.3113572 article EN IEEE Transactions on Circuits and Systems for Video Technology 2021-09-16

Prompt-based continual learning is an emerging direction in leveraging pre-trained knowledge for downstream continual learning, and has almost reached the performance pinnacle under supervised pre-training. However, our empirical research reveals that the current strategies fall short of their full potential under the more realistic self-supervised pre-training, which is essential for handling vast quantities of unlabeled data in practice. This is largely due to the difficulty of task-specific knowledge being incorporated into instructed...

10.48550/arxiv.2310.07234 preprint EN other-oa arXiv (Cornell University) 2023-01-01