- Natural Language Processing Techniques
- Topic Modeling
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Advanced Text Analysis Techniques
- Adversarial Robustness in Machine Learning
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Text Readability and Simplification
- Video Surveillance and Tracking Methods
- COVID-19 diagnosis using AI
- Machine Learning and ELM
- Anomaly Detection Techniques and Applications
- Face and Expression Recognition
- Advanced Neural Network Applications
- Machine Learning and Data Classification
- Advanced Graph Neural Networks
- Speech Recognition and Synthesis
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Image Retrieval and Classification Techniques
- Digital Media Forensic Detection
- Data Stream Mining Techniques
- Human Mobility and Location-Based Analysis
- Image Enhancement Techniques
Tsinghua University
2021-2025
Sichuan University of Science and Engineering
2024
Guizhou University
2024
American Jewish Committee
2023
IT University of Copenhagen
2023
Tokyo Institute of Technology
2023
Administration for Community Living
2023
Microsoft Research Asia (China)
2019-2023
Nanjing University of Aeronautics and Astronautics
2022
Civil Aviation University of China
2022
Neural extractive summarization models usually employ a hierarchical encoder for document encoding, and they are trained using sentence-level labels, which are created heuristically by rule-based methods. Training the hierarchical encoder with these inaccurate labels is challenging. Inspired by recent work on pre-training transformer sentence encoders (Devlin et al., 2018), we propose HIBERT (as shorthand for HIerarchical Bidirectional Encoder Representations from Transformers) for document encoding and a method to pre-train it using unlabeled data. We apply...
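A minimal sketch of the hierarchical-encoding and whole-sentence-masking idea described in the abstract above, written in PyTorch. The dimensions, vocabulary size, and the [CLS]/[MASK] token ids are illustrative assumptions, not the released HIBERT configuration.

```python
# Sketch: hierarchical document encoder + masked-sentence pre-training step.
import torch
import torch.nn as nn

VOCAB, D, MASK_ID, CLS_ID = 30000, 256, 1, 2   # assumed sizes and special ids

class HierarchicalEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, D)
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.sent_enc = nn.TransformerEncoder(layer, num_layers=2)  # words -> sentence vector
        self.doc_enc = nn.TransformerEncoder(layer, num_layers=2)   # sentence vectors -> context

    def forward(self, doc):                   # doc: (n_sents, n_words) token ids
        tok = self.sent_enc(self.emb(doc))    # (n_sents, n_words, D)
        sent_vecs = tok[:, 0]                 # first ([CLS]) position of each sentence
        return self.doc_enc(sent_vecs.unsqueeze(0)).squeeze(0)  # (n_sents, D)

# Pre-training step: mask an entire sentence and recover its words
# (simplified here to scoring the vocabulary from the masked slot).
doc = torch.randint(3, VOCAB, (8, 20))
doc[:, 0] = CLS_ID
masked = doc.clone()
masked[3, 1:] = MASK_ID                       # hide sentence 3
enc = HierarchicalEncoder()
ctx = enc(masked)                             # contextual sentence representations
logits = ctx[3] @ enc.emb.weight.T            # vocabulary scores for the masked sentence
loss = nn.functional.cross_entropy(logits.expand(19, -1), doc[3, 1:])
loss.backward()
print(float(loss))
```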
We propose a model for Chinese poem generation based on recurrent neural networks, which we argue is ideally suited to capturing poetic content and form. Our generator jointly performs content selection ("what to say") and surface realization ("how to say it") by learning representations of individual characters and their combinations into one or more lines, as well as how these mutually reinforce and constrain each other. Poems are generated incrementally by taking into account the entire history of what has been generated so far rather than the limited...
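A small illustrative sketch of the incremental, history-conditioned generation described above, using an untrained character-level GRU. The character inventory and model sizes are toy assumptions, not the paper's setup.

```python
# Sketch: generate characters one by one, conditioning on the full history.
import torch
import torch.nn as nn

chars = list("春花秋月夜江水山风\n")            # toy character inventory (assumed)
stoi = {c: i for i, c in enumerate(chars)}

emb = nn.Embedding(len(chars), 32)
rnn = nn.GRU(32, 64, batch_first=True)
out = nn.Linear(64, len(chars))

def generate(n_lines=2, line_len=5, start="春"):
    history = [stoi[start]]
    h = None
    for _ in range(n_lines * line_len - 1):
        x = emb(torch.tensor([[history[-1]]]))
        y, h = rnn(x, h)                      # hidden state h carries the history
        probs = torch.softmax(out(y[0, -1]), dim=-1)
        history.append(int(torch.multinomial(probs, 1)))
    return "".join(chars[i] for i in history)

print(generate())                             # random characters until trained
```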
Sentence simplification aims to make sentences easier to read and understand. Most recent approaches draw on insights from machine translation to learn simplification rewrites from monolingual corpora of complex and simple sentences. We address the simplification problem with an encoder-decoder model coupled with a deep reinforcement learning framework. Our model, which we call DRESS (as shorthand for Deep REinforcement Sentence Simplification), explores the space of possible simplifications while learning to optimize a reward function that encourages outputs which are...
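A hedged sketch of the reinforcement-learning objective behind this kind of model: sample a simplification from the decoder and scale its log-likelihood by a scalar reward. The reward components and weights below are assumptions for illustration, not the authors' implementation.

```python
# Sketch: REINFORCE-style loss for a sampled simplification.
import torch

def reinforce_loss(log_probs, reward, baseline=0.0):
    """log_probs: (T,) log p(y_t | y_<t, x) of the sampled output tokens."""
    advantage = reward - baseline
    return -advantage * log_probs.sum()

# Toy example: a 4-token sampled simplification.
log_probs = torch.log(torch.tensor([0.4, 0.6, 0.5, 0.7], requires_grad=True))
reward = 0.3 * 0.8 + 0.3 * 0.9 + 0.4 * 0.7   # simplicity, fluency, adequacy (assumed weights)
loss = reinforce_loss(log_probs, reward, baseline=0.5)
loss.backward()
print(float(loss))
```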
To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual learning, provides a foundation for AI systems to develop themselves adaptively. In a general sense, continual learning is explicitly limited by catastrophic forgetting, where learning a new task usually results in a dramatic performance drop on the old tasks. Beyond this, increasingly numerous advances have emerged in recent years that...
Recently, there has been an increasing interest in end-to-end speech recognition using neural networks, with no reliance on hidden Markov models (HMMs) for sequence modelling as in the standard hybrid framework. The recurrent neural network (RNN) encoder-decoder is such a model, performing sequence-to-sequence mapping without any predefined alignment. This model first transforms the input sequence into a fixed-length vector representation, from which the decoder recovers the output sequence. In this paper, we extend our previous work to large...
Conventional graph-based dependency parsers guarantee a tree structure both during training and inference. Instead, we formalize dependency parsing as the problem of independently selecting the head of each word in a sentence. Our model, which we call DeNSe (as shorthand for Dependency Neural Selection), produces a distribution over possible heads for each word using features obtained from a bidirectional recurrent neural network. Without enforcing structural constraints during training, DeNSe generates (at inference time) trees...
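A minimal sketch of head selection as described above: a BiLSTM encodes the sentence, a bilinear scorer rates every (dependent, head) pair including an artificial ROOT, and inference takes the argmax head per word. Shapes and the scorer choice are assumptions for illustration, not the DeNSe release (which additionally repairs non-tree outputs with a maximum spanning tree step).

```python
# Sketch: dependency parsing as independent head selection.
import torch
import torch.nn as nn

n_words, emb_dim, hid = 6, 50, 64
emb = nn.Embedding(1000, emb_dim)
bilstm = nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)
scorer = nn.Bilinear(2 * hid, 2 * hid, 1)

tokens = torch.randint(0, 1000, (1, n_words))
states, _ = bilstm(emb(tokens))                      # (1, n_words, 2*hid)
root = torch.zeros(1, 1, 2 * hid)                    # artificial ROOT state
cands = torch.cat([root, states], dim=1)             # candidate heads: ROOT + all words

# Score every (dependent, head) pair.
dep = states.repeat_interleave(n_words + 1, dim=1)   # each word repeated per candidate
head = cands.repeat(1, n_words, 1)                   # candidates cycled per word
scores = scorer(dep, head).view(n_words, n_words + 1)

heads = scores.argmax(dim=1)                         # independent head choice per word
print(heads.tolist())                                # 0 denotes ROOT
```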
Deep neural networks have advanced the state-of-the-art in automatic speech recognition when combined with hidden Markov models (HMMs). Recently, there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby the encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which the decoder recovers a sequence of words. We...
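A schematic sketch of the encoder-decoder pipeline the two ASR abstracts above describe: an encoder consumes acoustic feature vectors and a decoder greedily emits word ids. The feature dimension, vocabulary, and the <sos> id are assumptions, not the paper's system.

```python
# Sketch: RNN encoder-decoder over acoustic frames with greedy decoding.
import torch
import torch.nn as nn

feat_dim, hid, vocab = 40, 128, 500
encoder = nn.GRU(feat_dim, hid, batch_first=True)
decoder_cell = nn.GRUCell(hid, hid)
word_emb = nn.Embedding(vocab, hid)
proj = nn.Linear(hid, vocab)

acoustics = torch.randn(1, 200, feat_dim)            # 200 frames of filterbank-like features
_, h = encoder(acoustics)                            # final encoder state summarizes the utterance
state = h[0]                                         # (1, hid)

tokens, tok = [], torch.zeros(1, dtype=torch.long)   # assume id 0 is <sos>
for _ in range(10):                                  # decode at most 10 words
    state = decoder_cell(word_emb(tok), state)
    tok = proj(state).argmax(dim=-1)
    tokens.append(int(tok))
print(tokens)
```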
Contrastive learning models have achieved great success in unsupervised visual representation learning, where they maximize the similarities between feature representations of different views of the same image while minimizing those of views of different images. In text summarization, the output summary is a shorter form of the input document and the two have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive summarization, where we view a document, its gold summary and its model-generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong...
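A hedged sketch of a sequence-level contrastive objective in this spirit: the pooled representations of a document, its gold summary, and a generated summary are pulled together, while representations from other documents in the batch act as negatives. The encoder pooling and temperature are assumptions, not the paper's code.

```python
# Sketch: multi-view contrastive loss over (document, gold, generated) representations.
import torch
import torch.nn.functional as F

def contrastive_loss(doc, gold, gen, temperature=0.1):
    """doc, gold, gen: (batch, dim) pooled representations of the three views."""
    views = F.normalize(torch.stack([doc, gold, gen]), dim=-1)   # (3, B, D)
    b = doc.size(0)
    loss = 0.0
    for i in range(3):
        for j in range(3):
            if i == j:
                continue
            sims = views[i] @ views[j].T / temperature           # (B, B) similarity matrix
            target = torch.arange(b)                             # positives on the diagonal
            loss = loss + F.cross_entropy(sims, target)
    return loss / 6

doc, gold, gen = (torch.randn(4, 32, requires_grad=True) for _ in range(3))
loss = contrastive_loss(doc, gold, gen)
loss.backward()
print(float(loss))
```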
To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual learning, provides a foundation for AI systems to develop themselves adaptively. In a general sense, continual learning is explicitly limited by catastrophic forgetting, where learning a new task usually results in a dramatic performance degradation on the old tasks. Beyond this, increasingly numerous advances have emerged in recent...
Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. To address this issue, we introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance on shorter sequences. Specifically, we propose dilated attention, which expands the attentive field exponentially as the distance grows. LongNet...
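An illustrative sketch of the dilated-attention pattern described above: the sequence is split into segments, and within each segment a token only attends to positions whose offset is a multiple of the dilation rate; mixing several (segment, dilation) configurations lets the attended field grow with distance. The configurations below are assumptions, and this dense-mask version only illustrates the sparsity pattern; an efficient kernel would compute only the allowed entries.

```python
# Sketch: dilated attention via a sparsity mask over standard attention.
import torch

def dilated_mask(n, segment, dilation):
    """Boolean (n, n) mask: True where attention is allowed."""
    idx = torch.arange(n)
    same_segment = (idx[:, None] // segment) == (idx[None, :] // segment)
    on_dilation = ((idx[:, None] - idx[None, :]) % dilation) == 0
    return same_segment & on_dilation

def dilated_attention(q, k, v, configs=((16, 1), (64, 4))):
    n, d = q.shape
    out = torch.zeros_like(v)
    for segment, dilation in configs:            # mix short/dense and long/sparse patterns
        mask = dilated_mask(n, segment, dilation)
        scores = (q @ k.T) / d ** 0.5
        scores = scores.masked_fill(~mask, float("-inf"))
        out = out + torch.softmax(scores, dim=-1) @ v
    return out / len(configs)

q, k, v = (torch.randn(128, 32) for _ in range(3))
print(dilated_attention(q, k, v).shape)          # torch.Size([128, 32])
```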
Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have been successfully applied to a variety of sequence modeling tasks. In this paper we develop Tree Long Short-Term Memory (TreeLSTM), a neural network model based on LSTM, which is designed to predict a tree rather than a linear sequence. TreeLSTM defines the probability of a sentence by estimating the generation probability of its dependency tree. At each time step, a node is generated based on the representation of the generated subtree. We further enhance the modeling power of TreeLSTM by explicitly...
Different from the Visual Question Answering task, which requires answering only one question about an image, Visual Dialogue involves multiple questions which cover a broad range of visual content that could be related to any objects, relationships or semantics. The key challenge in Visual Dialogue is thus to learn a more comprehensive and semantic-rich image representation which may have adaptive attentions on the image for variant questions. In this research, we propose a novel model to depict an image from both visual and semantic perspectives. Specifically, the visual view helps...
Abstractive document summarization is usually modeled as a sequence-to-sequence (SEQ2SEQ) learning problem. Unfortunately, training large SEQ2SEQ based models on limited supervised data is challenging. This paper presents three sequence-to-sequence pre-training (in shorthand, STEP) objectives which allow us to pre-train an abstractive summarization model on unlabeled text. The main idea is that, given an input text artificially constructed from a document, the model is pre-trained to reinstate the original document. These objectives include sentence reordering, next...
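A small sketch of how the sentence-reordering pre-training pair could be constructed from unlabeled text: shuffle the sentences of a document and ask a seq2seq model to reinstate the original order. The helper name is a hypothetical illustration, not the STEP code.

```python
# Sketch: build a (shuffled input, original target) pair for sentence reordering.
import random

def sentence_reordering_example(document, seed=0):
    """document: list of sentences. Returns (shuffled_input, original_target)."""
    rng = random.Random(seed)
    shuffled = document[:]
    rng.shuffle(shuffled)
    return " ".join(shuffled), " ".join(document)

doc = ["The model is pre-trained on unlabeled text.",
       "It must reinstate the original document.",
       "Three objectives are used."]
src, tgt = sentence_reordering_example(doc)
print("SOURCE:", src)
print("TARGET:", tgt)
```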
Continual learning needs to overcome catastrophic forgetting of the past. Memory replay of representative old training samples has been shown to be an effective solution and achieves state-of-the-art (SOTA) performance. However, existing work is mainly built on a small memory buffer containing a few original data points, which cannot fully characterize the old data distribution. In this work, we propose memory replay with data compression (MRDC) to reduce the storage cost of old training samples and thus increase the amount that can be stored in the memory buffer. Observing...
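A hedged sketch of the storage idea: keep old exemplars as compressed bytes so that, under the same byte budget, many more samples fit in the replay buffer. The JPEG quality and budget are assumptions for illustration, not the MRDC configuration.

```python
# Sketch: a replay buffer that stores exemplars as JPEG bytes.
import io
from PIL import Image
import numpy as np

class CompressedBuffer:
    def __init__(self, budget_bytes=200_000, quality=50):
        self.budget, self.quality, self.items, self.used = budget_bytes, quality, [], 0

    def add(self, image_array, label):
        buf = io.BytesIO()
        Image.fromarray(image_array).save(buf, format="JPEG", quality=self.quality)
        data = buf.getvalue()
        if self.used + len(data) <= self.budget:   # respect the byte budget
            self.items.append((data, label))
            self.used += len(data)

    def sample(self, i):
        data, label = self.items[i]
        return np.array(Image.open(io.BytesIO(data))), label

buffer = CompressedBuffer()
img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
buffer.add(img, label=3)
replayed, y = buffer.sample(0)
print(len(buffer.items), replayed.shape, y)
```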
Graph representation learning aims to encode all nodes of a graph into low-dimensional vectors that will serve as input to many computer vision tasks. However, most existing algorithms ignore the existence of the inherent data distribution and even noise. This may significantly increase the phenomenon of over-fitting and deteriorate the testing accuracy. In this paper, we propose a Distribution-induced Bidirectional Generative Adversarial Network (named DBGAN) for graph representation learning. Instead of the widely used Gaussian assumption,...
For many machine learning algorithms, their success heavily depends on data representation. In this paper, we present an ℓ2,1-norm constrained canonical correlation analysis (CCA) model, that is, L2,1-CCA, toward discovering a compact and discriminative representation for the data associated with multiple views. To well exploit the complementary and coherent information across the views, the ℓ2,1-norm is employed to...
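A minimal sketch of the ℓ2,1 norm that the constraint above refers to: the sum of the Euclidean norms of a matrix's rows, which drives entire rows to zero and thus acts as a row-sparse (feature-selecting) regularizer. The projection-matrix shape is an assumed example, not the paper's solver.

```python
# Sketch: computing the l2,1 norm of a projection matrix W.
import numpy as np

def l21_norm(W):
    return np.sum(np.sqrt(np.sum(W ** 2, axis=1)))   # sum of row-wise Euclidean norms

W = np.array([[3.0, 4.0],     # row norm 5
              [0.0, 0.0],     # zero row contributes nothing
              [1.0, 0.0]])    # row norm 1
print(l21_norm(W))            # 6.0
```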
Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training. Existing methods are mostly graph-based, with sentences as nodes and edge weights measured by sentence similarities. In this work, we find that transformer attentions can be used to rank sentences for unsupervised extractive summarization. Specifically, we first pre-train a hierarchical transformer model using unlabeled documents only. Then we propose a method to rank sentences using sentence-level self-attentions and pre-training objectives....
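A hedged sketch of one way sentence-level self-attention can be turned into a ranking: average how much every other sentence attends to sentence j and extract the top-ranked sentences. The attention source and aggregation are assumptions for illustration, not the paper's exact procedure.

```python
# Sketch: rank sentences by the attention they receive from other sentences.
import numpy as np

def rank_sentences(attention, k=2):
    """attention: (n, n) sentence-level self-attention matrix, rows sum to 1."""
    n = attention.shape[0]
    received = (attention.sum(axis=0) - attention.diagonal()) / (n - 1)
    return np.argsort(-received)[:k]

attn = np.array([[0.6, 0.3, 0.1],
                 [0.5, 0.4, 0.1],
                 [0.7, 0.1, 0.2]])
print(rank_sentences(attn, k=2))   # sentence 0 receives the most attention
```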
While encryption technology safeguards the security of network communications, malicious traffic also uses encryption protocols to obscure its behavior. To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted traffic, we propose an encrypted traffic classification method that integrates global semantic features with local spatiotemporal features, called the BERT-based Spatio-Temporal Features Network (BSTFNet). At the packet level...
Logit-based knowledge distillation (KD) is commonly used to mitigate catastrophic forgetting in class-incremental learning (CIL) caused by data distribution shifts. However, the strict match of logit values between the student and teacher models conflicts with the cross-entropy (CE) loss objective for learning new classes, leading to significant recency bias (i.e., unfairness). To address this issue, we rethink the overlooked limitations of KD-based methods through empirical analysis. Inspired by our findings, we introduce a...
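A schematic sketch of the standard logit-based KD setup that the abstract criticizes: the total loss mixes cross-entropy on current-task labels with a distillation term matching the student's old-class logits to a frozen teacher's. The temperature and loss weighting are assumptions; this is the baseline recipe, not the paper's remedy.

```python
# Sketch: cross-entropy + logit distillation in class-incremental learning.
import torch
import torch.nn.functional as F

def cil_kd_loss(student_logits, teacher_logits, targets, n_old, T=2.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, targets)          # learn the new classes
    kd = F.kl_div(F.log_softmax(student_logits[:, :n_old] / T, dim=1),
                  F.softmax(teacher_logits[:, :n_old] / T, dim=1),
                  reduction="batchmean") * T * T           # preserve the old classes
    return alpha * ce + (1 - alpha) * kd

student = torch.randn(8, 10, requires_grad=True)           # 10 = 6 old + 4 new classes
teacher = torch.randn(8, 6)                                 # frozen teacher knows old classes
targets = torch.randint(6, 10, (8,))                        # current-task labels
loss = cil_kd_loss(student, teacher, targets, n_old=6)
loss.backward()
print(float(loss))
```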
The deployment of pre-trained models (PTMs) has greatly advanced the field of continual learning (CL), enabling positive knowledge transfer and resilience to catastrophic forgetting. To sustain these advantages for sequentially arriving tasks, a promising direction involves keeping the pre-trained backbone frozen while employing parameter-efficient tuning (PET) techniques to instruct representation learning. Despite the popularity of prompt-based PET for CL, its empirical design often leads to sub-optimal performance in our...
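A minimal sketch of the frozen-backbone, parameter-efficient setup described above: only a small set of learnable prompt tokens and a classifier head receive gradients, while the pre-trained encoder stays fixed. The ViT-like interface, prompt length, and sizes are assumptions, not the paper's method.

```python
# Sketch: prompt tuning on a frozen transformer backbone.
import torch
import torch.nn as nn

class PromptedBackbone(nn.Module):
    def __init__(self, d=192, n_prompts=10, n_classes=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():        # keep the PTM frozen
            p.requires_grad_(False)
        self.prompts = nn.Parameter(torch.randn(n_prompts, d) * 0.02)
        self.head = nn.Linear(d, n_classes)

    def forward(self, patch_tokens):                # (B, n_patches, d)
        b = patch_tokens.size(0)
        x = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        return self.head(self.backbone(x)[:, 0])    # read out from the first prompt

model = PromptedBackbone()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")              # prompts + head only
print(model(torch.randn(4, 16, 192)).shape)          # torch.Size([4, 10])
```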
Zero-Shot Learning (ZSL) has received extensive attention and seen successes in recent years, especially in areas of fine-grained object recognition, retrieval, and image captioning. Key to ZSL is to transfer knowledge from the seen to the unseen classes via auxiliary semantic prototypes (e.g., word or attribute vectors). However, the projection functions popularly learned in previous works cannot generalize well due to non-visual components included in the semantic prototypes. Besides, the incompleteness of the provided prototypes and captured images has less been...
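A hedged sketch of the standard projection-based ZSL recipe that this line of work builds on: learn a map from visual features to the semantic prototype space on seen classes, then classify an unseen-class image by nearest prototype. The dimensions and ridge regression solver are assumptions for illustration, not the paper's improved model.

```python
# Sketch: visual-to-semantic projection and nearest-prototype ZSL classification.
import numpy as np

rng = np.random.default_rng(0)
vis_dim, sem_dim = 512, 85
X_seen = rng.normal(size=(200, vis_dim))               # seen-class visual features
S_seen = rng.normal(size=(200, sem_dim))               # their attribute vectors

# Ridge-regression projection W: visual space -> semantic space.
lam = 1.0
W = np.linalg.solve(X_seen.T @ X_seen + lam * np.eye(vis_dim), X_seen.T @ S_seen)

unseen_prototypes = rng.normal(size=(5, sem_dim))      # attribute vectors of unseen classes
x_test = rng.normal(size=(vis_dim,))
pred = x_test @ W                                       # project the test image
scores = unseen_prototypes @ pred / (
    np.linalg.norm(unseen_prototypes, axis=1) * np.linalg.norm(pred))
print("predicted unseen class:", int(np.argmax(scores)))
```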
As an important perceptual characteristic of the Human Visual System (HVS), the Just Noticeable Difference (JND) has been studied for decades in image and video processing (e.g., visual signal compression). However, there is little exploration of the existence of a JND for Deep Machine Vision (DMV), although DMV has made great strides in many machine vision tasks. In this paper, we make an initial attempt and demonstrate that DMV does have a JND, termed DMV-JND. We then propose a JND model for the classification task in DMV. It is discovered that DMV can...
Prompt-based continual learning is an emerging direction in leveraging pre-trained knowledge for downstream continual learning, and it has almost reached the performance pinnacle under supervised pre-training. However, our empirical research reveals that current strategies fall short of their full potential under the more realistic self-supervised pre-training, which is essential for handling vast quantities of unlabeled data in practice. This is largely due to the difficulty of task-specific knowledge being incorporated into instructed...