- Speech Recognition and Synthesis
- Topic Modeling
- Speech and Audio Processing
- Music and Audio Processing
- Biometric Identification and Security
- Natural Language Processing Techniques
- Face Recognition and Analysis
- Advanced Text Analysis Techniques
- Sentiment Analysis and Opinion Mining
- Digital Media Forensic Detection
- Advanced Neural Network Applications
- Stock Market Forecasting Methods
- Domain Adaptation and Few-Shot Learning
- Text and Document Classification Technologies
- Machine Learning and ELM
- Financial Markets and Investment Strategies
- Recommender Systems and Techniques
- Face and Expression Recognition
- Complex Systems and Time Series Analysis
- Imbalanced Data Classification Techniques
- Artificial Intelligence in Healthcare
- Generative Adversarial Networks and Image Synthesis
- Handwritten Text Recognition Techniques
- Image Enhancement Techniques
- Liver Disease Diagnosis and Treatment
Henan Institute of Science and Technology
2017-2024
Alibaba Group (United States)
2022-2024
Alibaba Group (Cayman Islands)
2024
Nanyang Technological University
2012-2020
Beijing University of Technology
2015-2020
Continental (Canada)
2020
National University of Singapore
2018
University of Genoa
2018
Tsinghua University
2016
Tianjin University
2012
Analyzing people’s opinions and sentiments towards certain aspects is an important task of natural language understanding. In this paper, we propose a novel solution to targeted aspect-based sentiment analysis, which tackles the challenges of both aspect-based sentiment analysis and targeted sentiment analysis by exploiting commonsense knowledge. We augment the long short-term memory (LSTM) network with a hierarchical attention mechanism consisting of target-level attention and sentence-level attention. Commonsense knowledge of sentiment-related concepts is incorporated into...
Current liver fibrosis scoring by computer-assisted image analytics is not fully automated, as it requires manual preprocessing (segmentation and feature extraction), typically based on domain knowledge in liver pathology. Deep learning-based algorithms can potentially classify these images without the need for preprocessing, through learning from a large dataset of images. We investigated the performance of classification models built using a deep learning-based algorithm pre-trained on multiple sources of images to score liver fibrosis and compared them...
Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies, with a deficiency in effectively modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that provides the capabilities to model both long-range, coarse-scale dependencies and fine-scale recurrent patterns by integrating a recurrent module into the MossFormer framework. Instead of...
With the increase in applications of face verification, increasing attention has been paid to their accuracy and security. To ensure both the accuracy and safety of these systems, this paper proposes an encrypted face-verification system. In this paper, face features are extracted using deep neural networks and then encrypted with the Paillier algorithm and saved in a data set. The framework of the whole system involves three parties: the client, the data server, and the verification server. The data server saves the encrypted user features and user ID, the verification server performs verification, and the client is responsible for collecting the requester's...
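The Paillier scheme named in the abstract above is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts, which is what lets a server compute on encrypted features. The following is a minimal sketch of textbook Paillier only, not the paper's system. The function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`), the toy primes, and the integer "feature values" are all assumptions made for illustration; real deployments need primes of 1024 bits or more and a vetted library.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Build a toy Paillier key pair from two small primes (assumed values)."""
    n = p * q
    # lam = lcm(p - 1, q - 1)
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    g = n + 1  # common simple choice of generator
    # mu = L(g^lam mod n^2)^-1 mod n, where L(u) = (u - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m, rng=random):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    """Recover the plaintext integer from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts m1 + m2."""
    n, _ = pub
    return (c1 * c2) % (n * n)
```

For example, encrypting two quantized feature values with `encrypt`, combining them with `add_encrypted`, and calling `decrypt` on the result returns their sum, provided the sum stays below `n`.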
Recently, unified speech-text models, such as SpeechGPT, VioLA, and AudioPaLM, have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. Then they train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Masking strategy for the ASR task, which ignores the dependency among speech tokens. In this paper, we propose to model speech tokens in an autoregressive way, similar to text. We find that...
Recently, the application of diffusion probabilistic models has advanced speech enhancement through generative approaches. However, existing diffusion-based methods have focused on the generation process in high-dimensional waveform or spectral domains, leading to increased generation complexity and slower inference speeds. Additionally, these methods have primarily modelled clean speech distributions, with limited exploration of noise distributions, thereby constraining the discriminative capability of diffusion models for speech enhancement. To address these issues, we propose...
The names of software artifacts, e.g., method names, are important for software understanding and maintenance, as good names can help developers easily understand others' code. However, the existing naming guidelines make it difficult for developers, especially novices, to come up with meaningful, concise and compact names for variables, methods, classes and files. With the popularity of open source, an enormous amount of project source code can be accessed, and the exhaustiveness and instability of manually naming methods could now be relieved by automatically learning a...
Due to the existence of unfavorable factors such as turbid water quality and target occlusion, it is difficult to obtain valid data features. Due to the repeated calculation of similar data, the real-time performance of the algorithm is poor. In view of the above problems, this paper proposes a multi-AUV collaborative target recognition method based on transfer-reinforcement learning. The features of the target information collected by the AUVs are fused using wavelet transformation and affine invariance. The similarity is calculated using the Mahalanobis distance, and the learning model...
In this paper, we focus on named entity boundary detection, which is to detect the start and end boundaries of an entity mention in text, without predicting its type. The detected entities are input to entity linking or fine-grained typing systems for semantic enrichment. We propose BdryBot, a recurrent neural network encoder-decoder framework with a pointer network to detect entity boundaries from a given sentence. The encoder considers both character-level representations and word-level embeddings to represent the input words. In this way, BdryBot does not require any...
Existing self-supervised pre-trained speech models have offered an effective way to leverage massive unannotated corpora to build good automatic speech recognition (ASR). However, many current models are trained on a clean corpus from a single source, which tends to do poorly when noise is present during testing. Nonetheless, it is crucial to overcome the adverse influence of noise for real-world applications. In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow's...
Over the last twenty years, researchers and practitioners have attempted in many ways to effectively predict market trends. To date, however, no satisfactory solution has been found. Many approaches have been applied to predict market trends, from technical analysis to fundamental analysis, passing through sentiment analysis. A promising research direction is to exploit technical indicators together with sentiments extracted from social media for predicting market directional movements. In this paper, we propose a new approach that leverages particular,...
This paper leverages heterogeneous auxiliary information to address the data sparsity problem of recommender systems. We propose a model that learns a shared feature space from heterogeneous data, such as item descriptions, product tags and online purchase history, to obtain better predictions. Our model consists of autoencoders, not only for numerical and categorical data, but also for sequential data, which enables capturing user tastes, item characteristics and the recent dynamics of user preference. We learn the autoencoder architecture for each data source independently...
Dual-path is a popular architecture for speech separation models (e.g. Sepformer) which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half a dual-path model's parameters, contribute minimally to performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure consisting of...
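The chunking step that dual-path models rely on can be sketched in a few lines. This is an illustrative sketch only, not code from Sepformer or SPGM: the function names, the zero-padding of the last chunk, and the choice of hop size are assumptions. An intra-block would process each chunk (a row), while an inter-block would process the same position across chunks (a column).

```python
def split_into_chunks(seq, chunk_size, hop):
    """Split seq into overlapping chunks of length chunk_size.

    Consecutive chunks start hop samples apart, so they overlap by
    chunk_size - hop samples. The final chunk is zero-padded if needed.
    """
    if chunk_size <= 0 or hop <= 0:
        raise ValueError("chunk_size and hop must be positive")
    chunks = []
    start = 0
    while True:
        chunk = list(seq[start:start + chunk_size])
        chunk += [0.0] * (chunk_size - len(chunk))  # pad the tail chunk
        chunks.append(chunk)
        if start + chunk_size >= len(seq):
            break
        start += hop
    return chunks


def inter_chunk_view(chunks):
    """Transpose chunks so each row holds one position across all chunks."""
    return [list(column) for column in zip(*chunks)]
```

With `chunk_size=4` and `hop=2` (50% overlap, an assumed setting), a 7-sample sequence yields three chunks, the last one padded with a single zero.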
Most of the existing neural-based models for keyword spotting (KWS) in smart devices require thousands of training samples to learn a decent audio representation. However, with the rising demand for smart devices to become more personalized, KWS models need to adapt quickly to smaller user samples. To tackle this challenge, we propose a contrastive speech mixup (CosMix) learning algorithm for low-resource KWS. CosMix introduces an auxiliary contrastive loss to the existing mixup augmentation technique to maximize the relative similarity between the original pre-mixed samples and the augmented...
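For readers unfamiliar with mixup, the augmentation that CosMix builds on, the basic operation is a convex combination of two training samples and their labels. The sketch below shows standard mixup only; it does not implement the contrastive loss described in the abstract. The function name, the `alpha` default, and the optional `lam` argument are assumptions for illustration.

```python
import random


def mixup(x_a, x_b, y_a, y_b, alpha=0.4, lam=None, rng=random):
    """Blend two samples and their one-hot labels with weight lam.

    If lam is not given, it is drawn from Beta(alpha, alpha), as in
    standard mixup. Inputs are equal-length lists of floats.
    """
    if lam is None:
        lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam
```

Passing `lam` explicitly makes the blend deterministic, which is convenient for testing; leaving it unset samples a new mixing weight on every call.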