- Speech Recognition and Synthesis
- Speech and Audio Processing
- Digital Media Forensic Detection
- Video Surveillance and Tracking Methods
- Video Analysis and Summarization
- Advanced Image and Video Retrieval Techniques
- Advanced Steganography and Watermarking Techniques
- Advanced Image Processing Techniques
- Natural Language Processing Techniques
- Advanced Vision and Imaging
- Generative Adversarial Networks and Image Synthesis
- Neural Networks and Applications
- Advanced Neural Network Applications
- Advanced Data Compression Techniques
- Topic Modeling
- Recommender Systems and Techniques
- Robotics and Sensor-Based Localization
- Image Processing Techniques and Applications
- Human Pose and Action Recognition
- Advanced Image Fusion Techniques
- Image Enhancement Techniques
- Sparse and Compressive Sensing Techniques
- Video Coding and Compression Technologies
- E-commerce and Technology Innovations
- Error Correcting Code Techniques
Henan Polytechnic University
2013-2025
École Polytechnique Fédérale de Lausanne
2024
Alibaba Group (United States)
2024
Mongolian University of Science and Technology
2024
Alibaba Group (China)
2024
Beijing University of Chemical Technology
2023-2024
Yanshan University
2023
Fujian Normal University
2023
Southeast University
2022
Hunan Institute of Engineering
2022
In recent years, all-neural, end-to-end (E2E) ASR systems have gained rapid interest in the speech recognition community. They convert input to text units in a single trainable neural network model. In ASR, many utterances contain rich named entities. Such entities may be user- or location-specific, and they are not seen during training. A single model makes it inflexible to utilize dynamic contextual information during inference. In this paper, we propose to train a context-aware E2E model and allow the beam search to traverse into an FST...
Motivated by large margin classifiers in machine learning, we propose a novel method to estimate the continuous density hidden Markov model (CDHMM) for speech recognition according to the principle of maximizing the minimum multi-class separation margin. The approach is named large margin HMM. First, we show that this type of large margin HMM estimation problem can be formulated as a standard constrained minimax optimization problem. Second, an iterative localized optimization is performed for one model at a time to guarantee that the optimal value of the objective function always...
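The constrained minimax formulation mentioned in this abstract can be sketched as follows. The notation is assumed for illustration and does not come from the abstract: $F(X \mid \lambda_W)$ denotes the discriminant (log-likelihood) score of utterance $X$ under the model of class $W$, $W_i$ is the true class of training token $X_i$, $\Lambda$ is the set of all model parameters, and $\mathcal{S}$ is the set of training tokens over which the margin is measured.

```latex
% Multi-class separation margin of a training token X_i (assumed notation)
d(X_i) = F(X_i \mid \lambda_{W_i}) - \max_{j \neq W_i} F(X_i \mid \lambda_j)

% Large margin estimation: maximize the minimum margin over the tokens in S
\tilde{\Lambda} = \arg\max_{\Lambda} \; \min_{X_i \in \mathcal{S}} d(X_i)

% Equivalent minimax form, since -d(X_i) = \max_{j \neq W_i} [F(X_i | \lambda_j) - F(X_i | \lambda_{W_i})]
\tilde{\Lambda} = \arg\min_{\Lambda} \; \max_{X_i \in \mathcal{S},\; j \neq W_i}
    \bigl[ F(X_i \mid \lambda_j) - F(X_i \mid \lambda_{W_i}) \bigr]
```

The second and third lines are the same problem written two ways. The constraints that keep the margin bounded are not specified in the abstract and are left out of this sketch.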
In this paper, we study how to incorporate training errors in large margin estimation (LME) under the semi-definite programming (SDP) framework. Like soft-margin SVM, we propose to optimize a new objective function that linearly combines the minimum margin among positive tokens and an average error of all negative tokens. The method is named soft-LME. It is shown that the soft-LME problem can still be converted into an SDP problem if we properly define the average error of negative tokens based on their discriminative functions. Some preliminary results on TIDIGITS...
In this paper, we propose a new discriminative training method for estimating the CDHMM (continuous density hidden Markov model) in speech recognition, based on the principle of maximizing the minimum relative multi-class separation margin. We show that this criterion can be formulated as a standard constrained minimax optimization problem. Then the optimization problem is solved by a GPD (generalized probabilistic descent) algorithm. Experimental results on the E-set and Alphabet tasks (ISOLET database) showed that the method can achieve significant (up...
In this paper, we propose to use a new optimization method, i.e., semidefinite programming (SDP), to solve the large-margin estimation (LME) problem of the continuous-density hidden Markov model (CDHMM) in speech recognition. First, we introduce a constraint for LME to guarantee the boundedness of the margin of the CDHMM. Second, we show that the LME problem subject to this constraint can be formulated as an SDP problem under some relaxation conditions. Therefore, it can be solved using many efficient optimization algorithms specially designed for SDP. The LME/SDP method has been evaluated on...
Discriminative learning methods have achieved many successes in speech and language processing during the past decades. Discriminative learning of generative models is a typical optimization problem, where efficient optimization methods play a critical role. For many widely used statistical models, discriminative learning normally leads to nonconvex optimization problems. In this article, we use three representative examples to showcase how to use a proper convex relaxation method to convert discriminative learning of HMMs and MMMs into a standard convex optimization problem so that it can be solved effectively and efficiently even for...
In this paper, we propose a new optimization method, i.e., constrained joint optimization, to solve the minimax optimization problem in large margin estimation (LME) of the continuous density hidden Markov model (CDHMM) for speech recognition. First, we mathematically analyze the definition of the margin and introduce some theoretically sound constraints into the minimax optimization to guarantee the boundedness of the margin in LME. Moreover, by using a penalized gradient descent algorithm, where the original objective function, i.e., the minimum margin, is approximated by a differentiable function and the constraints are cast as...
One of the most popular sports in Asia is badminton. Asia has talented players. Conventional badminton teaching is totally reliant on the coach and fitness instructor. But in today’s technologically advanced world, the educational system has caught up with technology and intelligence. By using smart technology, gamers may train on their own. These technologies range from mobile scheduling apps to apps. In this study, a unique intelligent K-Nearest Neighbor algorithm is used, and the outcomes are assessed. Additionally, all...
Vehicle view object detection technology is the key to the environment perception modules of autonomous vehicles, which is crucial for driving safety. In view of the characteristics of complex scenes, such as dim light, occlusion, and long distance, an improved YOLOv4-based vehicle view object detection model, VV-YOLO, is proposed in this paper. The VV-YOLO model adopts the implementation mode based on anchor frames. In the anchor frame clustering, the K-means++ algorithm is used to reduce the possibility of instability in the clustering results caused by the random selection of a...
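The seeding step that distinguishes K-means++ from plain K-means can be sketched as below. This is a minimal illustration, not the VV-YOLO implementation: the `1 - IoU` distance between `(width, height)` boxes is a common choice for anchor clustering and is assumed here, and the function names and sample boxes are hypothetical.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (width, height), both anchored at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeanspp_seeds(boxes, k, rng):
    """Pick k initial cluster centers by k-means++ seeding.

    Assumption: distance between boxes is 1 - IoU (not stated in the abstract).
    """
    # The first center is drawn uniformly at random.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        # Squared distance from every box to its nearest chosen center.
        d2 = [min(1.0 - iou_wh(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:
            raise ValueError("fewer distinct boxes than requested centers")
        # Boxes far from all current centers are more likely to be picked,
        # which spreads the seeds out and makes the clustering less sensitive
        # to an unlucky random start.
        centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


# Hypothetical (width, height) pairs of labelled boxes, in pixels.
sample_boxes = [(10, 12), (11, 13), (50, 60), (52, 58), (120, 200), (118, 190)]
initial_anchors = kmeanspp_seeds(sample_boxes, 3, random.Random(0))
```

The seeds returned here would then be refined by the usual K-means assignment and update iterations, which are omitted from this sketch.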
In the era of big data, the explosive growth of cultural tourism information should meet the personalization and privacy requirements of people's travel. However, existing travel recommendation algorithms have some obvious challenges, such as weak personalization, a significant cold-start problem, poor user privacy protection, etc. We propose an intelligent recommendation model and method, which constructs a new collaborative filtering algorithm based on the characteristics of user portraits and historical behaviors, designs a user-attraction data...
The registration of optical and SAR images has always been a challenging task due to the different imaging mechanisms of the corresponding sensors. To mitigate this difference, this paper proposes a registration algorithm based on a pseudo-SAR image generation strategy and an improved deep learning-based registration network. The method consists of two stages: pseudo-SAR image generation and image registration. In the generation section, a Restormer network is used to convert optical images into pseudo-SAR images. An L2 loss function is adopted in the network, and the loss function fluctuates less at the optimal point, making it easier for the model to reach the fitting state. In the registration part,...
Recently, multi-modal content generation has attracted lots of attention from researchers by investigating the utilization of visual instruction tuning based on large language models (LLMs). To enhance the performance and generalization ability of such LLMs, the practice of distilling knowledge from pretrained models (a.k.a. teachers) to more compact LLMs (students) has gained considerable interest. However, the prevailing paradigm of instruction tuning in knowledge distillation is resource-intensive and unidirectional, neglecting the potential for...
Periodic narrowband signals and white noise are the main interferences in the online detection and localization of cable partial discharge (PD). However, existing research has always focused on white noise suppression only, which is not in line with the actual scene. A novel de-noising method for effectively extracting the random PD pulse from complex and strong interferences is proposed in this paper and applied to PD localization. Firstly, an improved adaptive variational mode decomposition (AVMD) is used to decompose the periodic narrowband interference, white noise, and PD signal...
Removing the haze in an image is a huge challenge due to the difficulty of accurate hazy image modeling. Although the atmospheric scattering model (ASM) is widely used to describe the formation of hazy images, it is hard to deal with the uneven haze in an image, since the ASM is restricted to the assumption that the atmosphere is distributed homogeneously. This paper analyzes the hazy imaging mechanism, then proposes an image-to-image architecture to handle dehazing, in which a heterogeneous twin network (HT-Net) with two parallel sub-networks is constructed to establish high dimensional...
In recent years, all-neural, end-to-end (E2E) ASR systems have gained rapid interest in the speech recognition community. They convert input to text units in a single trainable neural network model. In ASR, many utterances contain rich named entities. Such entities may be user- or location-specific, and they are not seen during training. A single model makes it inflexible to utilize dynamic contextual information during inference. In this paper, we propose to train a context-aware E2E model and allow the beam search to traverse into an FST. We also...
Cross-modal hashing has attracted extensive attention due to its small data storage space and favorable retrieval efficiency. The matrix factorization-based method is an important kind of cross-modal hashing method. Most existing matrix factorization-based methods map heterogeneous data into a low-dimensional common Hamming space, and then adopt a relaxation and quantification strategy to obtain an approximate hash-coded solution. However, there exists an uncontrollable quantization error in this process, which may affect...