- Topic Modeling
- Natural Language Processing Techniques
- Advanced Sensor and Control Systems
- Speech Recognition and Synthesis
- Advanced Algorithms and Applications
- Advanced Computational Techniques and Applications
- Music and Audio Processing
- Speech and Audio Processing
- Industrial Technology and Control Systems
- Embedded Systems and FPGA Design
- Industrial Automation and Control Systems
- Rough Sets and Fuzzy Logic
- Advanced Measurement and Detection Methods
- Wireless Sensor Networks and IoT
- Machine Learning in Bioinformatics
- Simulation and Modeling Applications
- Machine Learning in Materials Science
- Multimodal Machine Learning Applications
- Text and Document Classification Technologies
- Quality and Safety in Healthcare
- Human Pose and Action Recognition
- Distributed Control Multi-Agent Systems
- Advanced Decision-Making Techniques
- Speech and Dialogue Systems
- Protein Structure and Dynamics
Shanghai Jiao Tong University
2003-2024
Jiaozuo University
2024
Northeast Normal University
2024
Sun Yat-sen University
2016-2023
National University of Defense Technology
2010-2021
Hubei University of Technology
2021
Hiroshima University
2021
CRRC (China)
2021
Tianjin Chengjian University
2006-2019
Zhejiang Sci-Tech University
2018
With the progress of 3D human pose and shape estimation, state-of-the-art methods can either be robust to occlusions or obtain pixel-aligned accuracy in non-occlusion cases. However, they cannot achieve robustness and mesh-image alignment at the same time. In this work, we present NIKI (Neural Inverse Kinematics with Invertible Neural Network), which models bidirectional errors to improve accuracy. NIKI can learn from both the forward and inverse processes with invertible networks. In the inverse process, the model separates the error from the plausible manifold...
QUALIFLEX (QUALItative FLEXible multiple criteria method) is one of the useful outranking methods for analyzing decision-making problems because of its flexibility with respect to cardinal and ordinal information. This paper develops a probabilistic linguistic QUALIFLEX method with possibility degree comparison for dealing with group decision-making problems, in which the evaluation information of alternatives is expressed by hesitant fuzzy linguistic term sets and the standard weights are partially known. Note that it is more reasonable to use probabilistic linguistic term sets (PLTSs) to represent the whole...
End-to-end (E2E) automatic speech recognition (ASR) systems directly map acoustics to words using a unified model. Previous works mostly focus on E2E training of a single model which integrates acoustic and language models into a whole. Although E2E training benefits from sequence modeling and simplified decoding pipelines, a large amount of transcribed acoustic data is usually required, and traditional acoustic and language modelling techniques cannot be utilized. In this paper, a novel modular training framework for E2E ASR is proposed to separately train neural acoustic and language models during...
Cognitive diagnosis has been developed for decades as an effective measurement tool to evaluate human cognitive status such as ability level and knowledge mastery. It has been applied to a wide range of fields including education, sport, psychological diagnosis, etc. By providing better awareness of cognitive status, it can serve as the basis for personalized services such as well-designed medical treatment, teaching strategy and vocational training. This paper aims to provide a survey of current cognitive diagnosis models, with more attention on new developments...
Computer Aided Design (CAD) is indispensable across various industries. Text-based CAD editing, which automates the modification of CAD models based on textual instructions, holds great potential but remains underexplored. Existing methods primarily focus on design variation generation or text-based CAD generation, either lacking support for text-based control or neglecting existing CAD models as constraints. We introduce CAD-Editor, the first framework for text-based CAD editing. To address the challenge of demanding triplet data with...
Deep Bidirectional Long Short-Term Memory (DBLSTM) with a Connectionist Temporal Classification (CTC) output layer has been established as one of the state-of-the-art solutions for handwriting recognition. It is well known that a DBLSTM trained with a CTC objective function will learn both local character image dependency for character modeling and long-range contextual dependency for implicit language modeling. In this paper, we study the effects of implicit and explicit language model information for DBLSTM-CTC based handwriting recognition by comparing the performance...
Large language models (LLMs), like ChatGPT, have shown some human-like cognitive abilities. For comparing these abilities of different models, several benchmarks (i.e., sets of standard test questions) from different fields (e.g., Literature, Biology and Psychology) are often adopted, and the test results under traditional metrics such as accuracy, recall and F1 are reported. However, such a way of evaluating LLMs can be inefficient and inaccurate from the cognitive science perspective. Inspired by Computerized Adaptive Testing (CAT) used in...
Large memory consumption of neural network language models (NN LMs) prohibits their use in many resource-constrained scenarios. Hence, effective NN LM compression approaches that are independent of NN structures are of great interest. However, previous approaches usually achieve a high compression ratio at the cost of obvious performance loss. In this paper, two recently proposed quantization approaches, product quantization (PQ) and soft binarization, are effectively combined to address the issue. PQ decomposes word embedding matrices into a Cartesian...
Although great progress has been made in automatic speech recognition (ASR), significant performance degradation still exists in noisy environments. Building on our previously introduced very deep CNNs, this paper further integrates residual learning to evaluate the very deep convolutional residual network (VDCRN) under noisy conditions, which shows more powerful robustness. Then, cluster adaptive training (CAT) is developed on the VDCRN to reduce the mismatch between training and testing scenarios. Moreover, an advanced future-vector assisted LSTM-RNN LM...
End-to-end (E2E) systems have played an increasingly important role in automatic speech recognition (ASR) and achieved great performance. However, E2E systems recognize output word sequences directly from the input acoustic features, and can only be trained on limited acoustic data. Extra text data is widely used to improve the results of traditional artificial neural network-hidden Markov model (ANN-HMM) hybrid systems. Involving extra text data in standard E2E ASR systems may break the E2E property during decoding. In this paper, a novel modular E2E ASR system...
Melanoma is a highly metastatic and lethal skin tumor originating from melanocyte malignancy. Circulating tumor cells (CTCs) are key endogenous biomarkers in melanoma metastasis. Melanin and blood vessels exhibit substantial disparities in their absorbance profiles at select wavelengths, a characteristic that can be adeptly harnessed to differentiate the photoacoustic signals they generate. Photoacoustic flow cytometry (PAFC), which harnesses this principle, enables monitoring of CTCs flowing in vivo. However,...
The long short-term memory language model (LSTM LM) has been widely investigated in the large vocabulary continuous speech recognition (LVCSR) task. Despite the excellent performance of the LSTM LM, its usage in resource-constrained environments, such as portable devices, is limited due to its high memory consumption. The binarized LSTM LM has been proposed to achieve significant memory reduction at the cost of performance degradation at high compression ratios. In this paper, we propose a soft binarization approach to recover the performance of the binarized LSTM LM. Experiments show that...
To improve the accuracy of automatic speech recognition, a two-pass decoding strategy is widely adopted. The first-pass model generates compact word lattices, which are utilized by the second-pass model to perform rescoring. Currently, the most popular rescoring methods are N-best rescoring and lattice rescoring with long short-term memory language models (LSTMLMs). However, these methods encounter the problem of limited search space or inconsistency between training and evaluation. In this paper, we address these problems with an end-to-end model for accurately...
Neural network language models have gained considerable popularity due to their promising performance. Distributed word embeddings are utilized to represent semantic information. However, each word is associated with a single vector in the embedding layer, preventing the model from capturing the meanings of polysemous words. In this work, we address this problem by assigning multiple fine-grained sense embeddings to each word in the embedding layers. The proposed model discriminates among different senses of a word with an attention mechanism in an unsupervised manner....