- Music and Audio Processing
- Speech Recognition and Synthesis
- Speech and Audio Processing
- Radiomics and Machine Learning in Medical Imaging
- Distributed and Parallel Computing Systems
- Video Analysis and Summarization
- Tactile and Sensory Interactions
- AI in cancer detection
- Cloud Computing and Resource Management
- Video Surveillance and Tracking Methods
- Natural Language Processing Techniques
- Hand Gesture Recognition Systems
- Medical Imaging and Analysis
- Advanced Neural Network Applications
- Digital Games and Media
- Peer-to-Peer Network Technologies
- Retinal Imaging and Analysis
- Advanced Database Systems and Queries
- Data Management and Algorithms
- Artificial Intelligence in Games
- Seismic Waves and Analysis
- Brain Tumor Detection and Classification
- Retinal Diseases and Treatments
- Music Technology and Sound Studies
- Sports Analytics and Performance
Chinese Academy of Sciences
2014-2024
Institute of Computing Technology
2009-2024
University of Chinese Academy of Sciences
2024
Institute of Geology and Geophysics
2021
The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios due to the presence of out-of-distribution and low-quality images. To address this issue, we propose Artificial Intelligence for Robust Glaucoma...
The application of deep learning has allowed significant progress in medical imaging. However, few studies have focused on the diagnosis of benign and malignant spinal tumors using medical imaging and age information at the patient level. This study proposes a multi-model weighted fusion framework (WFF) for the benign and malignant diagnosis of spinal tumors based on magnetic resonance imaging (MRI) images and age information. The proposed WFF included a tumor detection model, a sequence classification model, and an age information statistic module based on sagittal MRI sequences obtained from 585 patients with spinal tumors (270...
Seismic data denoising has always been an indispensable step in the seismic exploration workflow. The quality of denoising results directly affects subsequent inversion and migration imaging. In this article, we propose a fast and flexible convolutional neural network (FFCNN) based on DnCNN. In contrast to DnCNN and other existing artificial intelligence (AI)-based denoisers, FFCNN has several desirable properties: 1) downsampling and upscaling operations, which can substantially reduce runtime and memory requirements...
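Property 1) does not say how the downsampling and upscaling are done. A common choice in FFDNet-style denoisers is a reversible "pixel unshuffle": the input is split into four quarter-size sub-images, the network runs on those, and the output is interleaved back. The sketch below assumes that scheme. It is not confirmed as FFCNN's operator. It shows the lossless round trip on a toy 4 x 4 patch.

```python
from typing import List

# Sketch under assumptions: FFDNet-style reversible downsampling, not FFCNN's confirmed operator.
Grid = List[List[float]]

def pixel_unshuffle(x: Grid, r: int = 2) -> List[Grid]:
    """Split an H x W grid into r*r sub-grids of size (H/r) x (W/r)."""
    h, w = len(x), len(x[0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    subs = []
    for di in range(r):
        for dj in range(r):
            # Sub-grid (di, dj) keeps every r-th sample starting at offset (di, dj).
            subs.append([[x[i][j] for j in range(dj, w, r)] for i in range(di, h, r)])
    return subs

def pixel_shuffle(subs: List[Grid], r: int = 2) -> Grid:
    """Inverse of pixel_unshuffle: interleave r*r sub-grids back into one grid."""
    sh, sw = len(subs[0]), len(subs[0][0])
    out = [[0.0] * (sw * r) for _ in range(sh * r)]
    for k, sub in enumerate(subs):
        di, dj = divmod(k, r)  # recovers the (di, dj) offset used when splitting
        for i in range(sh):
            for j in range(sw):
                out[i * r + di][j * r + dj] = sub[i][j]
    return out

patch = [[float(4 * i + j) for j in range(4)] for i in range(4)]
assert pixel_shuffle(pixel_unshuffle(patch)) == patch  # no information is lost
```

A network working on the four half-resolution maps processes a quarter of the spatial positions per map. That is where runtime and memory savings of this kind come from.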
In this article, a special decision surface for weakly-supervised sound event detection (SED) and a disentangled feature (DF) for the multi-label problem in polyphonic SED are proposed. We approach SED as a multiple instance learning (MIL) problem and utilize a neural network framework with a pooling module to solve it. General MIL approaches include two kinds: instance-level approaches and embedding-level approaches. We present a method of generating instance-level probabilities for embedding-level approaches, which tend to perform better than instance-level approaches in terms of bag-level...
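The two MIL families differ in where pooling happens. Instance-level approaches classify each frame and then pool frame probabilities into a clip (bag) probability. Embedding-level approaches pool frame features first and classify only the clip. The minimal sketch below shows the difference. It assumes a single linear classifier, linear-softmax pooling for the instance-level path, and mean pooling for the embedding-level path. These are common choices, not the paper's exact modules.

```python
import math
from typing import List, Sequence, Tuple

# Sketch under assumptions: toy linear classifier, linear-softmax and mean pooling.

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def linear_softmax_pool(frame_probs: Sequence[float]) -> float:
    """sum(p^2) / sum(p): frames with higher probability get more weight."""
    total = sum(frame_probs)
    return sum(p * p for p in frame_probs) / total if total > 0 else 0.0

def instance_level(frames: List[List[float]], w: List[float], b: float) -> Tuple[List[float], float]:
    """Classify every frame (instance), then pool into a clip (bag) probability."""
    frame_probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, f)) + b) for f in frames]
    return frame_probs, linear_softmax_pool(frame_probs)

def embedding_level(frames: List[List[float]], w: List[float], b: float) -> float:
    """Mean-pool frame features into one clip embedding, then classify once."""
    n = len(frames)
    clip = [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, clip)) + b)

frames = [[0.0, 0.0], [2.0, 1.0], [0.0, 0.0]]
probs, clip_prob = instance_level(frames, [1.0, 1.0], -1.0)  # per-frame and clip scores
bag_only = embedding_level(frames, [1.0, 1.0], -1.0)          # clip score only
```

The embedding-level path returns no per-frame probabilities, so it cannot localize events by itself. That gap is what the method above aims to close.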
Learning meaningful frame-wise features on a partially labeled dataset is crucial to semi-supervised sound event detection. Prior works either maintain the consistency of frame-level predictions or seek feature-level similarity among neighboring frames, which cannot exploit the potential of unlabeled data. In this work, we design a Local and Global Consistency (LGC) regularization scheme to enhance the model at both the label and feature levels. Audio CutMix is introduced to change the contextual information of clips. Then,...
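Audio CutMix, as described, changes a clip's context by splicing in material from another clip. The minimal sketch below assumes one common variant: a random contiguous span of frames in clip A is replaced by the time-aligned span of clip B, and frame-level labels move with the frames. The span-length limit (`max_ratio`) is an assumption. So is the data layout (T frames by F features, T frames by C labels). The exact variant used in LGC is not given here.

```python
import random
from typing import List, Tuple

# Sketch under assumptions: time-aligned frame splice; not necessarily LGC's exact CutMix.
Frames = List[List[float]]  # T x F feature frames
Labels = List[List[int]]    # T x C frame-level event labels

def audio_cutmix(x_a: Frames, y_a: Labels, x_b: Frames, y_b: Labels,
                 rng: random.Random, max_ratio: float = 0.5) -> Tuple[Frames, Labels]:
    """Replace a random contiguous span of clip A (features and labels) with clip B's span."""
    t = len(x_a)
    if len(x_b) != t:
        raise ValueError("clips must have the same number of frames")
    length = rng.randint(1, max(1, int(t * max_ratio)))  # span length in frames
    start = rng.randint(0, t - length)
    x_mix = [row[:] for row in x_a]  # copy so the inputs stay untouched
    y_mix = [row[:] for row in y_a]
    for i in range(start, start + length):
        x_mix[i] = x_b[i][:]
        y_mix[i] = y_b[i][:]  # labels follow their frames, keeping targets consistent
    return x_mix, y_mix
```

Because labels are cut together with features, each frame's target still describes that frame. Only the surrounding context changes, which is the property a consistency regularizer can exploit.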
We propose a simple but efficient method termed Guided Learning for weakly-labeled semi-supervised sound event detection (SED). There are two sub-targets implied in weakly-labeled SED: audio tagging and boundary detection. Instead of designing a single model by considering a trade-off between the two sub-targets, we design a teacher model aiming at audio tagging to guide a student model aiming at boundary detection to learn using unlabeled data. The guidance is guaranteed by the audio tagging performance gap between the two models. In the meantime, the student model, liberated from the trade-off, is able to provide better boundary detection results. We propose a principle for designing such...
Game tree search is a classical problem in the field of game theory and artificial intelligence. A fast game tree search algorithm is critical for computer games that require real-time responses. In this paper, we focus on how to leverage the massive parallelism of GPUs to accelerate game tree search algorithms, and we propose a concise and general parallel game tree search algorithm on the GPU. The performance model of our algorithm is presented and analyzed theoretically. We implement two real computer games, Connect6 and Chess, and use these games to verify the effectiveness and efficiency of our algorithm. Experiments support...
Optical Braille Recognition methods usually rely on many hand-designed steps, such as image deskewing, dot detection, cell grid construction, and character recognition, which are less robust in complex scenes. This paper proposes an optimal semantic segmentation framework, BraUNet, to directly detect and recognize Braille characters in the whole original images. BraUNet adds an extra auxiliary learning strategy to the UNet network and uses long-range connections of feature maps between the encoder and decoder to obtain more low-level features...
There are two sub-tasks implied in weakly-supervised SED: audio tagging and event boundary detection. Current methods that combine multi-task learning with SED require annotations for both of these sub-tasks. Since only audio tagging annotations are available in weakly-supervised SED, we design multiple branches with different learning purposes instead of pursuing multiple tasks. Similar to multiple tasks, multiple learning purposes can also prevent the common feature shared by the branches from overfitting to any one of the purposes. We design these learning purposes based on combinations of different MIL strategies and different pooling methods. Experiments on the DCASE 2018 Task 4...
The SOAP protocol has emerged as the Web service communication standard. Because of its relatively poor performance, many researchers focus on improving the speed of SOAP message processing. In this paper, we propose SPI, which introduces the client usage pattern into the low-level SOAP processing infrastructure in order to improve the performance of certain kinds of Web service applications with specific usage patterns. The pack interface of SPI is an approach to reduce the number of messages and the latency on the client side. This optimization technique packs concurrent requests into...
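The abstract is cut off before it says how requests are packed. The sketch below only illustrates the general idea of cutting message count by batching concurrent client calls: several pending calls go into one SOAP 1.1 envelope, each as its own child element of `soap:Body`. The function `pack_requests`, the operation names, and the `urn:example` namespace are hypothetical. This is not SPI's actual interface, and it leaves out how a server would demultiplex the calls and return the results.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical illustration of request packing; not SPI's actual pack interface.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def pack_requests(requests: List[Tuple[str, Dict[str, str]]], ns: str) -> bytes:
    """Pack several pending (operation, params) calls into ONE SOAP envelope.

    Each call becomes a separate child of soap:Body, so the client sends one
    message (one round trip) instead of len(requests) messages.
    """
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    for op, params in requests:
        call = ET.SubElement(body, f"{{{ns}}}{op}")
        for name, value in params.items():
            ET.SubElement(call, f"{{{ns}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")

message = pack_requests([("GetQuote", {"symbol": "A"}),
                         ("GetQuote", {"symbol": "B"})], "urn:example")
```

The saving grows with the number of concurrent calls. Per-message costs (connection setup, HTTP headers, envelope parsing) are paid once per batch rather than once per request.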
Computer-aided cancer survival risk prediction plays an important role in the timely treatment of patients. It is a challenging weakly supervised ordinal regression task that involves multiple clinical factors, such as pathological images and genomic data. In this paper, we propose a new training method, multimodal object-level contrast learning, for cancer survival risk prediction. First, we construct contrast learning pairs based on the relationships among the samples in the sample set. Then we introduce a method to train...
Contrastive Language-Audio Pretraining (CLAP) is pre-trained to associate audio features with human language, making it a natural zero-shot classifier for recognizing unseen sound categories. To adapt CLAP to downstream tasks, prior works inevitably require labeled in-domain audio, which limits their scalability under data scarcity and deprives them of the original CLAP's ability to detect novel classes. In this work, by leveraging the modality alignment in CLAP, we propose an efficient audio-free prompt...
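The zero-shot behavior described above comes from comparing one audio embedding with one text-prompt embedding per class (for example "this is a sound of dog") and taking a softmax over cosine similarities. The sketch below shows only that scoring step. The 3-dimensional vectors are toy stand-ins for CLAP's audio and text encoder outputs, and the temperature of 0.01 is an assumed value.

```python
import math
from typing import Dict, List

# Sketch under assumptions: toy embeddings stand in for CLAP encoder outputs.

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot_classify(audio_emb: List[float], text_embs: Dict[str, List[float]],
                       temperature: float = 0.01) -> Dict[str, float]:
    """Softmax over cosine similarities between an audio embedding and per-class prompt embeddings."""
    logits = {c: cosine(audio_emb, e) / temperature for c, e in text_embs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exp = {c: math.exp(v - m) for c, v in logits.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

prompts = {"dog": [0.9, 0.1, 0.0], "siren": [0.0, 1.0, 0.0], "rain": [0.0, 0.0, 1.0]}
scores = zero_shot_classify([1.0, 0.0, 0.0], prompts)
```

A novel class is added by adding one prompt embedding, with no labeled audio. That is the capability the abstract says label-dependent adaptation loses.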
Large language models show deep comprehension and fluent generation in the field of multi-modality. Although significant advances have been achieved in audio multi-modality, existing methods rarely leverage language models for sound event detection (SED). In this work, we propose an end-to-end framework for understanding audio features while simultaneously generating sound events and their temporal locations. Specifically, we employ pretrained acoustic models to capture discriminative features across different categories and an autoregressive text...
Recent advances have been witnessed in audio-language joint learning, such as CLAP, which shows much success in multi-modal understanding tasks. These models usually aggregate uni-modal local representations, namely frame or word features, into global ones, on which the contrastive loss is employed to reach coarse-grained cross-modal alignment. However, frame-level correspondence with texts may be ignored, making such models ill-posed for explainability and fine-grained challenges, which may also undermine performances...
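The coarse-grained alignment criticized above can be made concrete. Local features are pooled into one global vector per clip or caption, and a symmetric contrastive (InfoNCE) loss is applied over a batch. The sketch assumes mean pooling and a CLIP-style loss with temperature 0.07, which are common choices rather than any specific model's settings. Once frames are averaged away, the loss cannot reward correct frame-to-word correspondence.

```python
import math
from typing import List

# Sketch under assumptions: mean pooling + CLIP-style symmetric InfoNCE.
Vec = List[float]

def mean_pool(local: List[Vec]) -> Vec:
    """Aggregate frame (or word) features into one global vector."""
    n = len(local)
    return [sum(v[d] for v in local) / n for d in range(len(local[0]))]

def normalize(v: Vec) -> Vec:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(rows: List[Vec]) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(rows):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(rows)

def contrastive_loss(audio_global: List[Vec], text_global: List[Vec],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: (audio_i, text_i) is positive, all other pairs are negatives."""
    a = [normalize(v) for v in audio_global]
    t = [normalize(v) for v in text_global]
    n = len(a)
    logits = [[sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]  # text-to-audio direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))

audio = [mean_pool([[1.0, 0.0], [1.0, 0.2]]), mean_pool([[0.0, 1.0], [0.1, 1.0]])]
text = [[1.0, 0.0], [0.0, 1.0]]
loss = contrastive_loss(audio, text)
```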
To propose a deep learning-based classification framework that can carry out patient-level classification of benign and malignant tumors according to the patient's multi-plane images and clinical information. A total of 430 cases of spinal tumor, including axial and sagittal plane images acquired by MRI, were included: 297 cases for training (14072 images) and 133 cases for testing (6161 images). Based on bipartite graph attention learning, this study proposed a learning framework, BgNet, for benign and malignant tumor diagnosis. In the bipartite graph structure, the tumor area in each plane is used as a vertex of the graph, and the matching...
Aiming at language model (LM) adaptation for interactive speech transcription, this paper proposes a topic-based adaptation method using users' correction information. To infer the topic of each utterance in continuous speech, the method uses the information of history utterances adjacent to the current one. Perplexity is calculated for topic inference. Topic-related LMs are interpolated with the background LM to obtain adapted LMs. Each utterance is transcribed with its adapted model. This is a supervised adaptation method, which is believed to outperform the unsupervised approaches widely used...
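The adaptation loop has three steps: score the user-corrected history utterances under each topic LM, pick the topic with the lowest perplexity, and linearly interpolate that topic LM with the background LM. The sketch below shows those steps. The unigram LMs, the floor probability for unseen words, and the weight λ = 0.3 are illustrative assumptions. Real systems would use smoothed n-gram LMs.

```python
import math
from typing import Dict, List

# Sketch under assumptions: toy unigram LMs, crude floor for unseen words, fixed lambda.
LM = Dict[str, float]  # word -> probability

def prob(lm: LM, word: str, floor: float = 1e-6) -> float:
    return lm.get(word, floor)

def perplexity(lm: LM, words: List[str]) -> float:
    log_sum = sum(math.log(prob(lm, w)) for w in words)
    return math.exp(-log_sum / len(words))

def infer_topic(topic_lms: Dict[str, LM], history: List[str]) -> str:
    """Choose the topic whose LM gives the corrected history the lowest perplexity."""
    return min(topic_lms, key=lambda t: perplexity(topic_lms[t], history))

def interpolate(topic_lm: LM, background: LM, lam: float = 0.3) -> LM:
    """P(w) = lam * P_topic(w) + (1 - lam) * P_bg(w); sums to 1 if both inputs do."""
    vocab = set(topic_lm) | set(background)
    return {w: lam * topic_lm.get(w, 0.0) + (1 - lam) * background.get(w, 0.0) for w in vocab}

background = {"the": 0.5, "game": 0.25, "stock": 0.25}
topics = {"sports": {"the": 0.4, "game": 0.5, "goal": 0.1},
          "finance": {"the": 0.4, "stock": 0.5, "market": 0.1}}
topic = infer_topic(topics, ["the", "game", "goal"])  # history after user corrections
adapted = interpolate(topics[topic], background)
```

User corrections make the history text reliable. This is why the abstract calls the approach supervised, unlike adaptation driven by raw recognizer output.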
This paper describes, to our knowledge, the first Chinese Braille speech synthesis system. The system consists of front-end processing, prosody prediction, and speech synthesis modules. Front-end processing includes conversion from Braille to common Pinyin and a high-precision character prediction model. To achieve high precision under limited corpus conditions, we propose a prediction model based on the RoBERTa pre-trained model, which achieves an accuracy of 94.42%. Finally, a real-time TTS system based on Tacotron2 and LPCNet is proposed. We modify Tacotron2,...
In this paper, a special decision surface for weakly-supervised sound event detection (SED) and a disentangled feature (DF) for the multi-label problem in polyphonic SED are proposed. We approach SED as a multiple instance learning (MIL) problem and utilize a neural network framework with a pooling module to solve it. General MIL approaches include two kinds: instance-level approaches and embedding-level approaches. We present a method of generating instance-level probabilities for embedding-level approaches, which tend to perform better than instance-level approaches in terms of bag-level...
Equi-join is heavily used in MapReduce-based log processing. With the rapid growth of dataset sizes, join methods on MapReduce have been studied extensively in recent years. We find that existing join methods usually cannot achieve high query performance and affordable storage consumption at the same time when faced with a huge amount of data. They either optimize only one aspect while significantly sacrificing the other, or have limited applicability. In this paper, after analyzing the characteristics of log processing workloads and the underlying MapReduce framework, we...
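The abstract breaks off before describing the proposed method. For context, the sketch below simulates the standard baseline, the repartition (reduce-side) equi-join, in plain Python. Map tags each record with its source, the shuffle groups records by join key, and reduce forms the per-key Cartesian product. It is the textbook technique, not the paper's method. The field names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Textbook reduce-side join simulated in memory; not the paper's proposed method.
Record = Dict[str, str]

def map_phase(tag: str, records: Iterable[Record], key: str) -> Iterator[Tuple[str, Tuple[str, Record]]]:
    """Map: emit (join_key, (source_tag, record)) for every input record."""
    for rec in records:
        yield rec[key], (tag, rec)

def reduce_side_join(left: List[Record], right: List[Record], key: str) -> List[Record]:
    groups: Dict[str, List[Tuple[str, Record]]] = defaultdict(list)
    # Shuffle: group every mapped pair by join key, as the framework would across the network.
    for k, tagged in list(map_phase("L", left, key)) + list(map_phase("R", right, key)):
        groups[k].append(tagged)
    out: List[Record] = []
    for tagged_recs in groups.values():
        lefts = [r for t, r in tagged_recs if t == "L"]
        rights = [r for t, r in tagged_recs if t == "R"]
        # Reduce: Cartesian product of both sides within one key group.
        for lrec in lefts:
            for rrec in rights:
                out.append({**lrec, **rrec})
    return out

logs = [{"uid": "1", "url": "/a"}, {"uid": "2", "url": "/b"}, {"uid": "1", "url": "/c"}]
users = [{"uid": "1", "name": "ann"}, {"uid": "3", "name": "bob"}]
joined = reduce_side_join(logs, users, "uid")
```

Every record of both inputs passes through the shuffle, even keys with no match (uid 2 and 3 above). That cost is why join methods trade extra storage (pre-partitioning, indexes) for query speed. This is the tension the abstract describes.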