- Human Pose and Action Recognition
- Music and Audio Processing
- Anomaly Detection Techniques and Applications
- Multimodal Machine Learning Applications
- Advanced Vision and Imaging
- Orbital Angular Momentum in Optics
- Nonlinear Photonic Systems
- Advanced Fiber Laser Technologies
- Topic Modeling
- Traffic Prediction and Management Techniques
- Network Security and Intrusion Detection
- Text and Document Classification Technologies
- Generative Adversarial Networks and Image Synthesis
- Data Stream Mining Techniques
- Phonetics and Phonology Research
- Opinion Dynamics and Social Influence
- Evolutionary Game Theory and Cooperation
- Handwritten Text Recognition Techniques
- Speech and Audio Processing
- Advanced Image Processing Techniques
- Interpreting and Communication in Healthcare
- Multisensory Perception and Integration
- Advanced Decision-Making Techniques
- Color Science and Applications
- Advanced Neural Network Applications
Chongqing Vocational Institute of Engineering
2025
Tsinghua University
2024
Communication University of China
2023-2024
State Key Laboratory of Electrical Insulation and Power Equipment
2023
Hangzhou Dianzi University
2023
Xi'an Jiaotong University
2023
Inner Mongolia University
2023
Fudan University
2022
Nankai University
2022
China University of Petroleum, East China
2018-2021
Weakly-supervised audio-visual violence detection aims to distinguish snippets containing multimodal violence events using only video-level labels. Many prior works perform audio-visual integration and interaction in an early or intermediate manner, yet overlook the modality heterogeneity in the weakly-supervised setting. In this paper, we analyze the modality asynchrony and undifferentiated instances phenomena of the multiple instance learning (MIL) procedure, and further investigate their negative impact on weakly-supervised audio-visual learning. To address these...
Aiming at the shortcomings of vulnerability assessment based on complex network theory for urban road traffic systems, a vulnerability state assessment method that considers the influence of congestion is constructed, using the cloud model to describe the randomness and uncertainty of risks. First, based on complex network theory, a primary index system is introduced. Second, the congestion states of roads are introduced to characterize the features of road sections. After that, a state identification method based on cloud charts is constructed. Finally, on the basis of topological mapping of the road network in Nan'an...
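The abstract relies on the cloud model to capture the randomness and uncertainty of risk. As background only, the standard forward normal cloud generator turns the model's three numerical characteristics (expectation Ex, entropy En, hyper-entropy He) into sampled "cloud drops", each with a certainty degree. This is a generic sketch of that textbook generator, not the paper's assessment method; the function name `forward_normal_cloud` and all parameter values are illustrative assumptions.

```python
import math
import random


def forward_normal_cloud(ex, en, he, n, seed=None):
    """Generate n cloud drops (x, mu) from a normal cloud model (Ex, En, He).

    Hypothetical helper for illustration; Ex, En, He are assumed inputs.
    """
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        # Second-order randomness: perturb the entropy En by the hyper-entropy He
        en_i = abs(rng.gauss(en, he))
        # Drop position: a normal sample around the expectation Ex
        x = rng.gauss(ex, en_i)
        # Certainty degree of the drop; a zero-width cloud collapses to x == Ex
        mu = math.exp(-((x - ex) ** 2) / (2 * en_i ** 2)) if en_i > 0 else 1.0
        drops.append((x, mu))
    return drops
```

In a risk-assessment setting, Ex, En and He would typically be estimated from indicator data (for example with a backward cloud generator); larger He produces a more dispersed, "thicker" cloud, which is how the model expresses uncertainty about the uncertainty itself.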
To enhance enterprises' interactive exploration capabilities for unstructured chart data, this paper proposes a multimodal question-answering method. Facing the challenge of recognizing curved and irregular text in charts, we introduce Gaussian heatmap encoding to achieve precise character-level annotation. Additionally, we combine a key point detection algorithm to extract numerical information from charts and convert it into structured table data. Finally, by employing a cross-fusion model,...
Multiagent formation tracking tasks in constrained space commonly require the formation to transform its pattern adaptively, so that multiagent systems can effectively avoid spatial constraints and safely pass through the constrained region. Aiming to achieve such control objectives, a formation tracking control strategy based on the leader-following framework is designed in this article. An adaptive scaling mechanism, termed orientational scaling, is first designed. A time-varying matrix introduces real-time information about the shape of obstacles into the controller. With...
Most existing intelligent editing tools for music and video rely on cross-modal matching based on affective consistency or on the similarity of feature representations. However, these methods are not fully applicable to complex audiovisual scenarios, resulting in low matching accuracy and suboptimal audience perceptual effects due to ambiguous matching rules and associated factors. To address these limitations, this paper focuses on both the integration and the distribution of artistic works in film, television, and music. Based on rich emotional...
Learning from imbalanced data streams differs from the traditional learning paradigm due to the imbalance between classes. It has significant implications in a myriad of real-world applications, ranging from financial risk and network security to medical diagnosis. Moreover, outliers usually appear in data streams. Class imbalance or anomalies alone can degrade the performance of the underlying learning algorithms, and their combination makes the problem harder. In this work, we propose an anomaly detection aided budget online weighted...
As an important branch of facial recognition, facial expression recognition has long been a hot topic in interdisciplinary research, with broad application prospects. Currently, there are numerous expression recognition methods based on convolutional neural networks. However, these methods perform differently on different datasets, which makes it difficult to distinguish their advantages and disadvantages. This article presents a comparative study of three classic emotion recognition algorithms. Furthermore, an image enhancement algorithm utilizing super...
We investigate the propagation dynamics of cosh- and cosine-Airy beams in Kerr nonlinear media. The cosh-Airy beam and the cosine-Airy beam can be considered as superpositions of two Airy beams with different decay factors and different trajectories, respectively. It is shown that the solitons shed from their interaction, in both the in-phase and out-of-phase cases, depend strongly on the modulation parameter associated with the cosh function. The solitons shed from the interaction between cosine-Airy beams exhibit attraction or repulsion under proper beam interval and initial angle conditions in both cases.
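The superposition statement can be made concrete with a short identity. A minimal sketch, assuming the common normalized input profile with transverse coordinate $s$, decay factor $a$ and modulation parameter $\Omega$ (symbols chosen here for illustration; the paper's notation and normalization may differ):

```latex
u_{\cosh}(s,0) = \operatorname{Ai}(s)\,e^{as}\cosh(\Omega s)
  = \tfrac{1}{2}\operatorname{Ai}(s)\,e^{(a+\Omega)s} + \tfrac{1}{2}\operatorname{Ai}(s)\,e^{(a-\Omega)s}

u_{\cos}(s,0) = \operatorname{Ai}(s)\,e^{as}\cos(\Omega s)
  = \tfrac{1}{2}\operatorname{Ai}(s)\,e^{as}\,e^{i\Omega s} + \tfrac{1}{2}\operatorname{Ai}(s)\,e^{as}\,e^{-i\Omega s}
```

In the cosh case the two components differ only in their real decay factors $a \pm \Omega$ (both remain finite-energy Airy beams only if $|\Omega| < a$). In the cosine case they share the decay factor $a$, but the linear phase factors $e^{\pm i\Omega s}$ give them opposite initial transverse tilts, hence different trajectories.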
We numerically investigate the interactions between two Airy beams with different relative phases in photorefractive (PR) nonlinear media in one transverse dimension. It is shown that the generated solitons and soliton pairs do not accelerate, although the incident Airy beams exhibit the acceleration property. When the relative phase is very small, the attraction is strengthened and the propagation behavior is similar to the in-phase case, where the beams attract each other and their main lobes can fuse into a single breathing soliton, while the secondary lobes generate symmetric...
Fine-tuning and testing a multilingual large language model is expensive and challenging for low-resource languages (LRLs). While previous studies have predicted the performance of natural language processing (NLP) tasks using machine learning methods, they primarily focus on high-resource languages, overlooking LRLs and shifts across domains. Focusing on LRLs, we investigate three factors: the size of the fine-tuning corpus, the domain similarity between the fine-tuning and testing corpora, and the language similarity between source and target languages. We employ classical regression models...
This article investigates the impact of visual color perception on fine-grained emotion prediction in videos, analyzing the contribution of color perception features to emotion prediction. A total of 20 subjects were involved in this experiment. First, 10 subjects conducted a fine-grained emotional subjective evaluation experiment on 50 video clips. Then, another 10 subjects conducted a color perception annotation experiment on these clips. On this basis, the correlation and mechanism between color perception and emotions were analyzed. Finally, an emotion prediction model was established based on a combination of perceptual and objective features. It was also observed that, compared...
The outbreak of COVID-19 has changed the teaching model of international education. To ensure that teaching can be carried out as scheduled, some transnational courses have been moved online. This paper begins with a brief review of related studies on remote interpreting and classroom interpreting. Based on the Effort Model proposed by Daniel Gile, this paper analyzes the difficulties of online classroom interpreting, which are divided into two parts, information comprehension and language expression, including the input difficulty caused by the speaker's long speaking turns....
Visual-only self-supervised learning has achieved significant improvement in video representation learning. Existing methods encourage models to learn video representations by utilizing contrastive learning or by designing specific pretext tasks. However, some models are likely to focus on the background, which is unimportant for learning video representations. To alleviate this problem, we propose a new view called the long-range residual frame to obtain more motion-specific information. Based on this, Motion-Contrastive Perception...
Ultrasound images are now widely used by doctors to diagnose kidney diseases. Since U-Net was proposed, it has been applied in the field of ultrasound image segmentation. We propose a method that combines U-Net with generative adversarial networks (GANs), creating a more efficient approach to the segmentation of renal ultrasound images. By introducing the GAN's supervision mechanism, the segmentation achieves better performance. We constructed a training set of 200 images and a test set of 40 images, and carried out experiments on them....
Acoustic-to-articulatory inversion (AAI) aims to estimate the parameters of articulators from speech audio. There are two common challenges in AAI: limited data and unsatisfactory performance in the speaker-independent scenario. Most current works focus on extracting features directly from speech, ignoring the importance of phoneme information, which may limit the performance of AAI. To this end, we propose a novel network called SPN that uses different streams to carry out the AAI task. Firstly, to improve performance in the speaker-independent experiment, a new...