- Speech and Audio Processing
- Music and Audio Processing
- Hearing Loss and Rehabilitation
- Advanced Vision and Imaging
- Digital Media Forensic Detection
- Multilevel Inverters and Converters
- Advanced Algorithms and Applications
- Environmental remediation with nanomaterials
- Advanced Sensor and Control Systems
- Recycling and Waste Management Techniques
- Toxic Organic Pollutants Impact
- Robotics and Sensor-Based Localization
- Advanced Neural Network Applications
- Industrial Automation and Control Systems
- Domain Adaptation and Few-Shot Learning
- Advancements in Semiconductor Devices and Circuit Design
- Electrical Fault Detection and Protection
- Microbial bioremediation and biosurfactants
- Computer Graphics and Visualization Techniques
- Energy Load and Power Forecasting
- 3D Shape Modeling and Analysis
- Higher Education and Teaching Methods
- Occupational Health and Safety Research
- Synthesis and biological activity
- Infrared Target Detection Methodologies
Xi'an University of Technology, 2004-2025
University of Oxford, 2023-2024
Northeast Electric Power University, 2012-2023
Oxford Research Group, 2023
Australian National University, 2023
Xiamen University, 2020-2022
Chinese Research Academy of Environmental Sciences, 2020-2022
Group Sense (China), 2022
Self-supervised audio-visual source localization aims to locate sound-source objects in video frames without extra annotations. Recent methods often approach this goal with the help of contrastive learning, which assumes that only the audio and visual contents from the same video are positive samples for each other. However, this assumption would suffer from false negative samples in real-world training. For example, for an audio sample, treating the frames from the same audio class as negative samples may mislead the model and therefore harm the learned representations (e.g., the audio of a siren wailing...
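The false-negative issue described in this abstract can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch, not the paper's method: the function name `info_nce`, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def info_nce(audio, visual, temperature=0.1):
    """Mean InfoNCE loss; audio[i] and visual[i] are assumed to come from the same video."""
    losses = []
    for i, a in enumerate(audio):
        # Similarity of audio i to every visual embedding in the batch.
        logits = [sum(x * y for x, y in zip(a, v)) / temperature for v in visual]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Only the same-index pair counts as positive; all others are negatives.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Two videos of different classes: embeddings are well separated.
clean = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Two videos of the same class: each is a false negative for the other.
collided = info_nce([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
```

In the second batch the embeddings are perfectly aligned with their positives, yet the loss stays near log 2 because the same-class sample is pushed away as a negative.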
Camera pose estimation is a long-standing computer vision problem that to date often relies on classical methods, such as handcrafted keypoint matching, RANSAC and bundle adjustment. In this paper, we propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework, modelling the conditional distribution of camera poses given input images. This novel view of an old problem has several advantages. (i) The nature of the diffusion framework mirrors the iterative procedure of bundle adjustment. (ii) The formulation allows a seamless...
Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prohibits vision transformers from scaling up to high-resolution images, due to both the computational complexity and memory footprint being quadratic. Linear attention was introduced in natural language processing (NLP), which reorders the self-attention mechanism to mitigate a similar issue, but directly applying existing linear attention to vision may not lead to satisfactory results. We investigate this...
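The reordering that linear attention relies on can be sketched in a few lines. This is a generic illustration under stated assumptions, not the method of the paper above: the `feature_map` (ReLU plus a small offset) and all function names are hypothetical. With a positive feature map in place of softmax, the product can be grouped as φ(Q)(φ(K)ᵀV), so no N × N matrix is ever formed.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def feature_map(a):
    """Hypothetical positive feature map (ReLU plus a small offset); an assumption."""
    return [[max(x, 0.0) + 1e-6 for x in row] for row in a]


def quadratic_order(q, k, v):
    """(phi(Q) phi(K)^T) V with row normalisation: builds an N x N score matrix."""
    scores = matmul(feature_map(q), transpose(feature_map(k)))
    out = matmul(scores, v)
    return [[x / sum(s) for x in o] for o, s in zip(out, scores)]


def linear_order(q, k, v):
    """phi(Q) (phi(K)^T V): same result, but only d x d intermediates."""
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)           # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]  # length-d sum of key features
    out = matmul(fq, kv)
    norms = [sum(x * y for x, y in zip(row, k_sum)) for row in fq]
    return [[x / n for x in o] for o, n in zip(out, norms)]
```

Both functions return the same values up to rounding; the second scales linearly in the number of tokens because its intermediates depend only on the feature dimension.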
We propose a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark, i.e., AVSBench, providing pixel-wise annotations for sounding objects in audible videos. It contains three subsets: AVSBench-object (Single-source subset, Multi-sources subset) and AVSBench-semantic (Semantic-labels subset). Accordingly, three settings are studied: 1)...
The T‐type three‐level rectifier has garnered significant attention due to its ability to enhance the voltage waveform quality in power systems and reduce electromagnetic interference with other equipment. To ensure the high reliability of high‐power wind and photovoltaic power generation systems, conducting fault diagnosis for T‐type three‐level rectifiers is crucial. This paper first analyzes the input current characteristics of both in‐phase and out‐of‐phase dual transistor open‐circuit faults. A current‐extended observer...
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images (i.e., as few as 2-8 inputs), which is a challenging yet practical setting in real-world applications. Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes. Concretely, FLARE starts with camera pose estimation, whose results condition the subsequent learning of geometric structure...
Generative models have made huge progress in photorealistic image synthesis in recent years. To enable humans to steer the image generation process and customize the output, many works explore the interpretable dimensions of the latent space in GANs. Existing methods edit the attributes of the output image, such as orientation or color scheme, by varying the latent code along certain directions. However, these methods usually require additional human annotations for each pretrained model, and they mostly focus on editing global attributes. In this work, we...
Polybrominated dibenzo-p-dioxins and dibenzofurans (PBDD/Fs) are highly toxic persistent compounds that have drawn wide public attention. Bromophenols are important precursors for forming PBDD/Fs, and their reaction path has always been a research hotspot. In this study, the formation characteristics of PBDD/Fs from 2,4,6-TBP were studied. The yields of 2,3,7,8-substituted PBDD/Fs and 2,4,6,8-TBDF from different thermal products ranged from 0.067 to 10.3 ng/g and from 0.207 to 9.68 ng/g, respectively. The effects of adding Cu, Fe, and Sb2O3 were investigated...
Existing series arc fault identification methods use features such as the time-frequency domain of the current signal as the basis for identification, resulting in relatively limited detection solutions, and directly extracting features using deep learning algorithms leads to insufficient feature extraction. To address these problems, a new series arc fault identification method based on a denoising autoencoder (DAE) and a residual network (ResNet) is proposed. First, a large number of training samples are obtained through sliding window and data normalization methods,...
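The sample-generation step mentioned in this abstract (sliding window plus normalization) can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not state the window width, the step, or the normalization scheme, so the min-max scaling and all names here are hypothetical.

```python
def sliding_windows(signal, width, step):
    """Cut a 1-D sequence into overlapping windows of `width` samples."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]


def min_max_normalise(window):
    """Scale one window to [0, 1]; a constant window maps to all zeros (assumed scheme)."""
    lo, hi = min(window), max(window)
    if hi == lo:
        return [0.0 for _ in window]
    return [(x - lo) / (hi - lo) for x in window]


def make_samples(signal, width, step):
    """Turn one recorded current trace into many normalised training samples."""
    return [min_max_normalise(w) for w in sliding_windows(signal, width, step)]


samples = make_samples([0, 1, 2, 3, 4, 5], width=4, step=2)
```

A step smaller than the width makes the windows overlap, which is how a single recording can yield a large number of samples.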
Application of an environmentally benign and non-toxic eutectic mixture DMU/LTA for the green synthesis of (E)-diethyl 2-arylvinylquinoline-3,4-dicarboxylates is described. A preliminary antitumor evaluation was then assayed.
Text- or image-to-3D generators and 3D scanners can now produce 3D assets with high-quality shapes and textures. These assets typically consist of a single, fused representation, like an implicit neural field, a Gaussian mixture, or a mesh, without any useful structure. However, most applications and creative workflows require assets to be made of several meaningful parts that can be manipulated independently. To address this gap, we introduce PartGen, a novel approach that generates 3D objects composed of meaningful parts starting from text, an image, or an unstructured...
Non-intrusive load identification can improve the interaction efficiency between the power supply side and the user side of the grid. Applying this technology to alleviate the problem of energy shortage is a key technique for achieving efficient energy management on the user side. In response to the cumbersome process of manually selecting features and the low accuracy of traditional machine learning algorithms in non-intrusive load identification, this paper proposes a method that transforms the one-dimensional reactive electric signal into a two-dimensional image...
Feature selection can reduce the dimension of the feature space and improve recognition. To discriminate the freshness of tomatoes with an electronic nose, three kinds of feature extraction methods were compared: sheath coefficient characteristics, similitude entropy characteristics, and energy characteristics. The results showed that has its advantages in electronic nose detection.
Vision Transformers have achieved impressive performance in video classification, while suffering from the quadratic complexity caused by the Softmax attention mechanism. Some studies alleviate the computational costs by reducing the number of tokens in the attention calculation, but the complexity is still quadratic. Another promising way is to replace Softmax attention with linear attention, which has linear complexity but presents a clear performance drop. We find that such a drop in linear attention results from the lack of attention concentration on critical features. Therefore, we propose a feature fixation module to reweight...
In this paper, a new optimal space-vector pulse-width modulation (SVPWM) technique is presented for three-phase voltage source inverters. The 6 sectors are redivided into 12 based on SVPWM and, combined with the local over-modulation method, discontinuous SVPWM strategies called DSVPWMx, including DSVPWMP, DSVPWMN, DSVPWMPN1 and DSVPWMPN3, are proposed. The principle of DSVPWMx is developed, and the essential relations among the different strategies are discussed. The simulation and experimental results verify that the technique is correct and feasible.
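The idea of redividing 6 sectors into 12 can be sketched as a sector lookup on the angle of the reference voltage vector. This is only an illustration under assumptions: the abstract does not give the sector boundaries, so equal-width sectors starting at 0° on the α axis, and the name `sector_index`, are hypothetical.

```python
import math


def sector_index(v_alpha, v_beta, n_sectors):
    """Return the 1-based sector of the reference vector in the alpha-beta plane.

    Assumes equal-width sectors starting at angle 0 (an assumption, not from the paper).
    """
    angle = math.atan2(v_beta, v_alpha) % (2 * math.pi)
    width = 2 * math.pi / n_sectors
    # The final modulo guards against an angle that rounds up to exactly 2*pi.
    return int(angle // width) % n_sectors + 1


# A vector at 45 degrees lies in sector 1 of 6, but in sector 2 of 12.
six = sector_index(1.0, 1.0, 6)
twelve = sector_index(1.0, 1.0, 12)
```

Under this assumption each conventional 60° sector splits into two 30° halves, so a 12-sector index carries one extra bit of angular information that a discontinuous strategy could use to choose its clamped phase.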
Field studies were conducted to study the emission and distribution characteristics of dioxins by elevating the chlorine concentration in the feedstock of a 600 MW circulating fluidized bed (CFB) boiler. The total equivalent quantity of polychlorinated dibenzo-p-dioxins and dibenzofurans (PCDD/Fs) in all the flue gas, electrostatic ash, cloth bag ash, and boiler samples under the blank condition (i.e., the feedstock was normal coal) and the chlorine labelling condition (i.e., the feedstock was coal mixed with a chlorine-containing agent) was analyzed. Results illustrated that...