- Multimodal Machine Learning Applications
- Topic Modeling
- Human Pose and Action Recognition
- Domain Adaptation and Few-Shot Learning
- Natural Language Processing Techniques
- Organic Light-Emitting Diodes Research
- Organic Electronics and Photovoltaics
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Metaheuristic Optimization Algorithms Research
- Conducting Polymers and Applications
- Image Processing Techniques and Applications
- Luminescence and Fluorescent Materials
- Surface Roughness and Optical Measurements
- Handwritten Text Recognition Techniques
- Optimization and Packing Problems
- Industrial Vision Systems and Defect Detection
- Optimization and Search Problems
- Text Readability and Simplification
- Optical Measurement and Interference Techniques
- Evaluation Methods in Various Fields
- Advanced Text Analysis Techniques
- Allelopathy and Phytotoxic Interactions
- Advanced Multi-Objective Optimization Algorithms
Group Sense (China)
2023-2024
NARI Group (China)
2023
Chinese Academy of Sciences
2014-2022
University of Chinese Academy of Sciences
2014-2022
Institute of Computing Technology
2019-2022
Xi'an Polytechnic University
2020-2021
Hebei GEO University
2007-2020
State Key Laboratory of Polymer Physics and Chemistry
2014-2016
Changchun Institute of Applied Chemistry
2014-2016
Hong Kong Baptist University
2016
Vehicle Re-Identification is to find images of the same vehicle from various views in the cross-camera scenario. The main challenges of this task are the large intra-instance distance caused by different views and the subtle inter-instance discrepancy caused by similar vehicles. In this paper, we propose a parsing-based view-aware embedding network (PVEN) to achieve view-aware feature alignment and enhancement for vehicle ReID. First, we introduce a parsing network to parse a vehicle into four different views, and then align the features by mask average pooling. Such alignment provides a fine-grained representation...
Weakly supervised referring expression grounding aims at localizing the referential object in an image according to the linguistic query, where the mapping between the object and the query is unknown in the training stage. To address this problem, we propose a novel end-to-end adaptive reconstruction network (ARN). It builds the correspondence between the region proposal and the query in an adaptive manner: adaptive grounding and collaborative reconstruction. Specifically, we first extract the subject, location and context features to represent the proposals and the query respectively. Then, we design the adaptive grounding module to compute...
A novel red heteroleptic iridium complex, Ir(DPA-Flpy-CF₃)₂acac, was synthesized, and its corresponding solution-processed PhOLED shows a record power efficiency of 44.5 lm W⁻¹ with CIE coordinates of (0.64, 0.36).
Weakly supervised Referring Expression Grounding (REG) aims to ground a particular target in an image described by a language expression while lacking the correspondence between the target and the expression. Two main problems exist in weakly supervised REG. First, the lack of region-level annotations introduces ambiguities between proposals and queries. Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects. To address the above challenges, we...
Weakly supervised referring expression grounding (REG) aims at localizing the referential entity in an image according to the linguistic query, where the mapping between the image region (proposal) and the query is unknown in the training stage. In referring expressions, people usually describe a target entity in terms of its relationship with other contextual entities as well as its visual attributes. However, previous weakly supervised REG methods rarely pay attention to the relationship between entities. In this paper, we propose a knowledge-guided pairwise reconstruction network...
Visual grounding (VG) aims to locate a specific target in an image based on a given language query. The discriminative information from context is important for distinguishing the target from other objects, particularly for targets that have the same category as others. However, most previous methods underestimate such information. Moreover, they are usually designed for the standard scene (without any novel object), which limits their generalization to the open-vocabulary scene. In this paper, we propose a framework with...
We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing. Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into different numbers of visual tokens. This approach allows the model to generate more efficient and accurate visual representations, closely aligning with human perceptual processes. The model also integrates Multimodal Rotary Position Embedding (M-RoPE),...
Organometal halide perovskites (OHPs) are becoming a hot topic in the field of display and lighting. Unlike the strategy used for solar cells, that is, using an OHP film several hundred nanometers thick for fully absorbing light to convert it to electricity, thin-film OHPs (<50 nm) are advantageous for restraining the self-absorption drawback and are thus beneficial for preparing efficient light-emitting diodes (LEDs). Here we manipulate the excess molar ratio of the MABr/PbBr2 precursors and the post-annealing temperature to obtain uniform films and suppress...
Recent advancements in multimodal foundation models (e.g., CLIP) have excelled in zero-shot generalization. Prompt tuning, which is involved in the knowledge transfer from foundation models to downstream tasks, has gained significant attention recently. Existing prompt-tuning methods in cross-modal learning, however, either solely focus on the language branch, or learn vision-language interaction in a shallow mechanism. In this context, we propose a Deeply coupled Cross-modal Prompt learning (DCP) method based on CLIP. DCP flexibly accommodates...
The knitting needle cylinder is one of the core parts of a hosiery machine. The operation of its needles can directly affect production quality and efficiency. To reduce the loss of the machine caused by needle faults, a fault detection system for hosiery machines based on a synergistic combination of laser and vision is proposed in this paper. When the machine was operating normally, a photoelectric detector collected the laser signal reflected by the needles, which was monitored using the ratio of adjacent peak-to-peak distances of the signals. When a fault was detected, the machine stopped immediately, and a charge-coupled device...
The ant colony optimization (ACO) algorithm is a metaheuristic and stochastic search technology, which has been one of the effective tools for solving discrete problems. However, there are two bottlenecks for large-scale problems: the ACO algorithm needs too much time to converge, and the solutions may not be really optimal. This paper proposes a novel ACO algorithm for multidimensional knapsack problems (MKP), which employs a new pheromone diffusion model and a mutation scheme. First, in light of the preference for better solutions, the association distances among...
Problems with knitting needles are one of the main causes of production loss of fabric. In order to detect needle problems quickly and accurately, this paper proposes a hosiery machine knitting needle detection system based on machine vision. Meanwhile, according to the working condition of real knitting needles, a simulated needle cylinder rotary platform is built. The system can detect needle problems, allowing the issue to be identified as soon as the fabric defect begins to appear. The losses caused by the bending and fracture of needles are reduced at the source. In image processing, the vertical projection algorithm...
The trade-off between charge transport and energy transfer is realized by manipulating the aggregation of the dendrimer host H2 with a binary solvent mixture, along with a 25% device efficiency enhancement for the FIrpic-based blue PhOLED.
In this study, we aim to reduce generation latency for Named Entity Recognition (NER) with Large Language Models (LLMs). The main cause of high latency in LLMs is the sequential decoding process, which autoregressively generates all labels and mentions for NER, significantly increasing the sequence length. To this end, we introduce Parallel Decoding in LLM for NER (PaDeLLM-NER), an approach that integrates seamlessly into existing generative model frameworks without necessitating additional modules or architectural...
Vehicle Re-Identification is to find the same vehicle from images captured in different views under cross-camera scenarios. Traditional methods focus on depicting the holistic appearance of a vehicle, but they suffer from hard samples with the same type and color. Recent works leverage discriminative visual cues to solve this problem, where three challenges exist as follows. First, vehicle features are misaligned and distorted because of viewpoint variance. Second, the discriminative visual cues are usually subtle, which are easy to be diluted by the large area...
Vehicle Re-Identification is to find images of the same vehicle from various views in the cross-camera scenario. The main challenges of this task are the large intra-instance distance caused by different views and the subtle inter-instance discrepancy caused by similar vehicles. In this paper, we propose a parsing-based view-aware embedding network (PVEN) to achieve view-aware feature alignment and enhancement for vehicle ReID. First, we introduce a parsing network to parse a vehicle into four different views, and then align the features by mask average pooling. Such alignment provides a fine-grained...
Referring Expression Grounding (REG) aims at localizing a particular object in an image according to a language expression. Recent REG methods have achieved promising performance, but most of them are constrained to limited object categories due to the scale of current REG datasets. In this paper, we explore REG in a new scenario, where the REG model can ground novel objects out of the REG training data. With this motivation, we propose a Concept-Context Disentangled network (CCD) which transfers concepts from auxiliary classification data with...
Despite the notable advancements achieved by leveraging pre-trained vision-language (VL) models through few-shot tuning for downstream tasks, our detailed empirical study highlights a significant dependence of few-shot learning outcomes on the careful selection of training examples - a facet that has been previously overlooked in research. In this study, we delve into devising more effective strategies for the meticulous selection of few-shot training examples, as opposed to relying on random sampling, to enhance the potential of existing few-shot prompt learning methodologies....
This paper introduces SynthDoc, a novel synthetic document generation pipeline designed to enhance Visual Document Understanding (VDU) by generating high-quality, diverse datasets that include text, images, tables, and charts. Addressing the challenges of data acquisition and the limitations of existing datasets, SynthDoc leverages publicly available corpora and advanced rendering tools to create a comprehensive and versatile dataset. Our experiments, conducted using the Donut model, demonstrate that models trained with...