- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Topic Modeling
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Natural Language Processing Techniques
- Data Quality and Management
- Speech Recognition and Synthesis
- Reinforcement Learning in Robotics
- Video Surveillance and Tracking Methods
- Digital Innovation in Industries
- Scientific Computing and Data Management
- Generative Adversarial Networks and Image Synthesis
- Face recognition and analysis
- Advanced Image and Video Retrieval Techniques
- Business Process Modeling and Analysis
- Text Readability and Simplification
- Linguistic research and analysis
- COVID-19 diagnosis using AI
- Robot Manipulation and Learning
- Speech and Audio Processing
- Music and Audio Processing
- Human Motion and Animation
- Educational Technology and Pedagogy
- 3D Shape Modeling and Analysis
First Affiliated Hospital of Jiangxi Medical College
2025
Nanchang University
2025
Sir Run Run Shaw Hospital
2024
Zhejiang University
2024
Microsoft (United States)
2024
University of Oxford
2024
Tencent (China)
2023-2024
Liaoning Technical University
2024
Guilin University of Technology
2023
Guilin University
2023
Current end-to-end machine reading and question answering (Q&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow for both training and inference due to the sequential nature of RNNs. We propose a new Q&A architecture called QANet, which does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions. On the SQuAD dataset, our model is 3x to 13x...
Introduction: Climate change is one of the major challenges facing the world today, causing frequent extreme weather events that significantly impact human production, daily life, and the ecological environment. Traditional climate prediction models largely rely on the simulation of physical processes. While they have achieved some success, these models still face issues such as complexity, high computational cost, and insufficient handling of multivariable nonlinear relationships. Methods: In light of this, this paper proposes a...
Human-centric perceptions include a variety of vision tasks, which have widespread industrial applications, including surveillance, autonomous driving, and the metaverse. It is desirable to have a general pretrained model for versatile human-centric downstream tasks. This paper forges ahead along this path from the aspects of both benchmark and pretraining methods. Specifically, we propose HumanBench, based on existing datasets, to comprehensively evaluate on common ground the generalization abilities of different pretraining methods on 19...
Information asymmetries create extractive, often harmful relationships between platform workers (e.g., Uber or Deliveroo drivers) and their algorithmic managers. Recent HCI studies have put forward more equitable designs but leave open questions about the social and technical infrastructures required to support them without the cooperation of platforms. We conducted a participatory design study in which we deconstructed and re-imagined Uber's schema for driver data. We analyzed data structures and institutions...
Few-shot object detection (FSOD) aims to expand an object detector to novel categories given only a few instances for training. The few training samples restrict the performance of the FSOD model. Recent text-to-image generation models have shown promising results in generating high-quality images, but how applicable these synthetic images are to FSOD tasks remains under-explored. This work extensively studies how synthetic images generated by state-of-the-art text-to-image generators benefit FSOD tasks. We focus on two perspectives: (1) How to use synthetic data for FSOD?...
Few-shot object detection (FSOD) aims to expand an object detector to novel categories given only a few instances for training. However, detecting novel categories from only a few samples usually leads to misclassification. In FSOD, we notice that false positives (FP) of novel categories are prominent: base categories are often recognized as novel ones. To address this issue, we propose a data augmentation pipeline that Crops Novel instances and Pastes them on selected Base images, called CNPB. There are two key questions to be answered: (1) How to select useful base images? (2)...
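The crop-and-paste step named in the CNPB abstract can be illustrated with a minimal sketch. It assumes grayscale images stored as lists of pixel rows and boxes given as (x0, y0, x1, y1) with exclusive right and bottom edges; the helper names and the caller-chosen paste position are hypothetical, and how CNPB actually selects base images and positions is not stated in the truncated abstract.

```python
def crop(image, box):
    """Cut the region box = (x0, y0, x1, y1) out of an image (list of pixel rows)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def paste(base, patch, x, y):
    """Return a copy of base with patch placed at top-left (x, y), plus the new box."""
    h, w = len(patch), len(patch[0])
    if y + h > len(base) or x + w > len(base[0]):
        raise ValueError("patch does not fit at this position")
    out = [row[:] for row in base]  # copy so the original base image is untouched
    for dy in range(h):
        out[y + dy][x:x + w] = patch[dy]
    return out, (x, y, x + w, y + h)


def crop_novel_paste_base(novel_img, novel_box, base_img, x, y):
    """Hypothetical helper: crop a novel-class instance and paste it onto a base image."""
    return paste(base_img, crop(novel_img, novel_box), x, y)
```

The returned box would become an extra novel-class annotation on the augmented base image, so the detector sees novel objects in base-image contexts, which is the kind of base-versus-novel confusion the abstract targets.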
As an important constituent of land consolidation, high-standard basic farmland construction is a means to protect the quantity, quality, and ecological environment of cultivated land. Its target lies not only in increasing quantity but also in improving quality, agricultural production conditions, and ecosystem environments. In the present study, an evaluation method and arrangement were explored to facilitate decision-making and implementation for high-standard basic farmland construction (HSBFC), with the administrative village as the unit. Taking the comprehensive project...
Self-training has shown great potential in semi-supervised learning. Its core idea is to use the model learned on labeled data to generate pseudo-labels for unlabeled samples, and in turn teach itself. To obtain valid supervision, active attempts typically employ a momentum teacher for pseudo-label prediction yet observe the confirmation bias issue, where incorrect predictions may provide wrong supervision signals that get accumulated in the training process. The primary cause of such a drawback is that the prevailing...
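The momentum-teacher baseline that this abstract critiques can be summarized as an exponential moving average (EMA) of student weights plus confidence-thresholded pseudo-labels. The sketch below shows only that generic baseline, not the authors' proposed remedy (which the truncated abstract does not describe); parameters are plain lists of floats, and the momentum of 0.999 and threshold of 0.95 are assumed, commonly used values.

```python
def ema_update(teacher, student, momentum=0.999):
    """Momentum teacher: teacher <- m * teacher + (1 - m) * student, per parameter."""
    return {
        name: [momentum * t + (1.0 - momentum) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


def pseudo_labels(teacher_probs, threshold=0.95):
    """Hard pseudo-labels from teacher class probabilities.

    Returns (labels, mask); mask is False for samples whose top probability is
    below the threshold, so they would be excluded from the unsupervised loss.
    """
    labels, mask = [], []
    for probs in teacher_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best)
        mask.append(probs[best] >= threshold)
    return labels, mask
```

Because the teacher is itself a slow average of the student, a confidently wrong pseudo-label is fed back into the student and then into the teacher. This is the confirmation-bias loop the abstract describes.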
Traditional automatic speech recognition (ASR) systems usually focus on individual utterances, without considering long-form speech with useful historical information, which is more practical in real scenarios. In our preliminary experiments, simply attending to a longer transcription history with a vanilla neural transducer model shows little gain, since the prediction network is not a pure language model. This motivates us to leverage the factorized neural transducer structure, which contains a language model, the vocabulary predictor. We propose...
Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. Thus, we train a human preference classifier...
In Hindsight Experience Replay (HER), a reinforcement learning agent is trained by treating whatever it has achieved as virtual goals. However, in previous work, the experience was replayed at random, without considering which episode might be most valuable for learning. In this paper, we develop an energy-based framework for prioritizing hindsight experience in robotic manipulation tasks. Our approach is inspired by the work-energy principle in physics. We define a trajectory energy function as the sum of the transition energy of the target object...
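The trajectory-energy idea can be sketched as follows. The abstract is truncated, so several details are assumptions rather than the paper's exact definitions: the target object's energy is taken as potential (m·g·z) plus kinetic (½·m·|v|²) energy; a transition's energy is the increase in that total between consecutive steps, with decreases counted as zero and an optional clip; and episodes are replayed with probability proportional to their trajectory energy. All function names are hypothetical.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2


def total_energy(height, velocity, mass=1.0):
    """Potential plus kinetic energy of the target object at one time step."""
    speed_sq = sum(v * v for v in velocity)
    return mass * G * height + 0.5 * mass * speed_sq


def trajectory_energy(heights, velocities, mass=1.0, clip=None):
    """Sum of per-transition energy increases (assumption: decreases count as 0)."""
    energies = [total_energy(h, v, mass) for h, v in zip(heights, velocities)]
    total = 0.0
    for e_prev, e_next in zip(energies, energies[1:]):
        delta = max(e_next - e_prev, 0.0)
        if clip is not None:
            delta = min(delta, clip)  # optional cap on a single transition's energy
        total += delta
    return total


def sample_episode(episode_energies, rng=random):
    """Pick an episode index with probability proportional to its trajectory energy."""
    if sum(episode_energies) == 0:
        return rng.randrange(len(episode_energies))  # fall back to uniform replay
    indices = range(len(episode_energies))
    return rng.choices(indices, weights=episode_energies, k=1)[0]
```

Under this sketch, an episode in which the robot lifts or pushes the object (work done on it) is replayed more often than one in which the object never moves, which matches the abstract's motivation of not replaying experience uniformly at random.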
We present SocialGenPod, a decentralised and privacy-friendly way of deploying generative AI Web applications. Unlike centralised data architectures that keep user data tied to application and service providers, we show how one can use Solid, a decentralised Web specification, to decouple user data from generative AI applications. We demonstrate SocialGenPod using a prototype that allows users to converse with different Large Language Models, optionally leveraging Retrieval Augmented Generation to generate answers grounded in private documents stored in any Pod that the user is allowed...
Much of named entity recognition (NER) research focuses on developing dataset-specific models based on data from the domain of interest and a limited set of related entity types. This is frustrating, as each new dataset requires a new model to be trained and stored. In this work, we present a "versatile" model, the Prompting-based Unified NER system (PUnifiedNER), that works with data from different domains and can recognise up to 37 entity types simultaneously, and theoretically as many as possible. By using prompt learning, PUnifiedNER...
We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using human data. Although such agents can be obtained through self-play training, they suffer significantly from distributional shift when paired with unencountered partners, such as humans. In this paper, we propose Maximum Entropy Population-based training (MEP) to mitigate such distributional shift. In MEP, agents in the population are trained with our derived Population Entropy bonus to promote pairwise diversity between agents and individual...
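A population entropy bonus can be illustrated by taking the entropy of the population's mean action distribution at a state. This is a minimal sketch that assumes this definition, since the abstract is cut off before the bonus is specified. Each policy is a list of action probabilities for the same state, and the function names are hypothetical.

```python
import math


def mean_policy(policies):
    """Average the action distributions of all agents in the population at one state."""
    n = len(policies)
    return [sum(p[a] for p in policies) / n for a in range(len(policies[0]))]


def entropy(dist):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def population_entropy(policies):
    """Entropy of the population's mean policy (assumed form of the bonus)."""
    return entropy(mean_policy(policies))
```

Entropy is concave, so the entropy of the mean policy is at least the average entropy of the individual policies. As a result, this quantity rises both when agents disagree with one another (pairwise diversity) and when each agent is itself stochastic (individual diversity), the two properties the abstract says the bonus should promote.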
Recent popular Role-Playing Games (RPGs) saw the great success of character auto-creation systems. The bone-driven face model, controlled by continuous parameters (like the position of bones) and discrete parameters (like hairstyles), makes it possible for users to personalize and customize in-game characters. Previous auto-creation systems are mostly image-driven, where facial parameters are optimized so that the rendered character looks similar to a reference face photo. This paper proposes a novel text-to-parameter translation method (T2P) to achieve zero-shot...
Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as the metaverse and sports analysis. There is a recent surge to develop human-centric foundation models that can benefit a broad range of human-centric perception tasks. While many such models have achieved success, they did not explore 3D and vision-language tasks, and they required task-specific finetuning. These limitations restrict their application to more downstream tasks and situations. To tackle these...
Previous studies have demonstrated that natural steroid compounds containing a peroxide bridge exhibited potential anti-hepatitis B virus activity. To continue our research, a simple and regioselective methodology using Eosin Y as a clean photosensitized oxidation catalyst was developed for the synthesis of peroxide bridges in steroids. The method, with the catalyst exposed to visible light, furnished high yields, did not involve tedious work-up or purification, and avoided environmentally hazardous solvents. It can be...
Current methods for prompt learning in zero-shot scenarios widely rely on a development set with sufficient human-annotated data to select the best-performing prompt template a posteriori. This is not ideal because, in a real-world zero-shot scenario of practical relevance, no labelled data is available. Thus, we propose a simple yet effective method for screening reasonable prompt templates in zero-shot text classification: Perplexity Selection (Perplection). We hypothesize that language discrepancy can be used to measure the efficacy of prompt templates, and...
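The label-free selection step can be sketched as ranking templates by the average perplexity of unlabeled inputs once each input is inserted into the template, then keeping the lowest-perplexity templates. As an assumption for self-containment, a toy add-one-smoothed unigram model stands in for the pretrained language model a real setup would use. The `{text}` placeholder and all function names are hypothetical.

```python
import math
from collections import Counter


def unigram_lm(corpus_tokens):
    """Toy add-one-smoothed unigram LM (stand-in for a pretrained LM)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab)


def perplexity(tokens, prob):
    """exp of the average negative log-probability per token."""
    log_sum = sum(math.log(prob(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def select_templates(templates, unlabeled_texts, prob, k=1):
    """Return the k templates with the lowest mean perplexity; no labels are used."""
    def score(template):
        ppls = [perplexity(template.format(text=x).lower().split(), prob)
                for x in unlabeled_texts]
        return sum(ppls) / len(ppls)
    return sorted(templates, key=score)[:k]
```

Only unlabeled inputs are needed, which is the point of the approach: the template whose filled-in prompts look most like natural language to the model is assumed to be the more effective one.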
Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that necessitate a combination of task planning and the usage of external tools, such as APIs. However, real-world complex systems present three prevalent challenges concerning tool usage: (1) the real system usually has a vast array of APIs, so it is impossible to feed the descriptions of all APIs into the prompt of LLMs, as the token length is limited; (2) the real system is designed for handling complex tasks, and base LLMs can hardly plan a correct...
In today's digital landscape, the Web has become increasingly centralized, raising concerns about user privacy violations. Decentralized architectures, such as Solid, offer a promising solution by empowering users with better control over their data in personal 'Pods'. However, a significant challenge remains: users must navigate numerous applications to decide which application can be trusted to access their Pods. This often involves reading lengthy and complex Terms of Use agreements, a process that users find...