- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Topic Modeling
- Advanced Image and Video Retrieval Techniques
- Robotic Path Planning Algorithms
- Domain Adaptation and Few-Shot Learning
- Natural Language Processing Techniques
- Video Surveillance and Tracking Methods
- COVID-19 diagnosis using AI
- Music and Audio Processing
- Advanced Vision and Imaging
- Video Analysis and Summarization
- Sentiment Analysis and Opinion Mining
- Mobile Health and mHealth Applications
- Advanced Image Processing Techniques
- Image Retrieval and Classification Techniques
- Autonomous Vehicle Technology and Safety
- Soft Robotics and Applications
- Currency Recognition and Detection
- Cognitive Computing and Networks
- Hand Gesture Recognition Systems
- Image and Video Stabilization
- Spam and Phishing Detection
- Interactive and Immersive Displays
- Data Stream Mining Techniques
Guangdong Pharmaceutical University
2023-2025
China Southern Power Grid (China)
2024
Jiangsu University of Science and Technology
2024
Huaqiao University
2023
National Yang Ming Chiao Tung University
2022-2023
Jimei University
2023
Shaanxi History Museum
2022
Chongqing University of Posts and Telecommunications
2022
Communication University of China
2020
Convergence
2020
This paper introduces a classification algorithm called HyperCuts. Like the previously best known algorithm, HiCuts, HyperCuts is based on a decision tree structure. Unlike HiCuts, however, in which each node in the decision tree represents a hyperplane, each node in the HyperCuts decision tree represents a k-dimensional hypercube. Using this extra degree of freedom and a new set of heuristics to find optimal hypercubes for a given amount of storage, HyperCuts can provide an order of magnitude improvement over existing classification algorithms. HyperCuts uses 2 to 10 times less memory than HiCuts optimized for memory, while...
Machine-learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until...
Zero-shot learning (ZSL) endeavors to extend knowledge to novel classes by capitalizing on the semantic overlap across different categories. Existing methods often overlook the integration of global and local representation, a synthesis that could significantly enhance zero-shot recognition accuracy. This paper introduces an innovative ZSL technique, termed ZSL based on the fusion of global and local representation (ZGLR). During training, both global and local representations are optimized in tandem, while in the testing phase, the outcomes from the global and local representational...
The objective of referring expression comprehension (REC) is to accurately identify the object in an image described by a given expression. Existing REC methods, including transformer-based and graph-based approaches among others, have shown robust performance in REC tasks. In this study, we present a groundbreaking framework named DiffusionREC for the REC task. This framework reimagines the REC task as a text-guided bounding box denoising diffusion process, through which noisy boxes are refined and distilled to pinpoint the target box....
Referring expression comprehension aims to localize a specific object in an image according to a given language description. It is still challenging to comprehend and mitigate the gap between the various types of information in the visual and textual domains. Generally, it needs to extract the salient features from the given expression and match them to the image. One challenge in referring expression comprehension is that the number of region proposals generated by object detection methods is far more than the number of entities in the corresponding language description. Remarkably, the candidate regions without the object described by the expression will bring a severe impact on...
This study used both survey and interview questionnaires. It was designed to assess the feasibility, usability, and utility of two point-of-care tools especially prepared with information relevant for dementia care by staff nurses in a small, a medium-sized, and a large nursing home in Florida. Twenty-five LPN or RN staff nurses were recruited for their use of one of the two tools: either a set of pocket cards (used by the control group) or a PC (used by the experimental group). The findings of our study indicate that personal digital assistants (PDAs) can potentially improve...
Visual grounding is an essential task in understanding the semantic relationship between the given text description and the target object in an image. Due to the innate complexity of language and the rich semantic context of the image, it is still a challenging problem to infer the underlying relationship and to perform reasoning between the objects in an image and the given expression. Although existing visual grounding methods have achieved promising progress, cross-modal mapping across different domains for the task is still not well handled, especially when the expressions are complex and long. To address the issue, we propose...
Referring expression comprehension (REC) is a cross-modal matching task that aims to localize the target object in an image specified by a text description. Most existing approaches for this task focus on identifying only objects whose categories are covered by the training data. This restricts their generalization to unseen categories and their practical usage. To address this issue, we propose a domain adaptive network called CLIPREC for zero-shot REC, which integrates the Contrastive Language-Image Pretraining (CLIP) model graph-based...
Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts, leveraging vast amounts of multimodal data available on social media platforms. Studying and investigating social media popularity becomes central to various online applications and requires novel methods of comprehensive analysis, multimodal comprehension, and accurate prediction.
Along with the development of human-computer interaction, action recognition has become an aspect of computer vision. In recent years, skeleton-based action recognition has become a research hotspot in the field of computer vision. The human skeleton can be obtained through the Kinect sensor, but a single sensor is often affected by self-occlusion, so that it is impossible to accurately obtain information on all the joints of the human. In this paper, a data fusion method is proposed. The two Kinect sensors are placed in a fixed space, and they are orthogonal to each other. They extract from...
We present a cross-platform prototype system targeted at collaborative search-and-retrieve navigation tasks. The system is designed as a timed-task completion game, and shares data about a virtual world between two different computer platforms. One player has a "God view" using a tablet-based interface to assist in directing the other player. This other player has a first-person "Hero view" using a hover board-style interface. The interfaces allow nonverbal communication via waypoint beacons.
This paper systematically summarizes and discusses recent research on image-based human-object interaction (HOI) detection, which aims to detect human-object pairs and recognize the interactive behaviors between humans and objects in an image. It has plenty of applications and can serve as the basis to assist higher-level tasks of visual understanding. We introduce the existing methods by categorizing them into two main groups based on model structure: one-stage and two-stage approaches. We further divide them into point-based, region-based,...
Landing on unmanned surface vehicles (USV) autonomously is a critical task for unmanned aerial vehicles (UAV) due to complex environments. To solve this problem, an autonomous landing method is proposed based on a multi-level marker and linear active disturbance rejection control (LADRC) in this study. A specially designed landing board is placed on the USV, and ArUco codes with different scales are employed. Then, the marker is captured and processed by a camera mounted below the UAV body. Using the efficient perspective-n-point method, the position and attitude of...
Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts, leveraging vast amounts of multimodal data available on social media platforms. Studying and investigating social media popularity becomes central to various online applications and requires novel methods of comprehensive analysis, multimodal comprehension, and accurate prediction. SMP Challenge is an annual research activity that has spurred academic exploration in this area. This paper summarizes the challenging...
Most previous frameworks either cost too much time or adopt some fixed modules, resulting in alignment error in video super-resolution (VSR). In this paper, we propose a novel many-to-many VSR framework with Iterative Collaboration (ICNet), which employs the concurrent operation by iterative collaboration between alignment and reconstruction, proving to be more efficient and effective than existing recurrent and sliding-window frameworks. With the proposed iterative collaboration, alignment can be conducted on super-resolved features from...
Visual grounding aims to localize a target object in an image based on a given text description. Due to the innate complexity of language, it is still a challenging problem to perform reasoning on complex expressions and to infer the underlying relationship between the expression and the object in an image. To address these issues, we propose a residual graph attention network for visual grounding. The proposed approach first builds an expression-guided relation graph and then performs multi-step reasoning followed by matching the target object. It allows performing...
Font recognition is an important part of the field of painting and calligraphy style recognition. Traditional font classification methods are mainly based on texture feature extraction and other methods, which need to be improved in accuracy. The mainstream methods use convolutional neural networks, but such methods have poor interpretability and may face the problem that some detailed features cannot be accurately extracted. Based on the convolutional neural network, gray-level images, Local Binary Pattern (LBP) and Histogram of Oriented Gradient (HOG)...
In order to effectively integrate multimodal information and multilayer constraints, we present a unified probabilistic framework for sports video analysis. Based on the framework, three instances of statistical models are constructed and compared. Experimental results indicate that our method with multimodal fusion processes the semantic events in sports video more effectively. With the framework based on statistics, we further discuss sports video content analysis fusing multimodal information. Semantic events in sports videos are in essence multimodal. In television relay,...