- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Domain Adaptation and Few-Shot Learning
- Robotics and Sensor-Based Localization
- Advanced Neural Network Applications
- Human Pose and Action Recognition
- Advanced Data Compression Techniques
- Video Coding and Compression Technologies
- Remote Sensing and LiDAR Applications
- Advanced Image Processing Techniques
- Image and Signal Denoising Methods
- Image and Object Detection Techniques
- Topic Modeling
- Manufacturing Process and Optimization
- Robot Manipulation and Learning
- Advancements in Battery Materials
- Extraction and Separation Processes
- Digital Filter Design and Implementation
- Industrial Vision Systems and Defect Detection
- Image and Video Quality Assessment
- 3D Shape Modeling and Analysis
- Explainable Artificial Intelligence (XAI)
- Robotic Path Planning Algorithms
- Advanced Optical Imaging Technologies
- Tsinghua University, 2002-2024
- Fujian University of Technology, 2024
- Vicomtech, 2019-2024
- SRI International, 2019-2024
- Fujian Normal University, 2022-2024
- Sun Yat-sen University, 2024
- Southwest Jiaotong University, 2023-2024
- Haier Group (China), 2022-2023
- Anhui University, 2023
- Hebei University of Engineering, 2022-2023
In block motion estimation, a search pattern with a different shape or size has a very important impact on search speed and distortion performance. A square-shaped search pattern is adopted in many popular fast algorithms. Recently, a diamond-shaped search pattern was introduced in fast block motion estimation, which has exhibited faster search speed. Based on an in-depth examination of the influence of search pattern on speed performance, we propose a novel algorithm using a hexagon-based search pattern to achieve further improvement. The investigated comparison with the diamond search pattern demonstrates significant speedup gain over...
Semantic reasoning and dynamic planning capabilities are crucial for an autonomous agent to perform complex navigation tasks in unknown environments. Succeeding in these tasks requires a large amount of the common-sense knowledge that humans possess. We present SayNav, a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks in unknown large-scale environments. SayNav uses a novel grounding mechanism that incrementally builds a 3D scene graph of the explored environment as inputs to LLMs,...
Fast block motion estimation normally consists of a low-resolution coarse search and the following fine-resolution inner search. Most fast algorithms developed attempt to speed up the coarse search without considering accelerating the focused inner search. On top of the hexagonal search method recently developed, an enhanced hexagonal search algorithm is proposed to further improve the performance in terms of reducing the number of search points and distortion, where a novel fast inner search is employed by exploiting the distortion information of the evaluated points. Our experimental results substantially justify...
Common sense is essential for building intelligent machines. While some commonsense knowledge is explicitly stated in human-generated text and can be learnt by mining the web, much of it is unwritten. It is often unnecessary and even unnatural to write about commonsense facts. While unwritten, this commonsense knowledge is not unseen! The visual world around us is full of structure modeled by commonsense knowledge. Can machines learn common sense simply by observing our visual world? Unfortunately, this requires automatic and accurate detection of objects, their attributes, poses, and interactions...
Deep learning has had remarkable success in robotic perception, but its data-centric nature suffers when it comes to generalizing to ever-changing environments. By contrast, physics-based optimization generalizes better, but it does not perform as well in complicated tasks due to the lack of high-level semantic information and the reliance on manual parametric tuning. To take advantage of these two complementary worlds, we present PyPose: a robotics-oriented, PyTorch-based library that combines deep perceptual...
Off-road navigation is essential for a wide range of applications in field robotics such as planetary exploration and disaster response. However, it remains an unresolved challenge due to the unstructured environments and the inherent complexity of terrain-vehicle interactions. Traditional physics-based methods struggle to accurately model the nonlinear dynamics of these interactions, while data-driven approaches often suffer from overfitting to specific motion patterns, vehicle sizes, and types, limiting their...
Recently, among various data hiding techniques, a new subset, lossless data hiding, has drawn tremendous interest. Most existing lossless data hiding algorithms are, however, fragile in the sense that they can be defeated when compression or other small alteration is applied to the marked image. The method of C. De Vleeschouwer et al. (see IEEE Trans. Multimedia, vol. 5, pp. 97-105, 2003) is the only existing semi-fragile lossless data hiding technique (also referred to as robust lossless data hiding), which is robust against high quality JPEG compression. We first point out that this technique has a fatal...
Artificial agents today can answer factual questions. But they fall short on questions that require common sense reasoning. Perhaps this is because most existing common sense databases rely on text to learn and represent knowledge. But much of common sense knowledge is unwritten - partly because it tends not to be interesting enough to talk about, and partly because some common sense is unnatural to articulate in text. While unwritten, it is not unseen. In this paper we leverage semantic common sense knowledge learned from images, i.e. visual common sense, in two textual tasks: fill-in-the-blank and visual paraphrasing. We propose to "imagine"...
Segmenting iris texture from an input image is an important step for recognising the iris pattern. It is still a difficult task to localise available iris regions from non-ideal images captured in non-cooperative situations such as lighting variations, on-the-move and off-angle view. To address this problem, this study presents a novel algorithm for accurate and fast iris segmentation. An adaptive mean shift procedure is built to find the rough position of the iris centre. According to the localisation result, a circle is set as the initial contour. After combining...
Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly these two tasks are addressed independently, but recently the idea of merging these two problems into a sole framework has been studied under the assumption that integrating two highly correlated tasks may benefit each other and improve the estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed using a single RGB input image under a unified convolutional neural network. We analyze different architectures to evaluate which...
We present an unequal packet loss resilience scheme for robust transmission of video over the Internet. By jointly exploiting the unequal importance existing in different levels of the syntax hierarchy in video coding schemes, GOP-level and Resynchronization-packet-level Integrated Protection (GRIP) is designed for joint unequal loss protection (ULP) in these two levels using forward error correction (FEC) across packets. Two algorithms are developed to achieve efficient FEC assignment for the proposed GRIP framework: a model-based algorithm and a heuristic...
The paper presents a fast block INTER mode decision algorithm to significantly improve the time efficiency of the encoder in H.264. It makes use of the spatial homogeneity of a video object's textures and the temporal stationarity characteristics inherent in video sequences. Specifically, the homogeneity decision of a block is based on edge information, and MB differencing is used to judge whether the MB is time-stationary. Based on the above analysis, only parts of the inter prediction modes are chosen for RDO (rate distortion optimization) calculation. Experimental results show that the new...
This paper studies the optimal inspection of autonomous robots in a complex pipeline system. We solve a 3-D region-guarding problem to suggest the necessary inspection spots. The proposed hierarchical integer linear programming optimization algorithm seeks the fewest spots to cover the entire given 3-D region. Unlike most existing pipeline inspection systems that focus on designing the mobility and control of the explore robots, this paper focuses on global planning of the thorough and automatic inspection of a complex environment. We demonstrate the efficacy of the computation framework using a simulated...
This paper presents Advanced Audio Zip (AAZ), a fine grained scalable to lossless (SLS) audio coder that has recently been adopted as the reference model for MPEG-4 audio SLS work. AAZ integrates the functionalities of high-compression perceptual audio coding, fine granular scalable audio coding and lossless audio coding in a single framework, and simultaneously provides backward compatibility to MPEG-4 Advanced Audio Coding (AAC). The bit-rate scalability from lossy to lossless coding of such a coder is achieved in a perceptually meaningful way, i.e., better perceptual quality at higher bit-rates. Despite its abundant...
Cartilage tissues possess an extremely limited capacity for self-repair, and current clinical surgical approaches for treating articular cartilage defects can only provide short-term relief. Despite significant advances in the field of cartilage tissue engineering, avoiding secondary damage caused by invasive surgical procedures remains a challenge. In this study, injectable cartilage microtissues were developed through 3D culture of rat bone marrow mesenchymal stem cells (BMSCs) within porous gelatin microcarriers...
As the feature size of the semiconductor process is scaling down to 10 nm and below, it is possible to assemble systems with high performance processors that can theoretically provide computational power of up to tens of PFLOPS. However, the power consumption of these systems is also rocketing to millions of watts, and the actual performance is only around 60% of the theoretical performance. Today, power efficiency and sustained performance have become the main foci of processor designers. Traditional computing architectures such as superscalar and GPGPU are proven to be power inefficient, and there is a big gap between...
Block transform coding is the most popular approach for image and video compression. The objective measurement of blocking artifacts plays an important role in the design, optimization, and assessment of image and video coding systems. This paper presents a new algorithm for measuring blocking artifacts in images and videos. It exhibits unique and useful features: 1) it examines the blocks individually so that it can measure the severity of blocking artifacts locally; 2) it is a one-pass algorithm in the sense that the image needs to be accessed only once; 3) it takes into account the blocking artifacts for high bit rate images and the flatness for very low bit rate images; 4)...
A novel method for outdoor natural color image segmentation for road following is presented. From the results of experiments it can be seen that the hue, saturation, and intensity (HSI) color system should be used rather than the red, green, blue (RGB) system, and that segmentation can be done in S-I space with good performance. An automatic adaptive threshold selection method is developed. The preliminary results indicate the effectiveness and efficiency of this method.
We present an empirical study of active learning for Visual Question Answering, where a deep VQA model selects informative question-image pairs from a pool and queries an oracle for answers to maximally improve its performance under a limited query budget. Drawing analogies from human learning, we explore cramming (entropy), curiosity-driven (expected model change), and goal-driven (expected error reduction) active learning approaches, and propose a fast and effective goal-driven active learning scoring function to pick question-image pairs for deep VQA models under the Bayesian Neural Network framework. We find that deep VQA models need...