- Human Pose and Action Recognition
- Anomaly Detection Techniques and Applications
- Multimodal Machine Learning Applications
- Video Surveillance and Tracking Methods
- Fuzzy and Soft Set Theory
- Advanced Algebra and Logic
- Domain Adaptation and Few-Shot Learning
- Rough Sets and Fuzzy Logic
- Gait Recognition and Analysis
- Bone Tissue Engineering Materials
- Hand Gesture Recognition Systems
- Automated Road and Building Extraction
- Diabetic Foot Ulcer Assessment and Management
- Bone Metabolism and Diseases
- Cancer-related molecular mechanisms research
- Financial Reporting and XBRL
- Bone and Dental Protein Studies
- Advanced Database Systems and Queries
- Healthcare Education and Workforce Issues
- Face recognition and analysis
- Gaze Tracking and Assistive Technology
- Mesenchymal stem cell research
- Polynomial and algebraic computation
- Periodontal Regeneration and Treatments
- Online and Blended Learning
University of Electronic Science and Technology of China
2020-2025
Chifeng University
2010-2023
Amazon (United States)
2020-2022
Peking University
2017-2020
Southern University of Science and Technology
2020
Shantou University
2020
Zhejiang University
2018
Beihang University
2015-2016
Nanjing University of Information Science and Technology
2013
Yangzhou University
2010
We introduce Video Transformer (VidTr) with separable-attention for video classification. Compared with commonly used 3D networks, VidTr is able to aggregate spatio-temporal information via stacked attentions and provide better performance with higher efficiency. We first introduce the vanilla video transformer and show that the transformer module is able to perform spatio-temporal modeling from raw pixels, but with heavy memory usage. We then present VidTr, which reduces the memory cost by 3.3× while keeping the same performance. To further optimize the model, we propose the standard deviation based...
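The general idea behind separable attention can be illustrated with a minimal NumPy sketch: instead of one attention map over all T×S spatio-temporal tokens, tokens first attend across time at each spatial location, then across space within each frame. This assumes a single head, identity query/key/value projections, and temporal-then-spatial ordering; the function names are hypothetical and this is not the VidTr implementation. The attention-map counts below are illustrative and are unrelated to the 3.3× memory figure quoted in the abstract.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along one axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Single-head scaled dot-product self-attention over the second-to-last axis.
    # Simplifying assumption: identity query/key/value projections.
    c = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(c)
    return softmax(scores, axis=-1) @ x


def separable_attention(x):
    # x has shape (T, S, C): T frames, S spatial tokens per frame, C channels.
    # Temporal step: each spatial location attends across the T frames.
    xt = self_attention(np.swapaxes(x, 0, 1))  # (S, T, C)
    x = np.swapaxes(xt, 0, 1)                  # back to (T, S, C)
    # Spatial step: each frame attends across its S tokens.
    return self_attention(x)                   # (T, S, C)


def attention_map_sizes(t, s):
    # Number of attention-map entries: joint attention over all T*S tokens
    # versus one T x T map per location plus one S x S map per frame.
    joint = (t * s) ** 2
    separable = s * t * t + t * s * s
    return joint, separable


rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 196, 64))  # e.g. 8 frames of 14x14 tokens
out = separable_attention(clip)
print(out.shape)                    # (8, 196, 64)
print(attention_map_sizes(8, 196))  # (2458624, 319872)
```

Factorizing the attention keeps the output shape unchanged while shrinking the attention maps from (T·S)² entries to S·T² + T·S² entries.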
Despite the fact that many 3D human activity benchmarks have been proposed, most existing action datasets focus on action recognition tasks for segmented videos. There is a lack of standard large-scale benchmarks, especially for the currently popular data-hungry deep learning based methods. In this paper, we introduce a new large-scale benchmark (PKU-MMD) for continuous multi-modality 3D human action understanding that covers a wide range of complex human activities with well-annotated information. PKU-MMD contains 1076 long video sequences in 51...
Despite the fact that many 3D human activity benchmarks have been proposed, most existing action datasets focus on action recognition tasks for segmented videos. There is a lack of standard large-scale benchmarks, especially for the currently popular data-hungry deep learning based methods. In this paper, we introduce a new large-scale benchmark (PKU-MMD) for continuous skeleton-based 3D human action understanding that covers a wide range of complex human activities with well-annotated information. PKU-MMD contains 1076 long video sequences in 51...
We propose TubeR: a simple solution for spatio-temporal video action detection. Different from existing methods that depend on either an offline actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly detect an action tubelet in a video by simultaneously performing action localization and recognition from a single representation. TubeR learns a set of tubelet-queries and utilizes a tubelet-attention module to model the dynamic spatio-temporal nature of a video clip, which effectively reinforces the model capacity compared to using...
Large-scale benchmarks provide a solid foundation for the development of action analytics. Most previous activity benchmarks focus on analyzing actions in RGB videos. There is a lack of large-scale and high-quality benchmarks for multi-modal action analytics. In this article, we introduce the PKU Multi-Modal Dataset (PKU-MMD), a new large-scale benchmark for multi-modal human action analytics. It consists of about 28,000 action instances and 6.2 million frames in total, and provides multi-modal data sources, including RGB, depth, infrared radiation (IR), and skeletons. To make PKU-MMD more practical, our dataset comprises two...
Following the publication of the above paper, it was drawn to the Editor's attention by a concerned reader that certain of the fluorescence microscopy images shown in Fig. 2C on p. 2805 were strikingly similar to data that had appeared previously in other papers written by different authors at different research institutes. In view of the fact that the abovementioned data had already apparently been published prior to its submission to Molecular Medicine Reports, the Editor has decided that this paper should be retracted from the Journal. The authors were asked for an explanation...
Temporal action segmentation is a task to classify each frame in the video with an action label. However, it is quite expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset. Thus, in this work we propose an unsupervised method, namely SSCAP, that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos. SSCAP leverages Self-Supervised learning to extract distinguishable features and then applies a novel Co-occurrence Action Parsing algorithm to not only...
In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video. These methods typically uniformly sample a segment of an input clip (along the temporal dimension). However, not all parts of a video are equally important to determine the action in the clip. In this work, we focus instead on learning where to extract features, so as to focus on the most informative parts of the video. We propose a method called non-uniform temporal aggregation (NUTA), which aggregates features only from informative temporal segments. We also...
Purpose: This paper aims to explore the factors influencing university students' intention to take formal lectures completely through e-learning with cloud meetings. Design/methodology/approach: This study has surveyed Chinese university students who have experienced cloud meetings as well as traditional massive open online courses (MOOC) without live dialogues. The data are analysed based on structural equation modelling to assess students' intention to choose cloud meetings. Findings: The findings show that, as per the technology acceptance model, students find learning through cloud meetings to be easier than...
Multi-object tracking systems often consist of a combination of a detector, a short-term linker, a re-identification feature extractor and a solver that takes the output from these separate components and makes a final prediction. Differently, this work aims to unify all of these in a single tracking system. Towards this, we propose Siamese Track-RCNN, a two-stage detect-and-track framework which consists of three functional branches: (1) the detection branch localizes object instances; (2) the Siamese-based track branch estimates the object motion; and (3)...
An improved curve fitting method for the resolution of overlapped peaks was proposed. The main idea is to use the continuous wavelet transform (CWT) to sharpen the overlapped peaks and obtain reasonable initial estimates of the parameters of each peak. As a result, the fitting is well conditioned and accurate results can be acquired. To verify the suggested method, the separation of several kinds of overlapping peaks, both simulated by computer and from an experimental voltammogram, has been performed and is discussed.
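A minimal NumPy sketch of how CWT sharpening can expose overlapped peaks and supply initial center estimates. It assumes Gaussian peak shapes and a Ricker ("Mexican hat") wavelet at a single hand-picked scale; the wavelet, scale, threshold, and helper names are illustrative assumptions, not the paper's settings. Two equal Gaussians closer than two widths apart sum to a single maximum, so the raw signal hides the second peak, while the CWT coefficients resolve both. The detected centers would then seed a nonlinear least-squares fit (for example `scipy.optimize.curve_fit` with its `p0` argument).

```python
import numpy as np


def gaussian(x, center, width, height=1.0):
    return height * np.exp(-((x - center) ** 2) / (2 * width ** 2))


def ricker(points, a):
    # Ricker ("Mexican hat") wavelet sampled on `points` samples at scale `a`.
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def cwt_single_scale(signal, a):
    # CWT coefficients of `signal` at one scale. The Ricker wavelet is
    # symmetric, so convolution is the same as correlation here.
    kernel = ricker(int(10 * a) + 1, a)
    return np.convolve(signal, kernel, mode="same")


def local_maxima(y, rel_threshold=0.1):
    # Indices of interior local maxima above rel_threshold * max(y).
    idx = np.where((y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:]))[0] + 1
    return idx[y[idx] > rel_threshold * y.max()]


x = np.arange(0, 101, dtype=float)
# Two overlapping peaks 1.8 widths apart: their sum has a single maximum.
signal = gaussian(x, 45, 5) + gaussian(x, 54, 5)

print(local_maxima(signal))   # one merged peak
centers = local_maxima(cwt_single_scale(signal, a=2.0))
print(centers)                # two resolved peak positions near 45 and 54
```

The Ricker wavelet acts like a smoothed negative second derivative, which narrows each peak and turns a shoulder into a separate maximum.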
Online human action detection and forecasting on untrimmed 3D skeleton sequences is a novel task, based on traditional action recognition, that has not been fully studied. Its aim is to localize and recognize an action in a long sequence while forecasting the action at the same time. In this paper, we propose an online action detection algorithm featuring a Multi-Task Recurrent Neural Network to solve this problem. First, a deep Long Short-Term Memory (LSTM) network is designed for feature extraction and temporal dynamic modeling. Then we utilize a classification...
Video-text retrieval is a class of cross-modal representation learning problems, where the goal is to select, given a text query and a pool of candidate videos, the video which corresponds to the text query. The contrastive paradigm of vision-language pretraining has shown promising success with large-scale datasets and a unified transformer architecture, and has demonstrated the power of a joint latent space. Despite this, the intrinsic divergence between the visual domain and the textual domain is still far from being eliminated, and projecting different modalities into a joint latent space...
Hydroxyapatite scaffolds (HASs) are widely studied as suitable materials for bone replacement due to their chemical similarities to organic materials. In our previous study, a novel HAS with a 25‑30‑µm groove structure (HAS‑G) exhibited enhanced osteogenesis of bone marrow mesenchymal stromal cells (BMSCs) compared with HAS, potentially by modulating the macrophage‑induced immune microenvironment. However, the exact effects of different surface patterns on the physiological processes of attached cells are not known. The present study...
Most action recognition solutions rely on dense sampling to precisely cover the informative temporal clip. Extensively searching the temporal region is expensive for a real-world application. In this work, we focus on improving the inference efficiency of current action recognition backbones on trimmed videos, and illustrate that an action model can accurately classify an action with a single pass over the video, unlike the multi-clip sampling common in SOTA methods, by learning to drop non-informative features. We present Selective Feature Compression (SFC), a strategy that greatly...
Locating the center of the pupil is the most important foundation and core component of gaze tracking. The accuracy of gaze tracking largely depends on the quality of the images, but additional constraints and a large amount of calculation make it impractical for high-resolution images. Although some eye-gaze trackers can get accurate results, improving pupil feature localization in low-resolution images and accurately recognizing closed eyes are still common tasks in the field of gaze estimation. Our aim is to improve pupil localization in low-resolution images. To this aim, we proposed a simple...
Previous preliminary studies have suggested that a hydroxyapatite scaffold with a grooved structure (HAG) has good osteogenic potential. This type of scaffold may aid osteogenesis during the repair of large maxillofacial bony defects. The ectopic effect and the underlying mechanism were further studied using porous HAG scaffold‑based delivery of human placenta‑derived mesenchymal stem cells (hPMSCs). A total of 18 dogs were randomly allocated into the HAG group and the HAG with hPMSC (HAG/hPMSC) group, and three scaffolds were implanted in the dorsal muscle...
Traffic jams have become a severe urban problem in most metropolises in the world. How to understand and resolve these traffic problems is a global issue. In the new era of big data, visualization and analysis of traffic-related data are increasingly appreciated. This paper presents DiffusionInsighter, a web-based visual analysis system that allows users to explore traffic flow diffusion patterns at different spatial and temporal granularities. DiffusionInsighter first applies a data cleaning and filtering component to remove dirty data and retain...
We present compositional nearest neighbors (CompNN), a simple approach to visually interpreting distributed representations learned by a convolutional neural network (CNN) for pixel-level tasks (e.g., image synthesis and segmentation). It does so by reconstructing both a CNN's input and output image by copy-pasting corresponding patches from the training set with similar feature embeddings. To do so efficiently, it makes use of a PatchMatch-based algorithm that exploits the fact that the patch representations learned by a CNN for pixel-level tasks vary smoothly...
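The core copy-paste idea can be sketched in NumPy as a brute-force nearest-neighbor lookup: for every query pixel, find the training pixel with the closest feature embedding and copy its output value. This is a simplified illustration under stated assumptions: each pixel embedding stands in for its receptive-field patch, distances are plain squared L2, and the exhaustive search replaces the PatchMatch-based acceleration the abstract describes. The function name and toy data are hypothetical.

```python
import numpy as np


def nearest_neighbor_reconstruct(query_feats, train_feats, train_outputs):
    """Rebuild an output map by copying, for every query pixel, the output
    value of the training pixel whose embedding is closest (squared L2).

    query_feats:   (H, W, D) embeddings of the image being explained
    train_feats:   (N, D)    embeddings of training-set pixels
    train_outputs: (N, K)    output values (e.g. class scores) at those pixels
    """
    h, w, d = query_feats.shape
    q = query_feats.reshape(-1, d)  # (H*W, D)
    # Exhaustive search: distance from every query to every training embedding.
    dists = ((q[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=-1)
    nearest = dists.argmin(axis=1)  # (H*W,)
    recon = train_outputs[nearest].reshape(h, w, -1)
    return recon, nearest.reshape(h, w)


rng = np.random.default_rng(0)
train_feats = rng.standard_normal((50, 8))
train_outputs = rng.standard_normal((50, 3))
idx = rng.integers(0, 50, size=(4, 5))
# Query embeddings: slightly perturbed copies of known training embeddings.
query = train_feats[idx] + 0.01 * rng.standard_normal((4, 5, 8))
recon, nn_idx = nearest_neighbor_reconstruct(query, train_feats, train_outputs)
```

Because every reconstructed value is copied from a specific training pixel, `nn_idx` records which training example explains each location, which is what makes this style of reconstruction interpretable.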
Although many 3D head pose estimation methods based on monocular vision can achieve an accuracy of 5°, how to reduce the number of required training samples and how to avoid using any hardware parameters as input features are still among the biggest challenges in the field of head pose estimation. To address these challenges, the authors propose an accurate head pose estimation method which can act as an extension of facial key point detection systems. The basic idea is to use the normalised distances between key points as features and ℓ1-minimisation to select a set of sparse training samples which reflect the mapping...
Mining the shared features of the same identity in different scenes, and the unique features of different identities in the same scene, are the most significant challenges in the field of person re-identification (ReID). The Online Instance Matching (OIM) loss function and the Triplet loss function are the main methods for person ReID. Unfortunately, both of them have drawbacks. OIM loss treats all samples equally and puts no emphasis on hard samples. Triplet loss processes batch construction in a complicated and fussy way and converges slowly. For these problems, we propose a Triplet Online Instance Matching (TOIM) loss function, which lays emphasis on hard samples and improves the accuracy of person ReID effectively. It...