Tao Li

ORCID: 0000-0001-8758-0471
Research Areas
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Multimodal Machine Learning Applications
  • Anomaly Detection Techniques and Applications
  • Advanced Image and Video Retrieval Techniques
  • Video Analysis and Summarization
  • Image Retrieval and Classification Techniques
  • Recommender Systems and Techniques
  • Auction Theory and Applications
  • Blockchain Technology Applications and Security
  • Gait Recognition and Analysis
  • Fire Detection and Safety Systems
  • Image Enhancement Techniques
  • Digital Rights Management and Security
  • EEG and Brain-Computer Interfaces
  • Generative Adversarial Networks and Image Synthesis
  • Face and Expression Recognition
  • Diabetic Foot Ulcer Assessment and Management
  • Advanced Neural Network Applications
  • Neuroscience and Neural Engineering
  • Image Processing and 3D Reconstruction
  • Advanced Bandit Algorithms Research
  • Music and Audio Processing
  • Advanced Graph Neural Networks
  • Domain Adaptation and Few-Shot Learning

Tianjin Haihe Hospital
2024

The University of Tokyo
2019-2023

Wuhan University of Science and Technology
2021-2023

Xi'an University of Architecture and Technology
2023

Meizu (China)
2023

Tiangong University
2023

Qufu Normal University
2023

Huaiyin Institute of Technology
2022-2023

Southwest University
2022

Tokyo University of Information Sciences
2021-2022

In this paper, we propose a self-supervised contrastive learning method to learn video feature representations. In traditional methods, constraints from anchor, positive, and negative data pairs are used to train the model. In such a case, different samplings of the same video are treated as positives, and clips from different videos as negatives. Because spatio-temporal information is important for video representation, we set the temporal constraints more strictly by introducing intra-negative samples. In addition to samples from different videos, negative samples are extended by breaking temporal relations in...

10.1109/tcsvt.2022.3141051 article EN cc-by IEEE Transactions on Circuits and Systems for Video Technology 2022-01-07

Conventional video summarization approaches based on reinforcement learning have the problem that the reward can only be received after the whole summary is generated. Such a reward is sparse and makes it hard to converge. Another problem is that labelling each shot is tedious and costly, which usually prohibits the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical framework, which decomposes the task into several subtasks to enhance quality. This framework consists of a manager network and a worker...

10.1145/3338533.3366583 preprint EN 2019-12-15

We propose a self-supervised method to learn feature representations from videos. A standard approach in traditional methods uses positive-negative data pairs to train with a contrastive learning strategy. In such a case, different modalities of the same video are treated as positives and clips from different videos as negatives. Because spatio-temporal information is important for video representation, we extend the negative samples by introducing intra-negative samples, which are transformed from the anchor by breaking temporal relations in the clips....

10.1145/3394171.3413694 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12
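
The intra-negative idea described in the abstract above can be illustrated with a small sketch. It assumes a clip stored as a NumPy array of shape (T, H, W, C) and assumes frame shuffling as one way of breaking temporal relations; the truncated abstract does not specify the exact transformation, and `intra_negative` is a hypothetical name, not the authors' code.

```python
import numpy as np


def intra_negative(clip: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `clip` with its frame order shuffled.

    Hypothetical helper: `clip` is assumed to have shape (T, H, W, C).
    Shuffling is an assumed way to break temporal relations.
    """
    num_frames = clip.shape[0]
    if num_frames < 2:
        raise ValueError("need at least two frames to change the order")
    original_order = np.arange(num_frames)
    order = rng.permutation(num_frames)
    # Redraw until the frame order actually differs from the original.
    while np.array_equal(order, original_order):
        order = rng.permutation(num_frames)
    return clip[order]


# Toy anchor clip: 8 frames of 1x1 pixels whose value equals the frame index.
anchor = np.arange(8, dtype=np.uint8).reshape(8, 1, 1, 1)
negative = intra_negative(anchor, np.random.default_rng(0))
```

The negative keeps the anchor's appearance (same frames) but not its temporal order, so a model can only tell the two apart by using temporal information.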

10.1007/s12555-014-0119-z article EN International Journal of Control Automation and Systems 2015-05-23

10.1109/ijcnn60899.2024.10651355 article EN 2024 International Joint Conference on Neural Networks (IJCNN) 2024-06-30

Recently, 3D convolutional networks have yielded good performance in action recognition. However, an optical flow stream is still needed to ensure better performance, the cost of which is very high. In this paper, we propose a fast but effective way to extract motion features from videos, utilizing residual frames as the input data in 3D ConvNets. By replacing traditional stacked RGB frames with residual ones, improvements of 20.5 and 12.5 percentage points in top-1 accuracy can be achieved on the UCF101 and HMDB51 datasets when trained from scratch....

10.48550/arxiv.2001.05661 preprint EN other-oa arXiv (Cornell University) 2020-01-01
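
As a rough illustration of the residual-frame input mentioned in the abstract above, the sketch below takes differences between adjacent frames of a clip. It assumes a NumPy clip of shape (T, H, W, C) with `uint8` pixels, and `residual_frames` is a hypothetical name; the truncated abstract does not give the authors' exact preprocessing.

```python
import numpy as np


def residual_frames(clip: np.ndarray) -> np.ndarray:
    """Compute differences between adjacent frames.

    Hypothetical helper: `clip` is assumed to have shape (T, H, W, C).
    Returns an int16 array of shape (T - 1, H, W, C).
    """
    if clip.shape[0] < 2:
        raise ValueError("need at least two frames to form a residual")
    # Widen the dtype first so uint8 subtraction cannot wrap around.
    frames = clip.astype(np.int16)
    return frames[1:] - frames[:-1]


# Toy clip: 4 black frames of 2x2 RGB pixels, with frame 2 set to white.
clip = np.zeros((4, 2, 2, 3), dtype=np.uint8)
clip[2] = 255
res = residual_frames(clip)
```

Static content cancels out in the subtraction, so the residual stack mostly carries changes between frames, which is far cheaper to compute than optical flow.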

Recently, 3D convolutional networks have yielded good performance in action recognition. However, an optical flow stream is still needed for motion representation to ensure better performance, whose cost is very high. In this paper, we propose a cheap but effective way to extract motion features from videos, utilizing residual frames as the input data in 3D ConvNets. By replacing traditional stacked RGB frames with residual ones, improvements of 35.6 and 26.6 percentage points in top-1 accuracy can be achieved on the UCF101 and HMDB51 datasets when...

10.1109/tip.2021.3124156 article EN cc-by IEEE Transactions on Image Processing 2021-01-01

Existing works mainly focus on the crowd and ignore the confusion regions, which contain appearance extremely similar to the crowd in the background, while crowd counting needs to face these two sides at the same time. To address this issue, we propose a novel end-to-end trainable confusion region discriminating and erasing network called CDENet. Specifically, CDENet is composed of two modules: a confusion region mining module (CRM) and a guided erasing module (GEM). CRM consists of a basic density estimation (BDE) network, a confusion region aware bridge, and a confusion region mining network. The BDE network first generates a primary density map,...

10.1109/tnnls.2023.3311020 article EN IEEE Transactions on Neural Networks and Learning Systems 2023-09-15

A recommendation system for tourist spots has very high potential value, including social and economic benefits. Traditional clustering algorithms are usually used to build a recommendation system. However, they risk falling into local minima, which may heavily decrease the final performance. Few works have focused their research on recommendation systems for tourist spots, and few systems consider population attribute information for fitting users' implicit preferences. To address this problem, we focus our work on designing a novel recommendation system for tourist spots. First, a new dataset...

10.1155/2019/2072375 article EN cc-by Mathematical Problems in Engineering 2019-01-01

Extracting effective deep features to represent content and style information is the key to universal style transfer. Most existing algorithms use VGG19 as the feature extractor, which incurs a high computational cost and impedes real-time style transfer on high-resolution images. In this work, we propose a lightweight alternative architecture, ArtNet, which is based on GoogLeNet and later pruned by a novel channel pruning method named Zero-channel Pruning, specially designed for style transfer approaches. Besides, a theoretically sound sandwich...

10.48550/arxiv.2006.09029 preprint EN other-oa arXiv (Cornell University) 2020-01-01

10.1007/s00371-013-0780-x article EN The Visual Computer 2013-02-05

Humans rely profoundly on tactile feedback from fingertips to interact with the environment, whereas most hand prostheses used in clinics provide no tactile feedback. In this study we demonstrate the feasibility of using a tactile display glove that can be worn by a unilateral hand amputee on the remaining healthy hand to receive feedback from the prosthesis. The main benefit is that users could easily distinguish the feedback for each finger, even without training. The claimed advantage is supported by preliminary tests with subjects. This approach may lead to the development of effective and...

10.1515/cdbme-2016-0089 article EN cc-by-nc-nd Current Directions in Biomedical Engineering 2016-09-01

Recently, pretext-task based methods have been proposed one after another in self-supervised video feature learning. Meanwhile, contrastive learning methods also yield good performance. Usually, new methods can beat previous ones, as it is claimed that they capture "better" temporal information. However, there exist setting differences among them, and it is hard to conclude which is better. A comparison would be much more convincing if these methods had reached as close to their performance limits as possible. In this paper, we...

10.48550/arxiv.2010.15464 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Malware is becoming a worldwide epidemic. An Artificial Immune System is a self-adaptive method for malware detection. However, scalability and coverage problems reduce the detection efficiency of an Artificial Immune System. To solve these problems, this paper proposes a model called the Collaborative Artificial Immune System model, in which independent immune bodies in different computers are organized by a virtual structure called Body. Immune bodies can share detectors with each other to improve detection efficiency. A collaborative module was added to every immune body for communication...

10.1109/car.2010.5456638 article EN 2010-03-01

This Work has been Retracted by ACM because one or more of the authors of this Work were proven to have known or believed that the Work contained incorrect and/or falsified results prior to publication and violated the anonymity and independence of the review process for their paper "3D-based video recognition acceleration by leveraging temporal locality" in Proceedings of the 46th International Symposium on Computer Architecture (ISCA '19). Association for Computing Machinery, New York, NY, USA, 79-90.

10.1145/3307650.3322260 article EN 2019-06-14

When the traditional collaborative filtering algorithm is applied to drug recommendation, the recommendation effect is not good due to the sparsity of data. In view of the above problems, this paper proposes a collaborative filtering algorithm based on user behavior and drug semantics (UBDS-CF). Firstly, we construct the purchasing matrix of users and drugs, and use weighted cosine similarity to calculate the basic similarity between drugs; then we calculate the category label similarity and extract the feature vector of the drug function text using a word model; these together constitute the semantic...

10.1088/1742-6596/1802/3/032005 article EN Journal of Physics Conference Series 2021-03-01

Creating impressive video content such as movies and advertisements is a very important yet challenging task in business that requires both a sense of creativity and a lot of experience. Even professionals cannot necessarily invoke the impressions and emotions they have aimed at. Many videos are created and then disappear without making a large impact on viewers. This paper presents a large-scale dataset of television (TV) advertisements that consists of 14,490 videos. The recognition rate and interestingness of each video, from the results of questionnaires...

10.1007/s11042-023-14704-7 article EN cc-by Multimedia Tools and Applications 2023-08-01

Recently, 3D convolutional networks (3D ConvNets) have yielded good performance in action recognition. However, an optical flow stream is still needed to ensure better performance, the cost of which is very high. In this paper, we propose a fast but effective way to extract motion features from videos, utilizing residual frames as the input data in 3D ConvNets. By replacing traditional stacked RGB frames with residual ones, improvements of 35.6 and 26.6 percentage points in top-1 accuracy can be obtained on the UCF101 and HMDB51 datasets when...

10.1109/icip40778.2020.9191133 article EN 2020 IEEE International Conference on Image Processing (ICIP) 2020-09-30