- Anomaly Detection Techniques and Applications
- Human Pose and Action Recognition
- Advanced Vision and Imaging
- Computer Graphics and Visualization Techniques
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- Advanced Computational Techniques and Applications
- Speech and Audio Processing
- 3D Shape Modeling and Analysis
- Advanced Image and Video Retrieval Techniques
- Network Security and Intrusion Detection
- Seismology and Earthquake Studies
- Multimodal Machine Learning Applications
- Earthquake Detection and Analysis
- Power Systems and Technologies
- Retinal Imaging and Analysis
- Advanced Algorithms and Applications
- Artificial Immune Systems Applications
- Advanced Decision-Making Techniques
- Earthquake and Tectonic Studies
- Neural Networks and Applications
- Image and Object Detection Techniques
- Educational Technology and Pedagogy
- Human Motion and Animation
- Remote Sensing and Land Use
University of Science and Technology of China
2021-2024
State Grid Corporation of China (China)
2023-2024
Chiba University
2013-2024
Lamar University
2018-2024
Tencent (China)
2023-2024
ShanghaiTech University
2017-2024
Wuhan Institute of Technology
2024
Liaoning University
2024
East China University of Science and Technology
2023
University of Science and Technology Beijing
2023
Anomaly detection in videos refers to the identification of events that do not conform to expected behavior. However, almost all existing methods tackle the problem by minimizing the reconstruction errors of training data, which cannot guarantee a larger reconstruction error for an abnormal event. In this paper, we propose to tackle the anomaly detection problem within a video prediction framework. To the best of our knowledge, this is the first work that leverages the difference between a predicted future frame and its ground truth to detect an abnormal event. To predict a future frame with higher quality for normal...
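The core scoring idea here, flagging a frame as abnormal when it is poorly predicted, can be illustrated with a minimal sketch. The helpers below are hypothetical (the paper's own network and normalization may differ) and assume some trained frame predictor is already available; they compute a PSNR-based normality score, with low scores indicating likely anomalies.

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a predicted frame and its ground truth."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def normality_scores(psnrs: np.ndarray) -> np.ndarray:
    """Min-max normalize per-frame PSNR to [0, 1]; lower scores suggest anomalies."""
    lo, hi = psnrs.min(), psnrs.max()
    return (psnrs - lo) / (hi - lo + 1e-8)

# Usage sketch: `predict_next_frame` is a stand-in for any trained predictor.
# frames: list of (H, W, C) arrays scaled to [0, 1]
# psnrs = np.array([psnr(predict_next_frame(frames[:t]), frames[t]) for t in range(4, len(frames))])
# scores = normality_scores(psnrs)   # threshold low scores as abnormal
```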
Motivated by the capability of sparse coding based anomaly detection, we propose a Temporally-coherent Sparse Coding (TSC) in which we enforce similar neighbouring frames to be encoded with similar reconstruction coefficients. We then map the TSC to a special type of stacked Recurrent Neural Network (sRNN). By taking advantage of the sRNN in learning all parameters simultaneously, the nontrivial hyper-parameter selection for TSC can be avoided; meanwhile, with a shallow sRNN, the reconstruction coefficients can be inferred within a forward pass, which reduces the computational...
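As a hedged sketch of what such a temporally-coherent sparse coding objective can look like (the exact weights and norms used in the paper may differ), with frame features x_t, dictionary D, and codes z_t:

$$
\min_{D,\,\{z_t\}} \; \sum_t \Big( \tfrac{1}{2}\,\|x_t - D z_t\|_2^2 \;+\; \lambda_1 \|z_t\|_1 \;+\; \lambda_2 \|z_t - z_{t-1}\|_2^2 \Big)
$$

The last term is the temporal-coherence penalty: it encourages neighbouring frames to share similar reconstruction coefficients.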
This paper tackles anomaly detection in videos, which is an extremely challenging task because anomalies are unbounded. We approach this task by leveraging a Convolutional Neural Network (CNN or ConvNet) for appearance encoding of each frame, and a Convolutional Long Short Term Memory (ConvLSTM) for memorizing all past frames, which corresponds to the motion information. We then integrate the ConvNet and ConvLSTM with an Auto-Encoder, referred to as ConvLSTM-AE, to learn the regularity of ordinary moments. Compared with 3D Auto-Encoder based detection, our main...
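As a rough illustration of the kind of architecture described, the sketch below wires a small convolutional encoder, a ConvLSTM cell, and a deconvolutional decoder into a toy auto-encoder. It is a minimal PyTorch stand-in, not the paper's network; layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: all four gates come from one convolution."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class ConvLSTMAE(nn.Module):
    """Toy ConvLSTM auto-encoder: CNN encoder -> ConvLSTM -> decoder."""
    def __init__(self, in_ch: int = 1, feat: int = 16):
        super().__init__()
        self.feat = feat
        self.enc = nn.Sequential(nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU())
        self.rnn = ConvLSTMCell(feat, feat)
        self.dec = nn.Sequential(nn.ConvTranspose2d(feat, in_ch, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, clip):                      # clip: (B, T, C, H, W), H and W even
        b, t, _, h, w = clip.shape
        hid = cell = clip.new_zeros(b, self.feat, h // 2, w // 2)
        recon = []
        for step in range(t):                     # ConvLSTM accumulates the motion history
            hid, cell = self.rnn(self.enc(clip[:, step]), (hid, cell))
            recon.append(self.dec(hid))
        return torch.stack(recon, dim=1)          # per-frame reconstructions

# model = ConvLSTMAE(); loss = ((model(x) - x) ** 2).mean()   # x: clips of normal events
```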
We tackle human motion imitation, appearance transfer, and novel view synthesis within a unified framework, which means that the model, once trained, can be used to handle all these tasks. Existing task-specific methods mainly use 2D keypoints (pose) to estimate the body structure. However, they only express position information and have no ability to characterize the personalized shape of the individual person or the rotations of the limbs. In this paper, we propose to use a 3D body mesh recovery module to disentangle pose and shape,...
This paper presents an anomaly detection method that is based on sparse coding inspired Deep Neural Networks (DNNs). Specifically, in light of the success of sparse coding based anomaly detection, we propose a Temporally-coherent Sparse Coding (TSC), in which a temporally-coherent term is used to preserve the similarity between two similar frames. Optimizing the sparse coefficients of TSC with the Sequential Iterative Soft-Thresholding Algorithm (SIATA) is equivalent to a special stacked Recurrent Neural Network (sRNN) architecture. Further, to reduce the computational cost...
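The claimed equivalence between soft-thresholding iterations and a recurrent architecture can be made concrete with a small numpy sketch (the dictionary, step size, and iteration count here are placeholders): each iteration is the same affine update followed by a shrinkage nonlinearity, which is exactly the form of an unrolled, stacked recurrent layer.

```python
import numpy as np

def soft_threshold(v: np.ndarray, lam: float) -> np.ndarray:
    """Shrinkage operator used by ISTA; plays the role of the recurrent nonlinearity."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ista_codes(x: np.ndarray, D: np.ndarray, lam: float = 0.1, n_iter: int = 10) -> np.ndarray:
    """Sparse code for one frame feature x of shape (d,) over dictionary D of shape (d, k)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2           # 1 / Lipschitz constant of D^T D
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):                          # unrolled iterations ~ stacked RNN layers
        z = soft_threshold(z - step * D.T @ (D @ z - x), step * lam)
    return z
```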
Abnormal event detection in surveillance video is an essential but challenging task, and many methods have been proposed to deal with this problem. Previous methods either only consider appearance information or directly integrate the results of appearance and motion without explicitly considering their endogenous consistency semantics. Inspired by the way humans identify abnormal frames from multi-modality signals, we propose the Appearance-Motion Memory Consistency Network (AMMC-Net). Our method first makes full...
We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Since human motions are highly diverse and have a distribution quite different from that of the conditional modalities, such as textual descriptors in natural language, it is hard to learn a probabilistic mapping from the desired conditional modality to the motion sequences. Besides, the raw motion data from the capture system might be redundant and contain noises; directly modeling the joint distribution over the raw motion sequences and conditional modalities would...
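One common way to realize conditional generation over a compact motion representation, sketched below under broad assumptions, is to run denoising diffusion in a learned latent space. The `encoder` and `denoiser` modules and the cosine schedule are placeholders rather than the paper's components.

```python
import torch

def diffusion_loss(denoiser, encoder, motion, text_emb, T: int = 1000):
    """One training step of noise prediction in a learned motion latent space."""
    z = encoder(motion)                                   # (B, D) latent of the motion clip
    t = torch.randint(0, T, (z.size(0),), device=z.device)
    alpha_bar = torch.cos(0.5 * torch.pi * t.float() / T) ** 2   # simple cosine-style schedule
    alpha_bar = alpha_bar.unsqueeze(-1)                   # (B, 1) for broadcasting
    noise = torch.randn_like(z)
    z_noisy = alpha_bar.sqrt() * z + (1 - alpha_bar).sqrt() * noise
    pred = denoiser(z_noisy, t, text_emb)                 # predict the injected noise
    return torch.nn.functional.mse_loss(pred, noise)
```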
Classical semi-supervised video anomaly detection assumes that only normal data are available in the training set because of the rare and unbounded nature of anomalies. It is obvious, however, that these infrequently observed abnormal events can actually help with the detection of identical or similar events, a line of thinking that motivates us to study open-set supervised anomaly detection with only a few types of abnormal events but many normal events available. Under the assumption that normal events can be well predicted, we propose a Margin Learning Embedded Prediction (MLEP) framework. There are three features in MLEP-...
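A hedged sketch of the kind of margin objective such a framework can embed (not necessarily the paper's exact loss): clip embeddings of normal events are pulled together, while the few available abnormal clips are pushed at least a margin away.

```python
import torch
import torch.nn.functional as F

def margin_loss(anchor, positive, negative, margin: float = 1.0):
    """Triplet-style margin loss: anchor/positive are normal clips, negative is abnormal."""
    d_pos = F.pairwise_distance(anchor, positive)   # normal should stay close to normal
    d_neg = F.pairwise_distance(anchor, negative)   # abnormal should be pushed away
    return F.relu(d_pos - d_neg + margin).mean()
```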
Video anomaly detection refers to the identification of events that do not conform to expected behavior. However, almost all existing methods cast this problem as the minimization of reconstruction errors on training data that include only normal events, which may lead to self-reconstruction and cannot guarantee a larger error for an abnormal event. In this paper, we propose to formulate video anomaly detection within the regime of video prediction. We advocate that video prediction networks are well suited to anomaly detection. We then introduce two...
Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multi-modal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified,...
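A minimal sketch of the "motion as a foreign language" recipe at its most basic: continuous motion features are mapped to discrete tokens via nearest-neighbour lookup in a codebook, so a language model can consume them. The codebook below is a random stand-in, not a trained one.

```python
import numpy as np

def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each motion feature (T, D) to the index of its nearest codebook entry (K, D)."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (T, K)
    return dists.argmin(axis=1)                                            # discrete motion tokens

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))        # stand-in for a learned VQ codebook
motion_feats = rng.normal(size=(30, 64))     # 30 frames of 64-d motion features
tokens = quantize(motion_feats, codebook)    # token ids that can extend an LLM vocabulary
```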
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed...
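The scaling-law fitting mentioned here can be illustrated generically: model the loss as an irreducible term plus a power of compute and fit the parameters to measured points. The functional form and the data below are illustrative only, not DeepSeek's reported law.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, loss_inf, a, alpha):
    """Generic form L(C) = L_inf + a * C^(-alpha)."""
    return loss_inf + a * np.power(compute, -alpha)

# Made-up measurements: training compute (in units of 1e19 FLOPs) vs. validation loss.
compute = np.array([1.0, 10.0, 100.0, 1000.0])
loss = np.array([2.9, 2.55, 2.3, 2.12])

params, _ = curve_fit(scaling_law, compute, loss, p0=[1.7, 1.2, 0.15], maxfev=20000)
print("fitted L_inf=%.2f, a=%.2f, alpha=%.3f" % tuple(params))
```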
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: we strive to ensure our data are diverse, scalable, and extensively cover real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction tuning dataset...
Diabetic Retinopathy (DR) is a non-negligible eye disease among patients with Diabetes Mellitus, and automatic retinal image analysis algorithms for DR screening are in high demand. Considering that the resolution of retinal images is very high and small pathological tissues can be detected only with a large local receptive field, deep architectures are required to identify the lesions of late-stage disease; however, directly training a neural network with a deep architecture is both time and computation expensive and difficult because of the gradient vanishing/exploding problem,...
Recently, deep learning has been used for hyperspectral image classification (HSIC) due to its powerful feature learning and representation ability. In this letter, a novel deep-learning-based framework built on DeepLab is proposed for HSIC. Inspired by the excellent performance of DeepLab in semantic segmentation, the framework applies it to excavate the spatial features of hyperspectral images (HSI) pixel by pixel. It breaks through the patch-wise limitation of most existing methods. More importantly, it can extract features at multiple scales and effectively avoid the reduction of spatial resolution. Furthermore,...
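The resolution-preserving, multi-scale feature extraction attributed to DeepLab rests on atrous (dilated) convolution; the toy sketch below only illustrates that mechanism, with a made-up number of spectral bands rather than any specific sensor.

```python
import torch
import torch.nn as nn

# Parallel atrous convolutions (an ASPP-like block) keep the spatial resolution
# while enlarging the receptive field at several scales.
bands, feat = 103, 32                        # hypothetical number of spectral bands
branches = nn.ModuleList(
    [nn.Conv2d(bands, feat, kernel_size=3, padding=d, dilation=d) for d in (1, 2, 4)]
)

hsi = torch.randn(1, bands, 64, 64)          # (B, bands, H, W) hyperspectral patch
multi_scale = torch.cat([b(hsi) for b in branches], dim=1)
print(multi_scale.shape)                     # torch.Size([1, 96, 64, 64]) -- same H, W
```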
The phase carried by two orthogonal polarizations can be manipulated independently by controlling both the geometric size and the orientation of a dielectric nanopost. With this characteristic, we demonstrate a novel multifunctional metasurface, which converts part of the incident linearly polarized light into its cross-polarization and encodes the phase of each polarization independently. A beam splitter and a bifocal metalens were realized in a single-layer metasurface with this approach. We fabricated the device and demonstrated that the focal spots are separated transversely...
This work focuses on image anomaly detection by leveraging only normal images in the training phase. Most previous methods tackle the task by reconstructing the input with an autoencoder (AE)-based model, and the underlying assumption is that the reconstruction errors for normal images are small while those for abnormal images are large. However, these AE-based methods sometimes reconstruct even the anomalies well; consequently, they are less sensitive to anomalies. To conquer this issue, we propose to exploit the structure-texture correspondence. Specifically, we observe...
Co-speech gesture generation is to synthesize a gesture sequence that not only looks real but also matches the input speech audio. Our method generates the movements of the complete upper body, including arms, hands, and the head. Although recent data-driven methods achieve great success, challenges still exist, such as limited variety, poor fidelity, and a lack of objective metrics. Motivated by the fact that the speech cannot fully determine the gesture, we design a method that learns a set of gesture template vectors to model the latent conditions, which relieve...
The Tohoku earthquake of 11 March 2011 caused very large tsunamis and widespread devastation. Various high-resolution satellites captured details of the affected areas, and these data were utilized in the emergency response. In this study, pre- and post-event TerraSAR-X intensity images were used to identify tsunami-flooded areas and damaged buildings. Since a water surface generally shows little backscatter, flooded areas could be extracted from the difference in backscattering coefficients between the two images. Impacted buildings were detected by calculating...
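A minimal numpy sketch of the change-detection logic described: pixels whose post-event backscatter drops well below the pre-event level are flagged as likely flooded. The threshold value is illustrative; the study derives its own criteria.

```python
import numpy as np

def flood_mask(pre_db: np.ndarray, post_db: np.ndarray, drop_db: float = 5.0) -> np.ndarray:
    """Boolean mask of likely flooded pixels from pre/post-event SAR backscatter (in dB).

    Calm water backscatters little, so a strong decrease in the backscattering
    coefficient after the event is treated as evidence of inundation.
    """
    return (pre_db - post_db) >= drop_db

# Usage sketch with synthetic images (values in dB):
pre = np.full((4, 4), -8.0)
post = pre.copy()
post[1:3, 1:3] = -16.0           # a patch that became much darker after the event
print(flood_mask(pre, post))     # True inside the darkened patch
```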
We tackle human image synthesis, including motion imitation, appearance transfer, and novel view synthesis, within a unified framework. It means that the model, once trained, can be used to handle all these tasks. Existing task-specific methods mainly use 2D keypoints (pose) to estimate the body structure. However, they only express position information and have no ability to characterize the personalized shape of the person or model the limb rotations. In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose...
We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes geometry consistency but...
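Score distillation sampling, which the abstract names, can be summarized by its gradient: noise is added to a rendered view, a frozen 2D diffusion prior predicts that noise, and the residual is pushed back into the 3D scene parameters. The sketch below is generic and simplified; the guidance model, weighting, and conditioning are placeholders, not DreamCraft3D's specifics.

```python
import torch

def sds_grad(render, diffusion_eps, t, text_emb, alpha_bar):
    """Generic score distillation gradient w.r.t. a rendered image `render`.

    `alpha_bar` is a tensor in (0, 1) for the sampled timestep t; `diffusion_eps`
    is any frozen noise-prediction model conditioned on t and a text embedding.
    """
    noise = torch.randn_like(render)
    noisy = alpha_bar.sqrt() * render + (1 - alpha_bar).sqrt() * noise
    with torch.no_grad():                          # the 2D diffusion prior stays frozen
        eps_pred = diffusion_eps(noisy, t, text_emb)
    w = 1 - alpha_bar                              # a common weighting choice
    return w * (eps_pred - noise)                  # backpropagate this into the 3D params
```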