- Video Analysis and Summarization
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Anomaly Detection Techniques and Applications
- Video Surveillance and Tracking Methods
- Image Retrieval and Classification Techniques
- Data Management and Algorithms
- Multimodal Machine Learning Applications
- Music and Audio Processing
- Time Series Analysis and Forecasting
- Human Motion and Animation
- Diabetic Foot Ulcer Assessment and Management
- Fire Detection and Safety Systems
- Emotion and Mood Recognition
- Augmented Reality Applications
- Data Stream Mining Techniques
- Data Visualization and Analytics
- Gait Recognition and Analysis
- Artificial Intelligence in Games
- Data Mining Algorithms and Applications
Jilian Technology Group (China)
2019-2020
Fudan University
2014-2016
User-generated video collections have been expanding rapidly in recent years, and systems for the automatic analysis of these videos are in high demand. While extensive research efforts have been devoted to recognizing semantics like "birthday party" and "skiing", few attempts have been made to understand the emotions carried by videos, e.g., "joy" and "sadness". In this paper, we propose a comprehensive computational framework for predicting emotions in user-generated videos. We first introduce a rigorously designed dataset collected from popular...
Emotion is a key element in user-generated video. However, it is difficult to understand the emotions conveyed in such videos due to the complex and unstructured nature of user-generated content and the sparsity of video frames expressing emotion. In this paper, for the first time, we propose a technique for transferring knowledge from heterogeneous external sources, including image and textual data, to facilitate three related tasks in understanding emotion: emotion recognition, emotion attribution, and emotion-oriented summarization. Specifically, our...
Despite growing research interest, emotion understanding for user-generated videos remains a challenging problem. Major obstacles include the diversity and complexity of video content, as well as the sparsity of expressed emotions. For the first time, we systematically study large-scale video emotion recognition by transferring deep feature encodings. In addition to traditional, supervised recognition, we study the problem of zero-shot emotion recognition, where the emotions in the test set are unseen during training. To cope with this task, we utilize knowledge...
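The zero-shot setting described above can be illustrated with a generic semantic-embedding approach. The abstract is truncated before it describes the actual method, so this is a minimal sketch under assumptions, not the authors' technique. It assumes a projection matrix `W`, learned on seen classes, that maps a video feature into the same space as the label embeddings (for example, word vectors), and it predicts the unseen label whose embedding is most similar. The helper names, the identity `W`, and the two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def project(W, x):
    """Map visual feature x into the semantic space by computing W @ x (W is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def zero_shot_predict(W, x, unseen_class_embeddings):
    """Return the unseen label whose semantic embedding is most similar to W @ x."""
    z = project(W, x)
    return max(unseen_class_embeddings,
               key=lambda label: cosine(z, unseen_class_embeddings[label]))


# Toy usage: identity projection and two hypothetical unseen emotion labels.
W = [[1.0, 0.0], [0.0, 1.0]]
labels = {"joy": [1.0, 0.0], "sadness": [0.0, 1.0]}
print(zero_shot_predict(W, [0.9, 0.1], labels))  # joy
```

Because no training example of an unseen emotion is needed, only its embedding, new labels can be added at test time by extending the `labels` dictionary.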
The ability to recognize actions throughout a video is essential for surveillance, self-driving, and many other applications. Although researchers have investigated deep neural networks to get better results in action recognition, these networks usually require a large amount of well-labeled data to train. In this paper, we introduce a dense dilated network to collect information from the snippet level to the global level. The network is composed of blocks with densely connected convolutional layers. Our proposed framework is capable of fusing...
Animation has gained significant interest in the recent film and TV industry. Despite the success of advanced video generation models like Sora, Kling, and CogVideoX in generating natural videos, they lack the same effectiveness in handling animation videos. Evaluating animation video generation is also a great challenge due to its unique artist styles, violations of the laws of physics, and exaggerated motions. In this paper, we present a comprehensive system, AniSora, designed for animation video generation, which includes a data processing pipeline, a controllable...
Recently, video action recognition has been widely studied. Training deep neural networks requires a large amount of well-labeled videos. On the other hand, videos in the same class share high-level semantic similarity. In this paper, we introduce a novel neural network architecture to simultaneously capture local and long-term spatial-temporal information. The dilated dense network is proposed with blocks composed of densely connected convolutional layers. The proposed framework is capable of fusing each layer's outputs to learn...
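The dense dilated network in the two abstracts above combines dilated temporal convolutions with dense connectivity to aggregate information from the snippet level up to the global level. The sketch below shows the standard arithmetic behind that kind of design, assuming stride-1 one-dimensional convolutions over snippet features. Doubling the dilation per layer makes the temporal receptive field grow exponentially with depth, while concatenating every earlier output makes each layer's input width grow linearly. The kernel size, dilation schedule, and growth rate are hypothetical, not values taken from the papers.

```python
def receptive_field(kernel_size, dilations):
    """Temporal receptive field (in snippets) of stacked stride-1 dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d. In a densely
    connected block the longest path still passes through every layer, so the same
    formula gives the block's receptive field.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dense_input_channels(in_channels, growth_rate, num_layers):
    """Channels seen by each layer of a dense block: block input plus all earlier outputs."""
    return [in_channels + i * growth_rate for i in range(num_layers)]


def dense_output_channels(in_channels, growth_rate, num_layers):
    """Channels after concatenating the block input with every layer's output."""
    return in_channels + num_layers * growth_rate


# Hypothetical configuration: kernel size 3, dilation doubling at each layer.
print(receptive_field(3, [1, 2, 4, 8]))   # 31
print(dense_input_channels(64, 32, 4))    # [64, 96, 128, 160]
print(dense_output_channels(64, 32, 4))   # 192
```

With the same four layers and no dilation, the receptive field would be only `receptive_field(3, [1, 1, 1, 1]) == 9` snippets, which is why dilation is the cheaper route to long-term temporal context.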
This article introduces a novel approach for fast summarization of user-generated videos (UGVs). Unlike other types of videos, whose semantic content might vary greatly over time, most UGVs contain only a single shot with relatively consistent high-level semantics and emotional content. Therefore, a few representative segments, which can be selected based on segment-level semantic recognition results, are generally sufficient for a summary. In addition, due to the poor shooting quality of many UGVs, factors such as...
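The selection step described above, keeping a few representative segments chosen from segment-level recognition results, can be sketched as greedy selection of the highest-scoring segments under a duration budget. This is an illustrative assumption: the abstract does not say how segments are scored or how many are kept, and the shooting-quality factors it starts to mention are not modeled here. The `Segment` class, the scores, and the budget are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    score: float  # hypothetical segment-level recognition confidence


def summarize(segments, budget):
    """Greedily keep the highest-scoring segments that fit within `budget` seconds,
    then return them in temporal order so the summary plays chronologically."""
    chosen, used = [], 0.0
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        duration = seg.end - seg.start
        if used + duration <= budget:
            chosen.append(seg)
            used += duration
    return sorted(chosen, key=lambda s: s.start)


segments = [Segment(0, 2, 0.2), Segment(2, 4, 0.9), Segment(4, 6, 0.5), Segment(6, 8, 0.8)]
print([(s.start, s.end) for s in summarize(segments, budget=4)])  # [(2, 4), (6, 8)]
```

Greedy selection is fast, which matches the goal of fast summarization, but it is not guaranteed to maximize the total score under the budget; a knapsack-style dynamic program would be the exact alternative.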
Understanding video content is a challenging problem in many applications, especially for emotion analysis. Diverse and complicated contents are the major obstacles to understanding. In this paper, we propose a modality fusion framework to combine concept features from action, scene, and object models. We conduct concept selection to investigate the relations between concepts and high-level emotions. The discriminative concepts play important roles in emotion recognition. Fusing different modalities of concepts can further improve performance. Extensive...
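One simple way to combine concept features from action, scene, and object models is weighted late fusion of their per-emotion scores. The abstract does not state which fusion scheme the framework uses, so this is a generic sketch rather than the paper's method; the modality weights and score values are hypothetical.

```python
def late_fusion(scores_by_modality, weights):
    """Weighted average of per-emotion scores from several concept models.

    scores_by_modality: {modality: {emotion: score}}, all sharing the same emotion keys.
    weights: {modality: non-negative weight}; the weights must not sum to zero.
    """
    total = sum(weights[m] for m in scores_by_modality)
    emotions = next(iter(scores_by_modality.values())).keys()
    return {
        e: sum(weights[m] * s[e] for m, s in scores_by_modality.items()) / total
        for e in emotions
    }


scores = {
    "action": {"joy": 0.6, "sadness": 0.4},
    "scene": {"joy": 0.2, "sadness": 0.8},
    "object": {"joy": 0.7, "sadness": 0.3},
}
fused = late_fusion(scores, weights={"action": 2.0, "scene": 1.0, "object": 1.0})
print(max(fused, key=fused.get))  # joy
```

Here the scene model alone would predict "sadness", but the weighted combination favors "joy"; in practice the weights would be tuned on validation data.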
Rapid development of mobile devices has led to explosive growth of videos and online video platforms, which creates great demand for advertising in videos. Existing methods often randomly select a time point as the ad insertion position, which means that the video content is likely unrelated to the ad content, resulting in an unsatisfactory user experience. Previous works have neglected the rich semantics and multimodal information involved in advertising; in contrast, we present an innovative method for video-in-video...
As one of the most well-known artificial feature samplers, the sliding window is widely used in scenarios where spatial and temporal information exists, such as CV, NLP, data streams, and time series. Among these, time series are common in many scenarios, such as credit card payments, user behavior, and sensors. General feature selection for features extracted by sliding-window aggregate calls requires time-consuming iteration to generate the features, after which traditional methods are employed to rank them. The choice of the window period depends on domain knowledge and trivial...
As one of the most well-known artificial feature samplers, the sliding window is widely used in scenarios where spatial and temporal information exists, such as computer vision, natural language processing, data streams, and time series. Among these, time series are common in many scenarios, such as credit card payments, user behavior, and sensors. General feature selection for features extracted by sliding-window aggregate calls requires time-consuming iteration to generate the features, after which traditional methods are employed to rank them. The choice of the key parameter, i.e....
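To make the cost that these abstracts describe more concrete, the sketch below generates trailing sliding-window aggregate features for a univariate time series over several candidate periods. It is a naive illustration under assumptions, not the papers' proposed method. Every (period, aggregation) pair produces a new feature column that a downstream selector still has to rank, so searching over many periods by iteration quickly becomes expensive. The function name, periods, and aggregations are hypothetical.

```python
def sliding_window_features(series, periods, aggregations):
    """Compute trailing-window aggregates for every time step.

    series: list of numbers ordered by time.
    periods: window lengths to try (the parameter whose choice needs domain knowledge).
    aggregations: {name: function mapping a non-empty list to a number}.
    Returns {f"{name}_{period}": list of values, one per time step}; early steps
    use whatever history is available.
    """
    features = {}
    for period in periods:
        for name, agg in aggregations.items():
            features[f"{name}_{period}"] = [
                agg(series[max(0, t - period + 1): t + 1]) for t in range(len(series))
            ]
    return features


aggs = {"sum": sum, "max": max, "mean": lambda w: sum(w) / len(w)}
feats = sliding_window_features([1, 2, 3, 4], periods=[2, 3], aggregations=aggs)
print(feats["mean_2"])  # [1.0, 1.5, 2.5, 3.5]
print(feats["sum_3"])   # [1, 3, 6, 9]
```

This naive version recomputes each window from scratch, so each column costs roughly O(n × period) time. Running sums or deques would reduce that, but the number of candidate columns still grows with every period tried.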