- Multimodal Machine Learning Applications
- Video Analysis and Summarization
- Social Work Education and Practice
- Human Pose and Action Recognition
- Social Policy and Reform Studies
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Network Security and Intrusion Detection
- Homelessness and Social Issues
- Music and Audio Processing
- Topic Modeling
- Employment and Welfare Studies
- Work-Family Balance Challenges
- China's Socioeconomic Reforms and Governance
- Advanced Malware Detection Techniques
- Child Abuse and Trauma
- Banking stability, regulation, efficiency
- Software System Performance and Reliability
- Housing, Finance, and Neoliberalism
- Healthcare innovation and challenges
- Microfinance and Financial Inclusion
- Network Packet Processing and Optimization
- Advanced Research in Science and Engineering
- Job Satisfaction and Organizational Behavior
- Perfectionism, Procrastination, Anxiety Studies
Nanjing University
2025
China Agricultural University
2024
University of North Carolina at Chapel Hill
2020-2023
University of North Carolina Health Care
2020-2023
Zhongnan University of Economics and Law
2023
Sun Yat-sen University
2012-2022
Zhejiang University
2015-2018
Alibaba Group (China)
2018
Huazhong University of Science and Technology
2007-2016
Kunming University
2011-2012
Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph. Towards this goal, we propose a new approach called Memory-Augmented Recurrent Transformer (MART), which uses a memory module to augment the transformer architecture. The memory module generates a highly summarized memory state from the video segments and the sentence history so as to help better prediction of the next...
The last several years have witnessed remarkable progress in video-and-language (VidL) understanding. However, most modern VidL approaches use complex and specialized model architectures and sophisticated pretraining protocols, making the reproducibility, analysis, and comparisons of these frameworks difficult. Hence, instead of proposing yet another new VidL model, this paper conducts a thorough empirical study demystifying the most important factors in VidL model design. Among the factors that we investigate are (i) the spatiotemporal...
How to manage, store, and index large numbers of videos is an urgent problem to be solved. Although there are many video summarization models achieving good results, models based on low-level features cannot summarize important semantic information, and models based on semantic analysis need related text descriptions that do not exist for most videos. As a consequence, mining the semantic information contained in the video itself is a more feasible way. In this paper, we propose an action parsing-driven video summarization model based on reinforcement learning. The model is mainly divided into two parts,...
Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task. In reality, a truly useful VidL system is expected to be easily generalizable to diverse tasks, domains, and datasets. To facilitate the evaluation of such systems, we introduce the Video-And-Language Understanding Evaluation (VALUE) benchmark, an assemblage of 11 VidL datasets over 3 popular tasks: (i) text-to-video retrieval; (ii) video question answering; and (iii) video captioning. The VALUE benchmark aims to cover a broad range...
Video understanding relies on perceiving the global content and modeling its internal connections (e.g., causality, movement, and spatio-temporal correspondence). To learn these interactions, we apply a mask-then-predict pre-training task on discretized video tokens generated via VQ-VAE. Unlike language, where the text tokens are more independent, neighboring video tokens typically have strong correlations (e.g., consecutive video frames usually look very similar), and hence uniformly masking individual tokens will make the task too trivial to learn useful...
Given a video with aligned dialogue, people can often infer what is more likely to happen next. Making such predictions requires not only a deep understanding of the rich dynamics underlying the video and dialogue, but also a significant amount of commonsense knowledge. In this work, we explore whether AI models are able to learn to make such multimodal next-event predictions. To support research in this direction, we collect a new dataset, named Video-and-Language Event Prediction (VLEP), with 28,726 future event prediction examples (along...
This study aims to identify whether the professional training of social workers has an effect on the attitudinal antecedents of turnover intention. The study investigated 395 trained and 353 non-trained social workers from Integrated Family Service Centers in Guangzhou, China. It was found that professional education did not significantly alter turnover intention. In both groups, a higher feeling of burnout or a lower level of organizational commitment produced a higher intention of turnover. Furthermore, significant influences of job satisfaction with association environment...
From 1988 the Chinese Government pursued a policy of ‘small government, big society’. The policy was determined at the highest level and, after a pilot study in Hainan Province, was implemented vigorously in a series of political reforms. It was the chief political dimension of the economic restructuring which led from state ownership of enterprises to the so-called socialist market. Like its economic counterpart, it reflected China's adoption of neo-liberal ideology. The aims were to encourage both civil society and the private market to provide social welfare and thereby,...
Recent years have witnessed an increasing interest in image-based question-answering (QA) tasks. However, due to data limitations, there has been much less work on video-based QA. In this paper, we present TVQA, a large-scale video QA dataset based on 6 popular TV shows. TVQA consists of 152,545 QA pairs from 21,793 clips, spanning over 460 hours of video. Questions are designed to be compositional in nature, requiring systems to jointly localize relevant moments within a clip, comprehend subtitle-based...
The aim of this study was to derive a competency framework from the perspective of Chinese experts in social work, and to reveal the reasons for these particular choices. Working with reference to the professional competencies that have been identified by the USA, England, and Hong Kong, twenty academics, fourteen Directors/Deputy Directors, and fifteen senior practitioners individually chose those competencies they perceived as important, and a consensus was reached in accordance with the fuzzy Delphi method. It was discovered that the framework should be constituted of twenty-four...
A large number of videos are generated and uploaded to video websites (like Youku and YouTube) every day, and video websites play more and more important roles in human life. While bringing convenience, the big video data raise the difficulty of video summarization to allow users to browse a video easily. However, although there are many existing video summarization approaches, the key frames selected fail to integrate video contexts, and the qualities of the summarized results are difficult to evaluate because of the lack of ground-truth. Inspired by previous methods that extract key frames, we propose a deep recurrent neural...
Summary: This article reports the results of an exploratory comparative study that investigated errors made by social work practitioners. Two groups of social workers, one in Italy and one in Mainland China, answered questions about the causes and effects of mistakes, professional reactions to mistakes committed by their colleagues, and the influence of intuition on the decision-making process that generates mistakes of judgement. Findings: The most salient differences between the Italian and Chinese respondents were related to willingness to talk about mistakes and confidence in training...
This article puts the policy goal of the Urban Minimum Living Standard Guarantee (UMLSG) in China into question, and it aims to expose the assertion of ‘covering whoever is eligible’ as a myth. An exploratory study of the implementation of the UMLSG eligibility criteria was launched in Guangzhou. Interviews were conducted with twenty-five poor people and local bureaucrats. In addition to poor people’s meeting the income requirement, it was discovered that there were three unwritten principles flexibly used by local bureaucrats in eligibility assessment: ability to work,...
This study investigates the key factors influencing the turnover intentions of social workers, adopting a comparative approach within two patterns. Based on planned behaviors theory, the personal attitudes, subjective norms, and perceived competences of social workers were measured to predict organizational and occupational turnover intention, with the controlling variables being demographic factors, work-related factors, and professional perception. It was found that social workers from Guangzhou (as “autonomous-embedded” patterns) expressed stronger...
In this paper, a novel approach to assessing the threat of network intrusions is proposed. Unlike present approaches, which assess the threat of an attack either from a backward perspective (how probable it is that a security state can be reached) or from the perspective of the attacks themselves (how much damage an attack would cause to the network), this approach assesses the threat from a forwarding perspective (how probable it is that the attack is a precursor of future attacks). First, for every attack type and some attack scenarios, their probabilities of having following attacks (PFAs) are calculated by a data mining algorithm. Then the threats of real-time attacks are assessed by these...
This study used the contracting projects of a district branch of the Women's Federation in Guangzhou as case examples to demonstrate both the Chinese state's contractual controls over social work organisations (SWOs) and the pragmatic response strategies of SWOs and social work professionals. Semi-structured interviews were conducted with seventeen participants, including local officials and social workers from contracted SWOs. It was found that, with the ultimate goal of consolidating the legitimacy of the Communist Party of China, the Women's Federation's dual role...
Jie Lei (a), Wei Lu (b)*, Staffan Höjer (c), Agnieszka Repo (d), Zhenhao Su (a), Mengyu Ou (a), Heng Yang (e), Ling Yu (f) & Boya Feng (g). (a) School of Sociology and Anthropology, Sun Yat-sen University, Guangzhou, China; (b) Department of Social Work, Xiamen University, Xiamen, China; (c) University of Gothenburg, Sweden; (d) Science, Eastern Finland, Joensuu Kuopio, Finland; (e) Humanities, Zhuhai City Polytechnic, Zhuhai, China; (f) Shenzhen Work College, Shenzhen, China; (g) Guangdong Technology, China