Jie Lei

ORCID: 0000-0003-2292-2252
Research Areas
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Social Work Education and Practice
  • Human Pose and Action Recognition
  • Social Policy and Reform Studies
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Network Security and Intrusion Detection
  • Homelessness and Social Issues
  • Music and Audio Processing
  • Topic Modeling
  • Employment and Welfare Studies
  • Work-Family Balance Challenges
  • China's Socioeconomic Reforms and Governance
  • Advanced Malware Detection Techniques
  • Child Abuse and Trauma
  • Banking stability, regulation, efficiency
  • Software System Performance and Reliability
  • Housing, Finance, and Neoliberalism
  • Healthcare innovation and challenges
  • Microfinance and Financial Inclusion
  • Network Packet Processing and Optimization
  • Advanced Research in Science and Engineering
  • Job Satisfaction and Organizational Behavior
  • Perfectionism, Procrastination, Anxiety Studies

Nanjing University
2025

China Agricultural University
2024

University of North Carolina at Chapel Hill
2020-2023

University of North Carolina Health Care
2020-2023

Zhongnan University of Economics and Law
2023

Sun Yat-sen University
2012-2022

Zhejiang University
2015-2018

Alibaba Group (China)
2018

Huazhong University of Science and Technology
2007-2016

Kunming University
2011-2012

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph. Towards this goal, we propose a new approach called Memory-Augmented Recurrent Transformer (MART), which uses a memory module to augment the transformer architecture. The memory module generates a highly summarized memory state from the video segments and the sentence history so as to help better prediction of the next...

10.18653/v1/2020.acl-main.233 preprint EN cc-by 2020-01-01

The last several years have witnessed remarkable progress in video-and-language (VidL) understanding. However, most modern VidL approaches use complex and specialized model architectures and sophisticated pretraining protocols, making the reproducibility, analysis and comparisons of these frameworks difficult. Hence, instead of proposing yet another new VidL model, this paper conducts a thorough empirical study demystifying the most important factors in VidL model design. Among the factors that we investigate are (i) the spatiotemporal...

10.1109/cvpr52729.2023.01034 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

How to manage, store, and index large numbers of videos is an urgent problem to be solved. Although there are many video summarization models achieving good results, those based on low-level features cannot summarize important semantic information, and those based on semantic analysis need related text descriptions that do not exist for most videos. As a consequence, mining the semantic information contained in the video itself is a more feasible way. In this paper, we propose an action parsing-driven video summarization model based on reinforcement learning. The model is mainly divided into two parts,...

10.1109/tcsvt.2018.2860797 article EN IEEE Transactions on Circuits and Systems for Video Technology 2018-07-27

10.1016/j.irfa.2024.103259 article EN International Review of Financial Analysis 2024-03-28

Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task. In reality, a truly useful VidL system is expected to be easily generalizable to diverse tasks, domains, and datasets. To facilitate the evaluation of such systems, we introduce the Video-And-Language Understanding Evaluation (VALUE) benchmark, an assemblage of 11 VidL datasets over 3 popular tasks: (i) text-to-video retrieval; (ii) video question answering; and (iii) video captioning. The VALUE benchmark aims to cover a broad range...

10.48550/arxiv.2106.04632 preprint EN cc-by-nc-sa arXiv (Cornell University) 2021-01-01

Video understanding relies on perceiving the global content and modeling its internal connections (e.g., causality, movement, and spatio-temporal correspondence). To learn these interactions, we apply a mask-then-predict pre-training task on discretized video tokens generated via VQ-VAE. Unlike language, where the text tokens are more independent, neighboring video tokens typically have strong correlations (e.g., consecutive video frames usually look very similar), and hence uniformly masking individual tokens will make the task too trivial to learn useful...

10.48550/arxiv.2106.11250 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Given a video with aligned dialogue, people can often infer what is more likely to happen next. Making such predictions requires not only a deep understanding of the rich dynamics underlying the video and dialogue, but also a significant amount of commonsense knowledge. In this work, we explore whether AI models are able to learn to make such multimodal commonsense next-event predictions. To support research in this direction, we collect a new dataset, named Video-and-Language Event Prediction (VLEP), with 28,726 future event prediction examples (along...

10.18653/v1/2020.emnlp-main.706 article EN cc-by 2020-01-01

This study aims to identify whether the professional training of social workers has an effect on the attitudinal antecedents of turnover intention. The study investigated 395 trained and 353 non-trained social workers from Integrated Family Service Centers in Guangzhou, China. It was found that professional education did not significantly alter turnover intention. In both groups, a higher feeling of burnout or a lower level of organizational commitment produced a higher intention of turnover. Furthermore, significant influences of job satisfaction with association environment...

10.1080/01488376.2018.1480569 article EN Journal of Social Service Research 2018-10-09

From 1988 the Chinese Government pursued a policy of ‘small government, big society’. The policy was determined at the highest level and, after a pilot study in Hainan Province, was implemented vigorously in a series of political reforms. It was the chief political dimension of the economic restructuring which led from state ownership of enterprises to the so-called socialist market. Like its economic counterpart, it reflected China's adoption of neo-liberal ideology. The aims were to encourage both civil society and the private market to provide social welfare and thereby,...

10.1017/s147474641200036x article EN Social Policy and Society 2012-08-08

Recent years have witnessed an increasing interest in image-based question-answering (QA) tasks. However, due to data limitations, there has been much less work on video-based QA. In this paper, we present TVQA, a large-scale video QA dataset based on 6 popular TV shows. TVQA consists of 152,545 QA pairs from 21,793 clips, spanning over 460 hours of video. Questions are designed to be compositional in nature, requiring systems to jointly localize relevant moments within a clip, comprehend subtitle-based...

10.48550/arxiv.1809.01696 preprint EN other-oa arXiv (Cornell University) 2018-01-01

The aim of this study was to derive a competency framework from the perspective of Chinese experts in social work, and to reveal the reasons for these particular choices. Working with reference to the professional competencies that have been identified by the USA, England and Hong Kong, twenty academics, fourteen Directors/Deputy Directors and fifteen senior practitioners individually chose those competencies they perceived as important, and a consensus was reached in accordance with the fuzzy Delphi method. It was discovered that the framework should be constituted of twenty-four...

10.1093/bjsw/bcx035 article EN The British Journal of Social Work 2017-05-02

A large number of videos are generated and uploaded to video websites (like Youku and YouTube) every day, and video websites play more and more important roles in human life. While bringing convenience, the big video data raise the difficulty of video summarization, which allows users to browse a video easily. However, although there are many existing video summarization approaches, the key frames selected fail to integrate the contexts, and the qualities of the summarized results are difficult to evaluate because of the lack of ground-truth. Inspired by previous methods that extract key frames, we propose a deep recurrent neural...

10.1109/icmew.2016.7574720 article EN 2016-07-01

Summary: This article reports the results of an exploratory comparative study that investigated errors made by social work practitioners. Two groups of social workers, one in Italy and one in Mainland China, answered questions about the causes and effects of mistakes, professional reactions to mistakes committed by their colleagues, and the influence of intuition on the decision-making process that generates mistakes in judgement. Findings: The most salient differences between the Italian and Chinese respondents related to their willingness to talk and confidence in training...

10.1177/1468017320919879 article EN cc-by-nc Journal of Social Work 2020-04-19

This article puts the policy goal of the Urban Minimum Living Standard Guarantee (UMLSG) in China into question, and it aims to expose the assertion of ‘covering whoever is eligible’ as a myth. An exploratory study of the implementation of the UMLSG eligibility criteria was launched in Guangzhou. Interviews were conducted with twenty-five poor people and local bureaucrats. In addition to poor people’s meeting the income requirement, it was discovered that there were three unwritten principles flexibly used by local bureaucrats in the eligibility assessment: ability to work,...

10.1177/0261018313514803 article EN Critical Social Policy 2014-01-16

This study investigates the key factors influencing the turnover intentions of social workers, adopting a comparative approach within two patterns. Based on the theory of planned behavior, the personal attitudes, subjective norms, and perceived competences of social workers were measured to predict organizational and occupational turnover intention, with the controlling variables being demographic factors, work-related factors and professional perception. It was found that social workers from Guangzhou (as “autonomous-embedded” patterns) expressed stronger...

10.1080/00377317.2019.1562144 article EN Smith College Studies in Social Work 2018-10-02

In this paper, a novel approach to assessing the threat of network intrusions is proposed. Unlike present approaches, which assess the threat of an attack either from a backward perspective (how probable it is that a security state can be reached) or from the perspective of the attacks themselves (how much damage an attack would cause to the network), this approach assesses the threat from a forward perspective (how probable it is that an attack is a precursor of future attacks). First, for every attack type and some attack scenarios, their probabilities of having following attacks (PFAs) are calculated by a data mining algorithm. Then the threats of real-time attacks are assessed by these...

10.1109/nas.2007.15 article EN 2007-07-01

This study used the contracting projects of a district branch of the Women's Federation in Guangzhou as case examples to demonstrate both the Chinese state's contractual controls over social work organisations (SWOs) and the pragmatic response strategies of SWOs and professionals. Semi-structured interviews were conducted with seventeen participants, including local officials and social workers from contracted SWOs. It was found that, with the ultimate goal of consolidating the legitimacy of the Communist Party of China, the Women's Federation's dual role...

10.1177/02610183221089009 article EN cc-by Critical Social Policy 2022-04-25

10.48550/arxiv.2005.05402 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Jie Lei (a), Wei Lu (b*), Staffan Höjer (c), Agnieszka Repo (d), Zhenhao Su (a), Mengyu Ou (a), Heng Yang (e), Ling Yu (f) & Boya Feng (g)
(a) School of Sociology and Anthropology, Sun Yat-sen University, Guangzhou, China
(b) Department of Social Work, Xiamen University, Xiamen, China
(c) University of Gothenburg, Sweden
(d) Department of Social Science, University of Eastern Finland, Joensuu and Kuopio, Finland
(e) School of Humanities, Zhuhai City Polytechnic, Zhuhai, China
(f) Shenzhen Social Work College, Shenzhen, China
(g) Guangdong University of Technology, China

10.1080/17525098.2021.1898080 article EN China Journal of Social Work 2021-03-26