- Topic Modeling
- Natural Language Processing Techniques
- Human Pose and Action Recognition
- Hand Gesture Recognition Systems
- Multimodal Machine Learning Applications
- Particle physics theoretical and experimental studies
- Hearing Impairment and Communication
- Computational Drug Discovery Methods
- High-Energy Particle Collisions Research
- Advanced Vision and Imaging
- Text Readability and Simplification
- Computational Physics and Python Applications
- Advanced Image Processing Techniques
- Speech and Audio Processing
- Speech and dialogue systems
- Machine Learning in Materials Science
- Sustainable Building Design and Assessment
- Particle Detector Development and Performance
- Environmental Impact and Sustainability
- Gait Recognition and Analysis
- Generative Adversarial Networks and Image Synthesis
- Genomics and Phylogenetic Studies
- Full-Duplex Wireless Communications
- Software-Defined Networks and 5G
- Human Motion and Animation
Tsinghua University
2023-2024
Soochow University
2024
First Affiliated Hospital of Soochow University
2024
Jiangsu Normal University
2024
Nanjing University of Information Science and Technology
2024
University of Science and Technology of China
2015-2024
Shanghai Jiao Tong University
2022-2024
Huaqiao University
2024
Beijing Information Science & Technology University
2024
Tencent (China)
2023
Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn the implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression, and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve...
Despite existing pioneering works on sign language translation (SLT), there is a non-trivial obstacle, i.e., the limited quantity of parallel sign-text data. To tackle this parallel data bottleneck, we propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into SLT training. With a text-to-gloss model, we first back-translate the monolingual text to its gloss sequence. Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level....
The life-cycle assessment method, which originates from general products and services, has gradually come to be applied to investigations of the life-cycle carbon emissions (LCCE) of buildings. A literature review was conducted to clarify LCCE implications, calculations, and reductions in the context of buildings. A total of 826 global building carbon emission calculation cases were obtained from 161 studies based on the framework of stage division stipulated by ISO 21930 and the basic principles of the emission factor (EF) approach. The methods and results are discussed herein, modules...
Great progress has been made in face sketch synthesis in recent years. State-of-the-art methods commonly apply a Markov Random Fields (MRF) model to select local patches from a set of training data. Such methods, however, have two major drawbacks. Firstly, the MRF model used cannot synthesize new patches. Secondly, the optimization problem in solving the MRF is NP-hard. In this paper, we propose a novel Markov Weight Fields (MWF) model that is capable of synthesizing new patches. We formulate our model into a convex quadratic programming (QP) problem, for which the optimal solution...
Despite the recent success of deep learning in video-related tasks, deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn the implicit visual grammars in sign videos behind the collaboration of different visual cues (i.e., hand shape, facial expression, and body posture). To this end, we approach video-based sign language understanding with multi-cue learning and propose a spatial-temporal multi-cue (STMC) network...
Continuous sign language recognition is a weakly supervised problem that translates a video sequence into a gloss sequence, where the temporal boundary of each gloss is not annotated. The CNN-RNN-CTC framework shows effectiveness in this task by estimating a pseudo label for each clip and retraining the feature extractor alternately. The quality of the pseudo labels greatly impacts the final performance. In contrast to existing methods, which select the pseudo label with the maximum posterior probability, we propose a dynamic decoding method to find a reasonable alignment path via...
This paper does not aim at introducing a novel model for document-level neural machine translation. Instead, we head back to the original Transformer model and hope to answer the following question: Is the capacity of current models strong enough for document-level translation? Interestingly, we observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even with a length of 2000 words. We evaluate this model and several recent approaches on nine document-level datasets and two sentence-level datasets across six languages. Experiments show...
This paper presents WordRecorder, an efficient and accurate handwriting recognition system that identifies words using acoustic signals generated by pens and paper, thus enabling ubiquitous handwriting recognition. To achieve this, we carefully craft a new deep-learning based acoustic sensing framework with three major components, i.e., segmentation, classification, and word suggestion. First, we design a dual-window approach to segment the raw acoustic signal into a series of letters by exploiting subtle acoustic features of handwriting. Then...
Yu Bao, Hao Zhou, Shujian Huang, Dongqi Wang, Lihua Qian, Xinyu Dai, Jiajun Chen, Lei Li. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Currently, the route planning functions in 2D/3D campus navigation systems on the market are unable to process indoor and outdoor localization information simultaneously, and the UI experiences are not optimal because they are limited by the service platforms. An ARCore-based augmented reality campus navigation system is designed in this paper in order to solve the relevant problems. Firstly, the proposed system uses ARCore to enhance the presentation of 3D real scenes. Secondly, a visual inertial ranging algorithm is proposed for real-time locating and map generating on mobile devices....
In this paper, we introduce FROSTER, an effective framework for open-vocabulary action recognition. The CLIP model has achieved remarkable success in a range of image-based tasks, benefiting from its strong generalization capability stemming from pretraining on massive image-text pairs. However, applying CLIP directly to the open-vocabulary action recognition task is challenging due to the absence of temporal information in CLIP's pretraining. Further, fine-tuning CLIP on action recognition datasets may lead to overfitting and hinder its generalizability, resulting...
In the field of skeleton-based action recognition, current top-performing graph convolutional networks (GCNs) exploit intra-sequence context to construct adaptive graphs for feature aggregation. However, we argue that such context is still local, since the rich cross-sequence relations have not been explicitly investigated. In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (SkeletonGCL) to explore the global context across all sequences. Specifically, SkeletonGCL associates...
Smart wearable devices are becoming smaller, cheaper, and more popular. The smartwatch is one of the most popular wearable devices. The smartwatch has rich applications such as messages, email, and voice by connecting to a smartphone via Bluetooth. It is hard to interact with the smartwatch due to its small screen and the way it is worn. Since the smartwatch is usually equipped with sensors like an accelerometer and a gyroscope and is worn on the wrist, it is possible to identify the user's gestures by tracking the movement of the finger, hand, and arm. Furthermore, the user can control other nearby smart devices if they can be...
To further improve the convenience and effectiveness of human-computer interaction (HCI) with smart devices, human activity recognition (HAR) has been widely studied from various aspects. Unfortunately, deep learning based methods often suffer from either expensive labeling efforts or weak generalization ability. Inspired by recently developed domain adaptation strategies, we propose XHAR, a novel adversarial domain adaptation framework for HAR using smart devices, providing better device and user adaptation. XHAR first selects the most...
An efficient resource management scheme is critical to enable network slicing in 5G networks and in envisioned 6G networks, and artificial intelligence (AI) techniques offer promising solutions. Considering the rapidly emerging new machine learning techniques, such as graph learning, federated learning, and transfer learning, a timely survey is needed to provide an overview of resource management in AI-enabled wireless networks. This article provides such a survey, along with an application of knowledge transfer in radio access network (RAN) slicing. In particular, we first provide some background...
This paper presents IP-SLT, a simple yet effective framework for sign language translation (SLT). Our IP-SLT adopts a recurrent structure and enhances the semantic representation (prototype) of the input sign language video via iterative refinement. Our idea mimics the behavior of human reading, where a sentence can be digested repeatedly until reaching an accurate understanding. Technically, IP-SLT consists of feature extraction, prototype initialization, and iterative prototype refinement. The initialization module generates the initial prototype based on the visual...
Our study presents the assembly of a high-quality Taihu goose genome at the Telomere-to-Telomere (T2T) level. By employing advanced sequencing technologies, including Pacific Biosciences HiFi reads, Oxford Nanopore long reads, Illumina short reads, and chromatin conformation capture (Hi-C), we achieved an exceptional assembly. The T2T assembly encompasses a total length of 1,197,991,206 bp, with the contig N50 reaching 33,928,929 bp and the scaffold N50 attaining 81,007,908 bp. It consists of 73 scaffolds, including 38 autosomes and one pair of Z/W...
Currently, masked language modeling (e.g., BERT) is the prime choice to learn contextualized representations. Due to its pervasiveness, it naturally raises an interesting question: how do masked language models (MLMs) learn contextual representations? In this work, we analyze the learning dynamics of MLMs and find that they adopt sampled embeddings as anchors to estimate and inject contextual semantics into representations, which limits the efficiency and effectiveness of MLMs. To address these problems, we propose TACO, a simple yet effective representation...
Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks. The prevailing MLLM paradigm, e.g., LLaVA, transforms visual features into text-like tokens using a static vision-language mapper, thereby enabling static LLMs to develop the capability to comprehend visual information through visual instruction tuning. Although promising, the static tuning strategy (static tuning refers to the trained model with static parameters)...
Drug design is a crucial step in the drug discovery cycle. Recently, various deep learning-based methods design drugs by generating novel molecules from scratch, avoiding traversing large-scale drug libraries. However, they depend on scarce experimental data or time-consuming docking simulation, leading to overfitting issues with limited training data and slow generation speed. In this study, we propose the zero-shot drug design method DESERT (Drug dEsign by SkEtching and geneRaTing). Specifically, DESERT splits the design process into two stages:...
The Alectoris chukar (chukar) is the most geographically widespread partridge species in the world, demonstrating exceptional adaptability to diverse ecological environments. However, the scarcity of genetic resources for the chukar has hindered research into its adaptive evolution and molecular breeding. In this study, we have sequenced and assembled a high-quality, phased chukar genome that consists of 31 pairs of relatively complete diploid chromosomes. Our BUSCO analysis reported a high completeness score...
Dynamic spectrum reallocation, under which the spectrum owners temporarily share underutilized spectrum with secondary users for economic profit, is an important approach to improve the spectrum utilization ratio. Auction is believed to be a natural marketing tool to incentivize spectrum owners, and thus redistribute idle spectrum efficiently. Extensive research has been done on the problem of truthful spectrum auctions, in which bidders bid based on their true valuations of the spectrum. The valuation of an individual bidder, however, is private information that should be protected against...