- Image and Video Quality Assessment
- Video Coding and Compression Technologies
- Advanced Computational Techniques and Applications
- Wireless Networks and Protocols
- Multimedia Communication and Technology
- Sentiment Analysis and Opinion Mining
- Speech and Audio Processing
- Smart Grid and Power Systems
- Mobile Ad Hoc Networks
- Speech Recognition and Synthesis
- Topic Modeling
- Natural Language Processing Techniques
- Power Systems and Technologies
- Bluetooth and Wireless Communication Technologies
- Network Traffic and Congestion Control
- Evacuation and Crowd Dynamics
- Image Retrieval and Classification Techniques
- Technology and Security Systems
- Advanced Measurement and Detection Methods
- Industrial Technology and Control Systems
- Video Analysis and Summarization
- Complex Network Analysis Techniques
- Advanced Image Processing Techniques
- Caching and Content Delivery
- Language, Metaphor, and Cognition
Nationwide Children's Hospital
2024
Communication University of China
2005-2024
Yunnan University
2024
Shanghai Maritime University
2021
Samsung (China)
2020
IBM Research (China)
2020
Google (United States)
2018-2019
Capital Normal University
2013-2019
Shandong Normal University
2012-2017
Beijing University of Chemical Technology
2017
For a typical video distribution system, the contents are first compressed and then stored in local storage or transmitted to end users through networks. When videos are transmitted over error-prone networks, error robustness becomes an important issue. In past years, a number of rate-distortion (R-D) optimized coding mode selection schemes have been proposed for error-resilient coding, including the recursive optimal per-pixel estimate (ROPE) method. However, ROPE-related approaches assume integer-pixel...
Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural codecs employ either acoustic features or learned blind features with a convolutional neural network for encoding, by which there are still temporal redundancies within the encoded features. This article introduces latent-domain predictive coding into the VQ-VAE framework to fully remove such redundancies and proposes the TF-Codec for low-latency neural speech coding in an end-to-end manner...
In the realm of 360° video streaming, how to deliver an optimal viewing experience to users with minimal bandwidth cost has become an emerging challenge. Our research is driven by a comprehensive analysis of real-world users' head movement datasets for 360° video, revealing significant viewport shifts even during the playback of individual chunks. However, prevailing streaming algorithms fail to account for such variations, thereby resulting in substantial degradation of user experience. To this end, we propose OMMS,...
In the latest social networks, more and more people prefer to express their emotions in videos through text, speech, and rich facial expressions. Multimodal video emotion analysis techniques can help understand users' inner world automatically based on human expressions and gestures in images, tones in voices, and recognized natural language. However, in existing research, the acoustic modality has long been in a marginal position as compared with the visual and textual modalities. That is, it tends to be difficult to improve the contribution of...
Web real-time communication (WebRTC) employs congestion control to ensure the quality of experience (QoE). Different from congestion control schemes for TCP, WebRTC keeps a low-level playback buffer that considers excessively delayed packets as losses, which makes congestion control more challenging. Existing heuristic schemes estimate network conditions based on hand-crafted rules, which may be suboptimal, leading to under-utilization or over-utilization of link capacity in many cases. On the other hand, existing learning-based schemes train model acts large...
Recently, generative Text-based visual question answering (TextVQA) methods, which are often based on language models, have exhibited impressive results and drawn increasing attention. However, due to the inconsistencies in both input forms and optimization objectives, the power of pretrained language models is not fully explored, resulting in the need for large amounts of training data. In this work, we rethink the characteristics of the TextVQA task and find that scene text is indeed a special kind of language embedded in images. To this end, we propose...
Being an important application of spectrum sharing in cellular networks, mobile traffic offloading, which advocates third-party owners of network resources on unlicensed/licensed spectrum to share their resources and provide data offloading services, is considered a promising solution to the severe shortage faced by service providers. In this paper, we consider a general mobile traffic offloading system that adopts the widely used Gale-Shapley algorithm to optimize its mobile phone users (MUs) and base stations allocation plan. We notice that without careful protection,...
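The abstract above names the Gale-Shapley algorithm as the matching step between MUs and base stations. As a rough illustration only, the following is a minimal sketch of textbook one-to-one Gale-Shapley with MUs proposing. It is not the system described in the paper: the function name `gale_shapley`, the identifiers `mu1`, `bsA`, and so on, the one-to-one capacity, and the assumption that every base station ranks every MU are all hypothetical choices made for this example.

```python
from collections import deque


def gale_shapley(mu_prefs, bs_prefs):
    """Return a stable one-to-one matching {mu: bs}, with MUs proposing.

    mu_prefs: dict mapping each MU to its ordered list of base stations.
    bs_prefs: dict mapping each base station to its ordered list of MUs.
    Assumes (hypothetically) that every base station ranks every MU.
    """
    # rank[bs][mu] is the position of mu in bs's list (lower is better).
    rank = {
        bs: {mu: i for i, mu in enumerate(prefs)}
        for bs, prefs in bs_prefs.items()
    }
    next_choice = {mu: 0 for mu in mu_prefs}  # index of next proposal
    matched_to = {}                           # bs -> currently held mu
    free = deque(mu_prefs)                    # MUs still looking for a match

    while free:
        mu = free.popleft()
        if next_choice[mu] >= len(mu_prefs[mu]):
            continue  # list exhausted: this MU stays unmatched
        bs = mu_prefs[mu][next_choice[mu]]
        next_choice[mu] += 1
        current = matched_to.get(bs)
        if current is None:
            matched_to[bs] = mu               # bs was free, accept
        elif rank[bs][mu] < rank[bs][current]:
            matched_to[bs] = mu               # bs prefers the new proposer
            free.append(current)
        else:
            free.append(mu)                   # rejected, try next choice later

    return {mu: bs for bs, mu in matched_to.items()}


# Toy preference lists, invented for illustration.
mu_prefs = {"mu1": ["bsA", "bsB"], "mu2": ["bsA", "bsB"]}
bs_prefs = {"bsA": ["mu2", "mu1"], "bsB": ["mu1", "mu2"]}
matching = gale_shapley(mu_prefs, bs_prefs)
```

In this toy instance both MUs prefer `bsA`, which prefers `mu2`, so `mu1` is displaced and ends up with `bsB`. A real offloading system would likely need per-station capacities (a many-to-one variant), which this sketch does not model.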
Jason Baldridge, Tania Bedrax-Weiss, Daphne Luong, Srini Narayanan, Bo Pang, Fernando Pereira, Radu Soricut, Michael Tseng, Yuan Zhang. Proceedings of the First International Workshop on Spatial Language Understanding. 2018.
Gateways are a crucial part of wireless networks. In multi-domain networks, the existing solution to the problem of optimal gateway selection is based on distributed learning. While such a solution is interesting and useful, it has a fundamental limitation: the learning algorithm may stay for a long time in a Nash Equilibrium that does not correspond to the optimal selection. This time can be so long that people can hardly wait for convergence. In this paper, we present a systematic study of this problem. We distinguish three cases and present an alternative for each of these cases: public...
The O(³P)+ reaction has been investigated by employing the time‐dependent quantum wave packet method with the split operator method on the potential energy surface of the doublet ground‐state H₂O⁺ (1 A″). The reaction probabilities and integral cross sections are calculated using the centrifugal sudden approximation, which basically agree with the quasi‐classical results of Paniagua et al. [Phys. Chem. Chem. Phys. 2014, 16, 23594]. Moreover, the effect of vibrational and rotational excitation of the reactant is investigated. The results show that effects section not...
Deep reinforcement learning (DRL) has demonstrated remarkable potential within the domain of video adaptive bitrate (ABR) optimization. However, training a well-performing DRL agent in a two-tier 360° streaming system is non-trivial. The conventional approach fails to enable the model to start from simpler environments and then...
The use of social media runs through our lives, and users' emotions are also affected by it. Previous studies have reported organizations and psychologists using social media to find depressed patients. However, due to the variety of content published by users, it isn't effortless for a system to consider the text, the image, and even the hidden information behind the image. To address this problem, we proposed a new system for screening depressed patients named BlueMemo. We collected real-time posts from Twitter. Based on the posts, we learned text features, image...
In this paper, a practical engineering method for risk probability is studied, based on faults of main equipment, through setting factor and outage factor. The hazard assessment for network planning ACT No. 599 social influence loss load result can receive assessment. Classifying the warning level for all voltage levels of the power grid provides a decision-making basis for the operation and construction of the grid. The methods are applied in the Shenzhen assessment and are proved reasonable.
The rise of 4K and 8K techniques has led to the growth of video data streaming. Consequently, the greater challenges of encryption efficiency and information leakage facing selective encryption (SE) make it necessary to reduce the encryption ratio as much as possible. In this paper, we design an SE scheme for H.264/AVC which achieves a trade-off between low encryption ratio and high safety, from both the cryptographic attack and the sketch attack point of view. As a starting point, we propose a novel calculation model, called the block weight model, that can take advantage of the reference structure...
This paper examines the commonalities and variations between and within groups of English and Chinese (Mandarin) speakers in using spatial terms to refer to the topological spatial concepts of containment (expressed by in and related terms in English) and support (expressed by on in English). In addition to crosslinguistic similarities, systematic differences in the use of linguistic expressions in English and Mandarin for these relationships were found, as well as individual differences within each language group. Together, the findings point to potential differences underlying how speakers conceptualize the two categories.
Time-variant factors, including dynamic delay and varying echo path, often occur in real-world acoustic echo cancellation (AEC) applications. Current end-to-end deep neural network (DNN) based methods usually model the time-variant components implicitly and can hardly handle the unpredictable time-variance in real-time AEC. To explicitly capture the time-variant components, we propose a dynamic kernel generation (DKG) module that can be introduced as a learnable plug-in to the DNN-based end-to-end pipeline. Specifically, DKG generates convolutional...