- Speech and Audio Processing
- Advanced Vision and Imaging
- Natural Language Processing Techniques
- Speech Recognition and Synthesis
- Video Coding and Compression Technologies
- Advanced Data Compression Techniques
- Advanced Image Processing Techniques
Shanghai Jiao Tong University
2025
Google (United States)
2020-2021
University of California, Santa Barbara
2020
The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than a 30% reduction in bit rate compared to its predecessor VP9 for the same decoded quality. This article provides a technical overview of the codec design that enables these performance gains, with considerations for hardware feasibility.
This paper proposes a novel bi-directional motion compensation framework that extracts existing motion information associated with the reference frames and interpolates an additional reference frame candidate that is co-located with the current frame. The approach generates a dense motion field by performing optical flow estimation, so as to capture complex motion between the reference frames without recourse to additional side information. The estimated optical flow is then complemented by transmission of offset vectors to correct for possible deviation from the linearity assumption in the interpolation....
The auto-regressive architecture, like GPTs, is widely used in modern Text-to-Speech (TTS) systems. However, it incurs substantial inference time, particularly due to the challenges of next-token prediction posed by lengthy sequences of speech tokens. In this work, we introduce VADUSA, one of the first approaches to accelerate auto-regressive TTS through speculative decoding. Our results show that VADUSA not only significantly improves inference speed but also enhances performance by incorporating draft heads to predict future content...
The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than a 30% reduction in bit rate compared to its predecessor VP9 for the same decoded quality. This paper provides a technical overview of the codec design that enables these performance gains, with considerations for hardware feasibility.