NFDI4DS | UHH-SEMS - Publication Details

Bohan Li

ORCID: 0000-0003-2285-9572

About

Contact & Profiles

A5027650734

Research Areas

Speech and Audio Processing
Advanced Vision and Imaging
Natural Language Processing Techniques
Speech Recognition and Synthesis
Video Coding and Compression Technologies
Advanced Data Compression Techniques
Advanced Image Processing Techniques

Shanghai Jiao Tong University
2025

Google (United States)
2020-2021

University of California, Santa Barbara
2020

A Technical Overview of AV1

OPENALEX - Publications

Jingning Han, Bohan Li, Debargha Mukherjee, Ching-Han Chiang, Adrian Grange, and 10 more

The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than a 30% reduction in bit rate compared to its predecessor VP9 for the same decoded quality. This article provides a technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility.

10.1109/jproc.2021.3058584 article EN cc-by Proceedings of the IEEE 2021-02-26

Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding

OPENALEX - Publications

Bohan Li, Hankun Wang, Situo Zhang, Yiwei Guo, Kaiping Yu

10.1109/icassp49660.2025.10888194 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Optical Flow Based Co-Located Reference Frame for Video Compression

OPENALEX - Publications

Bohan Li, Jingning Han, Yaowu Xu, Kenneth Rose

This paper proposes a novel bi-directional motion compensation framework that extracts existing motion information associated with the reference frames and interpolates an additional reference frame candidate that is co-located with the current frame. The approach generates a dense motion field by performing optical flow estimation, so as to capture complex motion between the reference frames without recourse to additional side information. The estimated optical flow is then complemented by transmission of offset motion vectors to correct for possible deviation from the linearity assumption in the interpolation....

10.1109/tip.2020.3014723 article EN IEEE Transactions on Image Processing 2020-01-01

Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding

OPENALEX - Publications

Bohan Li, Hankun Wang, Situo Zhang, Yiwei Guo, Kaiping Yu

The auto-regressive architecture, like GPTs, is widely used in modern Text-to-Speech (TTS) systems. However, it incurs substantial inference time, particularly due to the challenges in next-token prediction posed by lengthy sequences of speech tokens. In this work, we introduce VADUSA, one of the first approaches to accelerate auto-regressive TTS through speculative decoding. Our results show that VADUSA not only significantly improves inference speed but also enhances performance by incorporating draft heads to predict future speech content...

10.48550/arxiv.2410.21951 preprint EN arXiv (Cornell University) 2024-10-29

A Technical Overview of AV1

OPENALEX - Publications

Jingning Han, Bohan Li, Debargha Mukherjee, Ching-Han Chiang, Adrian Grange, and 10 more

The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than a 30% reduction in bit rate compared to its predecessor VP9 for the same decoded quality. This paper provides a technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility.

10.48550/arxiv.2008.06091 preprint EN other-oa arXiv (Cornell University) 2020-01-01
