Multimodal Active Speaker Detection and Virtual Cinematography for Video Conferencing
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
FOS: Computer and information sciences; Electrical engineering, electronic engineering, information engineering
DOI:
10.48550/arXiv.2002.03977
Publication Date:
2020-05-01
AUTHORS (7)
ABSTRACT
Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting, and zooming a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video. We describe a new automated ASD and VC system that performs within 0.3 MOS of an expert cinematographer, based on subjective ratings on a 1-5 scale. The system uses a 4K wide-FOV camera, a depth camera, and a microphone array; it extracts features from each modality and trains an ASD using AdaBoost, which is highly efficient and runs in real time. A VC is similarly trained using machine learning to optimize the subjective quality of the overall experience. To avoid distracting the room participants and to reduce switching latency, the system has no moving parts; the VC works by cropping and zooming the 4K wide-FOV video stream. The system was tuned and evaluated using extensive crowdsourcing techniques on a dataset of N=100 meetings, each 2-5 minutes in length.
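Below is a minimal sketch, in Python with scikit-learn, of the two components the abstract describes: an AdaBoost active speaker detector over fused multimodal features, and a no-moving-parts virtual cinematographer that digitally pans, tilts, and zooms by cropping the 4K wide-FOV frame. All function names, feature stubs, and hyperparameters are illustrative assumptions; the paper's actual features and training setup are not given in the abstract.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def extract_features(video_frame, depth_frame, mic_frame):
    # Hypothetical per-modality features: the paper uses a 4K wide-FOV
    # camera, a depth camera, and a microphone array, but the exact
    # features are not specified, so these stubs only illustrate fusion.
    visual = np.asarray(video_frame, dtype=float).ravel()  # e.g., face/motion cues
    depth = np.asarray(depth_frame, dtype=float).ravel()   # e.g., participant distance
    audio = np.asarray(mic_frame, dtype=float).ravel()     # e.g., sound-source angle
    return np.concatenate([visual, depth, audio])

# AdaBoost over decision stumps is cheap at inference time, which matches
# the real-time requirement; n_estimators=100 is an arbitrary choice.
asd = AdaBoostClassifier(n_estimators=100)

def train_asd(feature_rows, is_speaking_labels):
    # Fit the detector on labeled (features, is-speaking) examples.
    asd.fit(np.vstack(feature_rows), is_speaking_labels)

def vc_crop(frame_4k, cx, cy, zoom):
    # Digital pan/tilt/zoom: re-center a crop of the 4K frame on the
    # active speaker at pixel (cx, cy); no moving parts, as in the paper.
    h, w = frame_4k.shape[:2]
    ch, cw = int(h / zoom), int(w / zoom)
    x0 = int(np.clip(cx - cw // 2, 0, w - cw))
    y0 = int(np.clip(cy - ch // 2, 0, h - ch))
    return frame_4k[y0:y0 + ch, x0:x0 + cw]

In the actual system the crop center and zoom would be chosen by the second learned model (the VC), tuned against crowdsourced subjective ratings; here they are simply passed in to show the cropping mechanics.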