- Music and Audio Processing
- Video Analysis and Summarization
- Speech and Audio Processing
- Advanced Image and Video Retrieval Techniques
- Music Technology and Sound Studies
- Speech Recognition and Synthesis
- Advanced Vision and Imaging
- Image Retrieval and Classification Techniques
- Multimedia Communication and Technology
- Architecture and Art History Studies
- Augmented Reality Applications
- Speech and dialogue systems
- Neuroscience and Music Perception
- Renaissance and Early Modern Studies
- Interactive and Immersive Displays
- Advanced Text Analysis Techniques
- Architecture and Computational Design
- Computer Graphics and Visualization Techniques
- Architecture, Modernity, and Design
- Video Coding and Compression Technologies
- Advanced Data Compression Techniques
- 3D Surveying and Cultural Heritage
- Algorithms and Data Compression
- Recommender Systems and Techniques
- Natural Language Processing Techniques
Pomona College
2024
Aarhus School of Architecture
2016-2023
Virginia Tech
2012
FX Palo Alto Laboratory
1998-2010
Fuji Xerox (Japan)
2005
Xerox (France)
1999-2002
University of Cambridge
1995-2002
Xerox (United States)
2002
National University of Singapore
1997-1999
Brown University
1991-1994
Though many systems exist for content-based retrieval of images, little work has been done on the audio portion of the multimedia stream. This paper presents a system to retrieve audio documents by acoustic similarity. The similarity measure is based on statistics derived from a supervised vector quantizer, rather than matching simple pitch or spectral characteristics. The system is thus able to learn distinguishing audio features while ignoring unimportant variation. Both theoretical and experimental results are presented,...
The paper describes methods for automatically locating points of significant change in music or audio, by analyzing local self-similarity. This method can find individual note boundaries or even natural segment boundaries such as verse/chorus or speech/music transitions, even in the absence of cues such as silence. This approach uses the signal to model itself, and thus does not rely on particular acoustic cues nor require training. We present a wide variety of applications, including indexing, segmenting, and beat tracking of music and audio. The method works well...
A significant new speech corpus of British English has been recorded at Cambridge University. Derived from the Wall Street Journal text corpus, WSJCAM0 constitutes one of the largest corpora of spoken British English currently in existence. It has been specifically designed for the construction and evaluation of speaker-independent speech recognition systems. The database consists of 140 speakers each speaking about 110 utterances. This paper describes the motivation for the corpus, the processes undertaken in its construction, and the utilities needed as support tools. All utterance...
This paper presents a novel approach to visualizing the time structure of music and audio. The acoustic similarity between any two instants of an audio recording is displayed in a 2D representation, allowing identification of structural and rhythmic characteristics. Examples are presented for classical and popular music. Applications include content-based analysis and segmentation, as well as tempo extraction.
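The 2D representation described in this abstract can be sketched as a matrix of pairwise frame similarities. The following is a minimal illustration rather than the paper's implementation: the cosine measure, the function names, and the toy two-dimensional feature vectors are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def self_similarity_matrix(frames):
    """Return S where S[i][j] is the similarity between frame i and frame j."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical feature vectors: two frames of one "sound", then two of another.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
S = self_similarity_matrix(frames)
```

Rendered as a grayscale image, `S` for this toy input would show two bright squares along the main diagonal, one per acoustically homogeneous region.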
This paper presents methods for automatically creating pictorial video summaries that resemble comic books. The relative importance of video segments is computed from their length and novelty. Image and audio analysis is used to automatically detect and emphasize meaningful events. Based on this importance measure, we choose relevant keyframes. Selected keyframes are sized by importance, then efficiently packed into a pictorial summary. We present a quantitative measure of how well a summary captures the salient events in a video, and show how it can be...
Organizing digital photograph collections according to events such as holiday gatherings or vacations is a common practice among photographers. To support photographers in this task, we present similarity-based methods to cluster digital photos by time and image content. The approach is general and unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present several variants of an automatic unsupervised algorithm to partition a collection of digital photographs based either on temporal...
We introduce the beat spectrum, a new method of automatically characterizing the rhythm and tempo of music and audio. The beat spectrum is a measure of acoustic self-similarity as a function of time lag. Highly structured or repetitive music will have strong beat spectrum peaks at the repetition times. This reveals both tempo and the relative strength of particular beats, and therefore can distinguish between different kinds of rhythms at the same tempo. We also introduce the beat spectrogram, which graphically illustrates rhythm variation over time. Unlike previous approaches to tempo analysis, the beat spectrum does not...
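One simple way to read "self-similarity as a function of time lag" is to average a similarity matrix along its diagonals. The sketch below assumes that formulation; the function name `beat_spectrum`, the averaging step, and the toy 0/1 similarity matrix are illustrative assumptions, not details taken from the abstract.

```python
def beat_spectrum(S, max_lag):
    """Average similarity S[i][i + lag] along each diagonal, for lag = 0..max_lag."""
    n = len(S)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(S) - 1]")
    spectrum = []
    for lag in range(max_lag + 1):
        diagonal = [S[i][i + lag] for i in range(n - lag)]
        spectrum.append(sum(diagonal) / len(diagonal))
    return spectrum


# Hypothetical data: a pattern of 4 distinct frames repeated 4 times.
# Frames with the same label are treated as identical (similarity 1.0).
labels = [0, 1, 2, 3] * 4
S = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
spectrum = beat_spectrum(S, max_lag=8)
```

For this periodic toy input the spectrum peaks at lags 0, 4, and 8, which are the multiples of the repetition period, and is zero elsewhere.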
A semi-automatic approach to home video editing. Authors: Andreas Girgensohn (FX Palo Alto Laboratory, 3400 Hillview Avenue, Palo Alto, CA), John Boreczky, Patrick Chiu, John Doherty, Jonathan Foote, Gene Golovchinsky, Shingo Uchihashi, Lynn Wilcox. UIST '00: Proceedings of the 13th annual ACM symposium on User interface software and technology, November 2000, Pages 81–89. https://doi.org/10.1145/354401.354415. Published: 01 November 2000...
We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digital music files. In the first step, spectral data is used to construct a similarity matrix calculated from inter-frame spectral similarity. The digital audio can be robustly segmented by correlating a kernel along the diagonal of the similarity matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the self-similarity of their statistics. This reveals the structure of the digital music in...
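The segmentation step, correlating a kernel along the diagonal of the similarity matrix, can be sketched as follows. The abstract does not specify the kernel, so the checkerboard-shaped kernel here is an assumption, as are the function names and the toy 0/1 similarity matrix.

```python
def checkerboard_kernel(half_width):
    """Square kernel of side 2 * half_width: +1 on the two diagonal quadrants,
    -1 on the two off-diagonal quadrants."""
    size = 2 * half_width
    return [[1.0 if (r < half_width) == (c < half_width) else -1.0
             for c in range(size)]
            for r in range(size)]


def novelty_curve(S, half_width):
    """Correlate the kernel along the main diagonal of S.

    scores[t] is high when frames just before t resemble each other, frames
    from t onward resemble each other, and the two groups differ. Positions
    too close to either end to fit the kernel are left at 0.0.
    """
    n = len(S)
    kernel = checkerboard_kernel(half_width)
    size = 2 * half_width
    scores = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        top = t - half_width  # first frame index covered by the kernel
        scores[t] = sum(kernel[r][c] * S[top + r][top + c]
                        for r in range(size) for c in range(size))
    return scores


# Hypothetical data: 4 frames of one segment followed by 4 frames of another.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
S = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
scores = novelty_curve(S, half_width=2)
boundary = scores.index(max(scores))
```

In this toy case the curve peaks at frame index 4, the first frame of the second segment. On real audio one would pick local maxima above a threshold instead of a single global maximum.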
Retrieving spoken documents by combining multiple index sources. Authors: G. J. F. Jones (Computer Laboratory, University of Cambridge, New Museums Site, Pembroke Street, Cambridge CB2 3QG, England), J. T. Foote, K. Spärck Jones, S. J. Young (Engineering Department, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, England). SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, August 1996, Pages...
We present a framework for summarizing digital media based on structural analysis. Though these methods are applicable to general media, we concentrate here on characterizing the repetitive structure in popular music. In the first step, a similarity matrix is calculated from interframe spectral similarity. Segment boundaries, such as verse-chorus transitions, are found by correlating a kernel along the diagonal of the matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered, based on the pairwise...
We present a novel approach to automatically extracting summary excerpts from audio and video. Our approach is to maximize the average similarity between the excerpt and the source. We first calculate a similarity matrix by comparing each pair of time samples using a quantitative similarity measure. To determine the segment with the highest average similarity, we maximize the summation of the self-similarity matrix over the support of the segment. To select multiple excerpts while avoiding redundancy, we compute the non-negative matrix factorization (NMF) of the similarity matrix into its essential structural components. We then build...
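The single-excerpt step, choosing the segment whose frames are on average most similar to the whole source, can be sketched as a sliding-window search over row sums of the similarity matrix. This is an illustrative reading of the abstract only: the function name `best_excerpt`, the fixed excerpt length, and the toy data are assumptions, and the NMF step for multiple excerpts is not covered.

```python
def best_excerpt(S, length):
    """Return (start, score) of the excerpt of `length` frames that maximizes
    the average similarity between excerpt frames and all frames of the source."""
    n = len(S)
    if not 1 <= length <= n:
        raise ValueError("length must be in [1, len(S)]")
    # row_sums[i] is the total similarity of frame i to every frame in the source.
    row_sums = [sum(row) for row in S]
    best_start, best_score = 0, float("-inf")
    for start in range(n - length + 1):
        score = sum(row_sums[start:start + length]) / (n * length)
        if score > best_score:  # strict comparison keeps the earliest maximum
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical data: the middle material (label 1) dominates the source.
labels = [0, 0, 1, 1, 1, 1, 1, 2]
S = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
start, score = best_excerpt(S, length=3)
```

Here the chosen excerpt starts at frame 2, inside the dominant region, because those frames resemble the largest share of the source.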
We describe computationally and materially inexpensive methods for panoramic video imaging. Digitally combining images from an array of inexpensive video cameras results in a wide-field panoramic camera, built from inexpensive off-the-shelf hardware. We present methods that both correct lens distortion and seamlessly merge images into a panoramic video image. Electronically selecting a region of this image results in a rapidly steerable "virtual camera". Because the camera is fixed with respect to the background, simple motion analysis can be used to track objects and people of interest. We present algorithms for automatic camera control...
We present a system for automatically extracting the region of interest and controlling virtual cameras based on panoramic video. It targets applications such as classroom lectures and video conferencing. For capturing panoramic video, we use the FlyCam system, which produces high resolution, wide-angle video by stitching video images from multiple stationary cameras. To generate conventional video, a region of interest (ROI) can be cropped from the panoramic video. We propose methods for ROI detection, tracking, and virtual camera control that work in both the uncompressed and compressed domains. The ROI is...
Open-vocabulary speech indexing for voice and video mail retrieval. Authors: M. G. Brown (Olivetti Research Limited, 24a Trumpington St., Cambridge, CB2 1QA, UK), J. T. Foote (Cambridge University Engineering Department, Cambridge, CB2 1PZ, UK), G. J. F. Jones (Cambridge University Computer Laboratory, Cambridge, CB2 3QG, UK), K. Spärck Jones, S. J. Young. MULTIMEDIA '96: Proceedings of the fourth ACM international conference on Multimedia, February 1997, Pages...
We present methods for automatic and semi-automatic creation of music videos, given an arbitrary audio soundtrack and source video. Significant audio changes are automatically detected; similarly, the source video is automatically segmented and analyzed for suitability based on camera motion and exposure. Video with excessive camera motion or poor contrast is penalized with a high unsuitability score, and is more likely to be discarded in the final edit. High quality video clips are then automatically selected and aligned in time with significant audio changes. Video clips are adjusted to match the audio segments by selecting the most...
We present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present results for the algorithm based solely on temporal similarity, and jointly on temporal and content-based similarity. We also describe a supervised algorithm based on learning vector quantization. Finally, we include experimental results for the proposed algorithms and several competing approaches on two test photo collections.
FlySPEC is a video camera system designed for real-time remote operation. A hybrid design combines the high resolution of an optomechanical video camera with the wide field of view always available from a panoramic camera. The control system integrates requests from multiple users so that each controls a virtual camera. The control system seamlessly integrates manual and fully automatic control. It supports a range of options from untended automatic to full manual control. The system can also learn control strategies from user requests. Additionally, the panoramic view is always available for an intuitive interface, and objects are never out of view regardless of the zoom factor. We...
A convenient representation of a video segment is a single "keyframe". Keyframes are widely used in applications such as non-linear browsing and video editing. With existing methods of keyframe selection, similar video segments result in very similar keyframes, with the drawback that actual differences between the segments may be obscured. We present methods for keyframe selection based on two criteria: capturing the similarity to the represented segment, and preserving the differences from other segment keyframes, so that different segments will have visually distinct representations. We present discriminative...