Jonathan Foote

ORCID: 0000-0003-4411-1362
Research Areas
  • Music and Audio Processing
  • Video Analysis and Summarization
  • Speech and Audio Processing
  • Advanced Image and Video Retrieval Techniques
  • Music Technology and Sound Studies
  • Speech Recognition and Synthesis
  • Advanced Vision and Imaging
  • Image Retrieval and Classification Techniques
  • Multimedia Communication and Technology
  • Architecture and Art History Studies
  • Augmented Reality Applications
  • Speech and dialogue systems
  • Neuroscience and Music Perception
  • Renaissance and Early Modern Studies
  • Interactive and Immersive Displays
  • Advanced Text Analysis Techniques
  • Architecture and Computational Design
  • Computer Graphics and Visualization Techniques
  • Architecture, Modernity, and Design
  • Video Coding and Compression Technologies
  • Advanced Data Compression Techniques
  • 3D Surveying and Cultural Heritage
  • Algorithms and Data Compression
  • Recommender Systems and Techniques
  • Natural Language Processing Techniques

Pomona College
2024

Aarhus School of Architecture
2016-2023

Virginia Tech
2012

FX Palo Alto Laboratory
1998-2010

Fuji Xerox (Japan)
2005

Xerox (France)
1999-2002

University of Cambridge
1995-2002

Xerox (United States)
2002

National University of Singapore
1997-1999

Brown University
1991-1994

Though many systems exist for content-based retrieval of images, little work has been done on the audio portion of the multimedia stream. This paper presents a system to retrieve audio documents by acoustic similarity. The similarity measure is based on statistics derived from a supervised vector quantizer, rather than on matching simple pitch or spectral characteristics, and is thus able to learn distinguishing features while ignoring unimportant variation. Both theoretical and experimental results are presented, ...

10.1117/12.290336 article EN Proceedings of SPIE, the International Society for Optical Engineering 1997-10-06
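
The retrieval-by-similarity idea above can be sketched in a few lines: quantize each document's feature frames against a codebook, form a normalized histogram of codeword counts, and compare histograms. This is a minimal, hypothetical sketch in Python/NumPy; the paper uses a supervised tree-based quantizer trained on labeled audio, so the random codebook, feature dimensionality, and cosine comparison below are stand-ins.

```python
import numpy as np

def quantize(frames, codebook):
    """Assign each feature frame (one per row) to its nearest codeword index."""
    # Squared Euclidean distance from every frame to every codeword.
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def histogram_signature(frames, codebook):
    """Normalized histogram of codeword usage: the document's acoustic signature."""
    counts = np.bincount(quantize(frames, codebook), minlength=len(codebook))
    return counts / counts.sum()

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy usage: random "MFCC-like" frames and a random codebook stand in for real data.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 13))          # 16 codewords, 13-dim features
doc_a = rng.normal(size=(500, 13))            # feature frames of document A
doc_b = rng.normal(size=(400, 13))            # feature frames of document B
sim = cosine_similarity(histogram_signature(doc_a, codebook),
                        histogram_signature(doc_b, codebook))
print(f"acoustic similarity: {sim:.3f}")
```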

The paper describes methods for automatically locating points of significant change in music or audio by analyzing local self-similarity. This method can find individual note boundaries as well as natural segment boundaries such as verse/chorus or speech/music transitions, even in the absence of cues such as silence. The approach uses the signal to model itself, and thus does not rely on particular acoustic features nor require training. We present a wide variety of applications, including indexing, segmenting, and beat tracking of audio. The method works well ...

10.1109/icme.2000.869637 article EN 2002-11-07
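
A minimal sketch of the self-similarity idea: build a frame-to-frame similarity matrix, then correlate a checkerboard kernel along its main diagonal so that peaks of the resulting novelty curve mark likely change points. The cosine similarity, kernel width, and untapered kernel are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def self_similarity(features):
    """Cosine similarity between every pair of feature frames (one frame per row)."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    return f @ f.T

def checkerboard_kernel(width):
    """2x2 block kernel: +1 within the same side of the boundary, -1 across it."""
    sign = np.ones(2 * width)
    sign[width:] = -1
    return np.outer(sign, sign)

def novelty_curve(similarity, width=8):
    """Correlate the kernel along the main diagonal; peaks mark change points."""
    kernel = checkerboard_kernel(width)
    n = similarity.shape[0]
    novelty = np.zeros(n)
    for i in range(width, n - width):
        patch = similarity[i - width:i + width, i - width:i + width]
        novelty[i] = (patch * kernel).sum()
    return novelty
```

Local maxima of the novelty curve above a threshold would then be taken as segment boundaries; a tapered kernel can be substituted to smooth the curve.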

A significant new speech corpus of British English has been recorded at Cambridge University. Derived from the Wall Street Journal text corpus, WSJCAM0 constitutes one of the largest corpora of spoken British English currently in existence. It is specifically designed for the construction and evaluation of speaker-independent speech recognition systems. The database consists of 140 speakers, each speaking about 110 utterances. This paper describes the motivation for the corpus, the processes undertaken in its construction, and the utilities needed as support tools. All utterance ...

10.1109/icassp.1995.479278 article EN International Conference on Acoustics, Speech, and Signal Processing 2002-11-19

10.1007/s005300050106 article EN Multimedia Systems 1999-01-01

This paper presents a novel approach to visualizing the time structure of music and audio. The acoustic similarity between any two instants of an audio recording is displayed in a 2D representation, allowing identification of structural and rhythmic characteristics. Examples are presented for both classical and popular music. Applications include content-based analysis and segmentation, as well as tempo extraction.

10.1145/319463.319472 article EN 1999-10-30
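
As an illustration of the 2D representation described above, the sketch below computes windowed spectral features from a signal and displays the resulting similarity matrix as an image; repeated or similar passages show up as bright off-diagonal stripes and blocks. The frame length, hop size, and cosine similarity are assumptions of the sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

def frame_spectra(signal, frame_len=1024, hop=512):
    """Magnitude spectra of overlapping windows: one feature vector per instant."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len, hop)
    return np.array([np.abs(np.fft.rfft(window * signal[s:s + frame_len]))
                     for s in starts])

def similarity_image(signal):
    """2D image whose pixel (i, j) is the similarity between instants i and j."""
    f = frame_spectra(signal)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
    return f @ f.T

# Toy usage with a synthetic "verse/chorus" signal: two alternating tones.
sr = 8000
t = np.arange(sr) / sr
signal = np.concatenate([np.sin(2 * np.pi * f0 * t) for f0 in (220, 440, 220, 440)])
plt.imshow(similarity_image(signal), origin="lower", cmap="gray")
plt.xlabel("time (frames)"); plt.ylabel("time (frames)")
plt.show()
```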

This paper presents methods for automatically creating pictorial video summaries that resemble comic books. The relative importance of video segments is computed from their length and novelty. Image and audio analysis is used to detect and emphasize meaningful events. Based on this importance measure, we choose relevant keyframes. Selected keyframes are sized by importance and then efficiently packed into a pictorial summary. We present a quantitative measure of how well a summary captures the salient events in a video, and show that it can be ...

10.1145/319463.319654 article EN 1999-10-30
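
One plausible reading of the importance measure is sketched below: score each segment from its length and a novelty value, then map the scores to discrete keyframe sizes for the comic-book layout. The specific weighting and the three-size quantile mapping are hypothetical, and the packing step itself is not shown.

```python
import numpy as np

def segment_importance(lengths, novelties, weight=1.0):
    """Importance of each segment from its length and novelty (hypothetical weighting)."""
    lengths = np.asarray(lengths, dtype=float)
    novelties = np.asarray(novelties, dtype=float)
    score = (lengths / lengths.sum()) * novelties ** weight
    return score / score.max()

def keyframe_sizes(importance, sizes=(1, 2, 3)):
    """Map importance to a discrete keyframe size (in grid cells) by thirds."""
    thresholds = np.quantile(importance, [1 / 3, 2 / 3])
    return np.array(sizes)[np.searchsorted(thresholds, importance)]

# Toy usage: five segments with lengths (seconds) and novelty scores.
imp = segment_importance([12, 40, 8, 25, 60], [0.9, 0.4, 0.8, 0.6, 0.2])
print(keyframe_sizes(imp))   # larger numbers -> bigger frames in the layout
```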

Organizing digital photograph collections according to events such as holiday gatherings or vacations is a common practice among photographers. To support photographers in this task, we present similarity-based methods to cluster photos by time and image content. The approach is general and unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present several variants of an automatic unsupervised algorithm to partition a collection of photographs based either on temporal ...

10.1145/1083314.1083317 article EN ACM Transactions on Multimedia Computing Communications and Applications 2005-08-01
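
A minimal stand-in for the temporal variant of this clustering: sort photos by timestamp and start a new event wherever the gap to the previous photo exceeds a threshold. The fixed six-hour gap is an assumption of the sketch; the paper's methods are adaptive and can also incorporate image content.

```python
import numpy as np

def cluster_by_time(timestamps, gap_hours=6.0):
    """Split a chronologically sorted photo collection at large time gaps.

    A simple stand-in for similarity-based event clustering: any gap larger
    than `gap_hours` starts a new event.
    """
    times = np.sort(np.asarray(timestamps, dtype=float))
    gaps = np.diff(times)
    boundaries = np.flatnonzero(gaps > gap_hours * 3600)
    return np.split(times, boundaries + 1)

# Toy usage: three bursts of photos (timestamps in seconds).
photos = [0, 60, 300, 7 * 3600, 7 * 3600 + 90, 30 * 3600]
for i, event in enumerate(cluster_by_time(photos)):
    print(f"event {i}: {len(event)} photos")
```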

We introduce the beat spectrum, a new method of automatically characterizing rhythm and tempo in music and audio. The beat spectrum is a measure of acoustic self-similarity as a function of time lag. Highly structured or repetitive music will have strong beat-spectrum peaks at the repetition times. This reveals both the tempo and the relative strength of particular beats, and can therefore distinguish between different kinds of rhythms at the same tempo. We also introduce the beat spectrogram, which graphically illustrates rhythm variation over time. Unlike previous approaches to rhythm analysis, the beat spectrum does not ...

10.1109/icme.2001.1237863 article EN IEEE International Conference on Multimedia and Expo (ICME) 2001-01-01
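
Given a frame-to-frame similarity matrix S (for example, cosine similarity of spectral features), the beat spectrum can be sketched as the average of S along each diagonal, i.e. B(lag) = mean_i S(i, i + lag). The normalization and the synthetic example below are illustrative choices.

```python
import numpy as np

def beat_spectrum(similarity, max_lag=None):
    """Average the similarity matrix along its diagonals: B(lag) = mean_i S(i, i+lag)."""
    n = similarity.shape[0]
    max_lag = max_lag or n // 2
    return np.array([np.trace(similarity, offset=lag) / (n - lag)
                     for lag in range(max_lag)])

# Toy usage: features repeating every 10 frames produce a peak near lag 10.
rng = np.random.default_rng(1)
pattern = rng.normal(size=(10, 4))
feats = np.tile(pattern, (8, 1))
f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
S = f @ f.T
print(np.argmax(beat_spectrum(S)[1:]) + 1)   # -> 10
```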

A semi-automatic approach to home video editing. Andreas Girgensohn, John Boreczky, Patrick Chiu, John Doherty, Jonathan Foote, Gene Golovchinsky, Shingo Uchihashi, and Lynn Wilcox (FX Palo Alto Laboratory, 3400 Hillview Avenue, Palo Alto, CA). UIST '00: Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology, November 2000, pages 81-89. https://doi.org/10.1145/354401.354415

10.1145/354401.354415 article EN 2000-01-01

We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of music files. In the first step, spectral data is used to construct a similarity matrix calculated from inter-frame similarity. The audio can be robustly segmented by correlating a kernel along the diagonal of the matrix. Once segmented, statistics are computed for each segment. In the second step, segments are clustered based on the self-similarity of their statistics. This reveals ...

10.1117/12.476302 article EN Proceedings of SPIE, the International Society for Optical Engineering 2003-01-20
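
The second step can be sketched as follows: summarize each segment by a statistic (here just the mean feature vector, an assumption; the paper's statistics may differ) and greedily group segments whose statistics are sufficiently similar, so that repeated sections such as choruses fall into the same cluster. The greedy threshold-based grouping is a stand-in for the clustering actually used.

```python
import numpy as np

def segment_stats(features, boundaries):
    """One summary statistic per segment (here simply the mean feature vector)."""
    edges = [0, *boundaries, len(features)]
    return np.array([features[a:b].mean(axis=0) for a, b in zip(edges, edges[1:])])

def cluster_segments(stats, threshold=0.9):
    """Greedy grouping: a segment joins the first cluster whose centroid it matches."""
    norm = lambda v: v / (np.linalg.norm(v) + 1e-12)
    labels, centroids = [], []
    for s in stats:
        sims = [norm(s) @ norm(c) for c in centroids]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            labels.append(len(centroids))   # start a new cluster
            centroids.append(s)
    return labels
```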

Retrieving spoken documents by combining multiple index sources. G. J. F. Jones (Computer Laboratory, University of Cambridge, New Museums Site, Pembroke Street, Cambridge CB2 3QG, England), J. T. Foote, K. Spärck Jones, and S. Young (Engineering Department, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, England). SIGIR '96: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 1996.

10.1145/243199.243208 article EN Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '96 1996-01-01

We present a framework for summarizing digital media based on structural analysis. Though these methods are applicable to general media, we concentrate here on characterizing the repetitive structure of popular music. In the first step, a similarity matrix is calculated from inter-frame spectral similarity. Segment boundaries, such as verse-chorus transitions, are found by correlating a kernel along the diagonal of the matrix. Once segmented, statistics are computed for each segment. In the second step, segments are clustered by pairwise ...

10.1109/aspaa.2003.1285836 article EN 2004-05-06
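
Continuing the sketch above, a summary excerpt can then be drawn from the clustering, for instance by taking the segment whose statistics lie closest to the centroid of the largest (most repeated) cluster. This selection rule is a plausible illustration, not necessarily the paper's exact criterion.

```python
import numpy as np

def representative_segment(stats, labels):
    """Index of the segment whose statistics best match the largest cluster's centroid."""
    labels = np.asarray(labels)
    largest = np.bincount(labels).argmax()           # most frequently repeated cluster
    members = np.flatnonzero(labels == largest)
    centroid = stats[members].mean(axis=0)
    dists = np.linalg.norm(stats[members] - centroid, axis=1)
    return int(members[dists.argmin()])              # candidate summary excerpt
```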

We present a novel approach to automatically extracting summary excerpts from audio and video. Our approach is to maximize the average similarity between the excerpt and the source. We first calculate a similarity matrix by comparing each pair of time samples using a quantitative similarity measure. To determine the segment with the highest similarity, we compute the summation of self-similarity over the support of each candidate segment. To select multiple excerpts while avoiding redundancy, we compute a non-negative matrix factorization (NMF) of the similarity matrix into its essential structural components. We then build ...

10.1109/mmsp.2002.1203239 article EN 2004-01-23
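
Two pieces of that pipeline are easy to sketch: scoring a fixed-length excerpt by the summed self-similarity over its support, and factorizing the (non-negative) similarity matrix with multiplicative-update NMF to expose structural components. The fixed excerpt length, the clipping to non-negative values, and the component-selection rule below are assumptions of the sketch.

```python
import numpy as np

def best_excerpt(S, length):
    """Start index of the fixed-length excerpt whose rows of the similarity matrix S
    sum to the largest value, i.e. the excerpt most similar on average to the source."""
    row_support = S.sum(axis=1)
    window = np.convolve(row_support, np.ones(length), mode="valid")
    return int(window.argmax())

def nmf(S, rank=4, iters=200, seed=0):
    """Multiplicative-update NMF: S ~= W @ H with non-negative factors."""
    rng = np.random.default_rng(seed)
    n, m = S.shape
    W, H = rng.random((n, rank)), rng.random((rank, m))
    for _ in range(iters):
        H *= (W.T @ S) / (W.T @ W @ H + 1e-12)
        W *= (S @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

def component_excerpts(S, rank=4):
    """Time indices where the strongest structural component is most active;
    the remaining components could supply further, non-redundant excerpts."""
    _, H = nmf(np.clip(S, 0.0, None), rank)    # NMF needs non-negative input
    strongest = H.sum(axis=1).argmax()
    activation = H[strongest]
    return np.flatnonzero(activation > 0.5 * activation.max())
```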

We describe computationally and materially inexpensive methods for panoramic video imaging. Digitally combining images from an array of cameras results in a wide-field camera built from off-the-shelf hardware. We present methods that both correct lens distortion and seamlessly merge the images into a single panoramic image. Electronically selecting a region of this panorama yields a rapidly steerable "virtual camera". Because the camera is fixed with respect to the background, simple motion analysis can be used to track objects or people of interest. We describe algorithms for automatic control ...

10.1109/icme.2000.871033 article EN 2002-11-07
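
The "virtual camera" idea lends itself to a small sketch: because the panoramic camera is fixed, frame differencing locates moving objects, and a window cropped around the motion centroid acts as a steerable camera. Lens-distortion correction and image stitching are not shown; the difference threshold and window size are arbitrary choices for the sketch.

```python
import numpy as np

def motion_centroid(prev_frame, frame, threshold=20):
    """Centroid of pixels that changed between frames; since the camera is fixed,
    any change is assumed to come from a moving object of interest."""
    moved = np.abs(frame.astype(int) - prev_frame.astype(int)) > threshold
    ys, xs = np.nonzero(moved)
    if len(xs) == 0:
        return frame.shape[0] // 2, frame.shape[1] // 2   # no motion: center the view
    return int(ys.mean()), int(xs.mean())

def virtual_camera(panorama, center, height=240, width=320):
    """Crop a steerable 'virtual camera' window from the panoramic frame."""
    y, x = center
    h, w = panorama.shape[:2]
    top = np.clip(y - height // 2, 0, h - height)
    left = np.clip(x - width // 2, 0, w - width)
    return panorama[top:top + height, left:left + width]
```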

We present a system for automatically extracting the region of interest and controlling virtual cameras based on panoramic video. It targets applications such as classroom lectures and video conferencing. For capturing video, we use the FlyCam system, which produces high-resolution, wide-angle video by stitching images from multiple stationary cameras. To generate conventional video, a region of interest (ROI) can be cropped from the panoramic video. We propose methods for ROI detection, tracking, and virtual camera control in both the uncompressed and compressed domains. The system is ...

10.1109/tmm.2005.854388 article EN IEEE Transactions on Multimedia 2005-09-20

Open-vocabulary speech indexing for voice and video mail retrieval. M. G. Brown (Olivetti Research Limited, 24a Trumpington St., Cambridge, CB2 1QA, UK), J. T. Foote (Cambridge University Engineering Department, Cambridge CB2 1PZ, UK), G. J. F. Jones (Computer Laboratory, University of Cambridge, CB2 3QG, UK), K. Spärck Jones, and S. Young. MULTIMEDIA '96: Proceedings of the Fourth ACM International Conference on Multimedia, February 1997.

10.1145/244130.244232 article EN 1996-01-01

We present methods for automatic and semi-automatic creation of music videos, given an arbitrary audio soundtrack and source video. Significant audio changes are automatically detected; similarly, the video is segmented and analyzed for suitability based on camera motion and exposure. Video with excessive motion or poor contrast is penalized with a high unsuitability score and is more likely to be discarded in the final edit. High-quality clips are then selected and aligned in time with the significant audio changes. Clips are adjusted to match the audio segments by selecting the most ...

10.1145/641007.641119 article EN 2002-12-01
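
A greatly simplified version of the alignment step is sketched below: given detected audio change times and a set of candidate clips with unsuitability scores, assign the least unsuitable clip that is long enough to each inter-change segment and trim it to fit. The greedy assignment and the (duration, unsuitability) clip representation are assumptions made for illustration.

```python
import numpy as np

def assemble_music_video(change_times, clips):
    """Greedy alignment sketch: for each inter-change audio segment, pick the unused
    clip with the lowest unsuitability that is long enough, and trim it to fit.

    `clips` is a list of (duration_seconds, unsuitability_score) tuples.
    """
    segments = np.diff(change_times)
    order = sorted(range(len(clips)), key=lambda i: clips[i][1])   # best clips first
    edit, used = [], set()
    for seg_len in segments:
        pick = next((i for i in order if i not in used and clips[i][0] >= seg_len), None)
        if pick is None:
            continue                                   # no suitable clip for this segment
        used.add(pick)
        edit.append((pick, float(seg_len)))            # (clip index, trimmed length)
    return edit

# Toy usage: audio changes at 0, 4, 9, 12 s; four clips as (duration, unsuitability).
print(assemble_music_video([0, 4, 9, 12],
                           [(6, 0.2), (3, 0.9), (10, 0.1), (5, 0.5)]))
```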

We present similarity-based methods to cluster digital photos by time and image content. The approach is general and unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present results for an algorithm based solely on temporal similarity, and for one that jointly uses temporal and content-based similarity. We also describe a supervised clustering method based on learning vector quantization. Finally, we include experimental comparisons of the proposed algorithms and several competing approaches on two test collections.

10.1145/957013.957093 article EN 2003-11-02

FlySPEC is a video camera system designed for real-time remote operation. A hybrid design combines the high resolution of an optomechanical camera with the wide field of view always available from a panoramic camera. The control system integrates requests from multiple users, so that each effectively controls a virtual camera, blending seamlessly between manual and fully automatic control. It supports a range of options from untended to fully manual operation, and can also learn control strategies from user requests. Additionally, with the intuitive interface, objects of interest are never out of view regardless of zoom factor. We ...

10.1145/641007.641110 article EN 2002-12-01

A convenient representation of a video segment is a single "keyframe". Keyframes are widely used in applications such as non-linear browsing and editing. With existing methods of keyframe selection, similar segments result in very similar keyframes, with the drawback that actual differences between the segments may be obscured. We present methods for keyframe selection based on two criteria: capturing the similarity to the represented segment, and preserving the differences from other segments so that different segments will have visually distinct representations. The discriminative ...

10.1109/icme.2005.1521470 article EN 2005-10-24
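
The two criteria can be sketched as a simple trade-off: for each segment, pick the frame most similar to its own segment's mean while penalizing similarity to the other segments' means. The mean-based segment representation, cosine similarity, and the weighting alpha are illustrative assumptions; the sketch assumes at least two segments.

```python
import numpy as np

def discriminative_keyframes(segments, alpha=0.5):
    """For each segment (an array of frame feature vectors), pick the frame that is
    close to its own segment's mean but far from the other segments' means."""
    means = [seg.mean(axis=0) for seg in segments]
    norm = lambda v: v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)
    picks = []
    for k, seg in enumerate(segments):
        frames = norm(seg)
        own = frames @ norm(means[k])                       # similarity to own segment
        others = np.max([frames @ norm(m) for j, m in enumerate(means) if j != k], axis=0)
        picks.append(int(np.argmax(own - alpha * others)))  # trade-off between criteria
    return picks
```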