- Computer Graphics and Visualization Techniques
- Video Analysis and Summarization
- Advanced Vision and Imaging
- Data Visualization and Analytics
- Human Motion and Animation
- Interactive and Immersive Displays
- 3D Shape Modeling and Analysis
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Image Enhancement Techniques
- Multimedia Communication and Technology
- Music and Audio Processing
- Human Pose and Action Recognition
- Usability and User Interface Design
- Tactile and Sensory Interactions
- 3D Surveying and Cultural Heritage
- Augmented Reality Applications
- Multimodal Machine Learning Applications
- Advanced Text Analysis Techniques
- Spatial Cognition and Navigation
- Mobile Crowdsensing and Crowdsourcing
- Design Education and Practice
- Visual Attention and Saliency Detection
- Music Technology and Sound Studies
- Visual Perception and Processing Mechanisms
Stanford University
2015-2024
Microsoft Research (United Kingdom)
2003-2023
Palo Alto University
2015-2018
University of California, Berkeley
2006-2015
Berkeley College
2006-2014
Microsoft (United States)
2003-2006
University of Washington
2005
Microsoft (Finland)
2004
Chukyo University
2003
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
2003
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test...
Digital photography has made it possible to quickly and easily take a pair of images of low-light environments: one with flash to capture detail and one without flash to capture ambient illumination. We present a variety of applications that analyze and combine the strengths of such flash/no-flash image pairs. Our applications include denoising and detail transfer (to merge the ambient qualities of the no-flash image with the high-frequency flash detail), white-balancing (to change the color tone of the ambient image), continuous flash (to interactively adjust flash intensity), and red-eye removal (to repair artifacts in the flash image). We demonstrate...
We describe an interactive, computer-assisted framework for combining parts of a set of photographs into a single composite picture, a process we call "digital photomontage." Our framework makes use of two techniques primarily: graph-cut optimization, to choose good seams within the constituent images so that they can be combined as seamlessly as possible; and gradient-domain fusion, a process based on Poisson equations, to further reduce any remaining visible artifacts in the composite. Also central to the framework is a suite of interactive tools...
Understanding how people explore immersive virtual environments is crucial for many applications, such as designing virtual reality (VR) content, developing new compression algorithms, or learning computational models of saliency or visual attention. Whereas a body of recent work has focused on modeling saliency in desktop viewing conditions, VR is very different from these conditions in that viewing behavior is governed by stereoscopic vision and by the complex interaction of head orientation, gaze, and other kinematic constraints. To...
Efficient rendering of photo‐realistic virtual worlds is a long standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo‐realistic images from hand‐crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo‐realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep...
We present an interactive method for cropping photographs given minimal information about important content location, provided by eye tracking. Cropping is formulated in a general optimization framework that facilitates adding new composition rules, and adapting the system to particular applications. Our system uses fixation data to identify important image content and compute the best crop for any given aspect ratio or size, enabling applications such as automatic snapshot recomposition, adaptive documents, and thumbnailing. We validate our...
We present an interactive system for efficiently extracting foreground objects from a video. We extend previous min-cut based image segmentation techniques to the domain of video with four new contributions. We provide a novel painting-based user interface that allows users to easily indicate the foreground object across space and time. We introduce a hierarchical mean-shift preprocess in order to minimize the number of nodes that min-cut must operate on. Within the min-cut we also define new local cost functions to augment the global costs defined in earlier work....
Interactive history tools, ranging from basic undo and redo to branching timelines of user actions, facilitate iterative forms of interaction. In this paper, we investigate the design of history mechanisms for information visualization. We present a design space analysis of both architectural and interface issues, identifying design decisions and associated trade-offs. Based on this analysis, we contribute a design study of graphical history tools for Tableau, a database visualization system. These tools record and visualize interaction histories, support data...
We investigate techniques for visualizing time series data and evaluate their effect in value comparison tasks. We compare line charts with horizon graphs - a space-efficient time series visualization technique - across a range of chart sizes, measuring the speed and accuracy of subjects' estimates of value differences between charts. We identify transition points at which reducing the chart height results in significantly differing drops in estimation accuracy across the compared chart types, and we find optimal positions in the speed-accuracy tradeoff curve at which viewers performed...
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then...
Poorly designed charts are prevalent in reports, magazines, books and on the Web. Most of these charts are only available as bitmap images; without access to the underlying data it is prohibitively difficult for viewers to create more effective visual representations. In response we present ReVision, a system that automatically redesigns visualizations to improve graphical perception. Given a bitmap image of a chart as input, ReVision applies computer vision and machine learning techniques to identify the chart type (e.g., pie chart, bar...
We present an interactive furniture layout system that assists users by suggesting furniture arrangements that are based on interior design guidelines. Our system incorporates the layout guidelines as terms in a density function and generates layout suggestions by rapidly sampling the density function using a hardware-accelerated Monte Carlo sampler. Our results demonstrate that the suggestion generation functionality measurably increases the quality of furniture arrangements produced by participants with no prior training in layout design.
Route maps, which depict a path from one location to another, have emerged as one of the most popular applications on the Web. Current computer-generated route maps, however, are often very difficult to use. In this paper we present a set of cartographic generalization techniques specifically designed to improve the usability of route maps. Our generalization techniques are based both on cognitive psychology research studying how route maps are used and on an analysis of the generalizations commonly found in handdrawn route maps. We describe algorithmic implementations of these generalization techniques within LineDrive,...
This paper presents scented widgets, graphical user interface controls enhanced with embedded visualizations that facilitate navigation in information spaces. We describe design guidelines for adding visual cues to common user interface widgets such as radio buttons, sliders, and combo boxes and contribute a general software framework for applying scented widgets within applications with minimal modifications to existing source code. We provide a number of example applications and describe a controlled experiment which finds that users exploring unfamiliar data make up to twice...
The two-user Responsive Workbench: support for collaboration through individual views of a shared space. Authors: Maneesh Agrawala (Stanford University, Stanford, CA), Andrew C. Beers, Ian McDowall (Fakespace, Inc., Mountain View, CA), Bernd Fröhlich, Mark Bolas, Pat Hanrahan. SIGGRAPH '97: Proceedings of the 24th annual conference on Computer graphics and interactive techniques, August 1997, Pages 327–332. https://doi.org/10.1145/258734.258875 Online: 03 August...
We present a new image-based technique for enhancing the shape and surface details of an object. The input to our system is a small set of photographs taken from a fixed viewpoint, but under varying lighting conditions. For each image we compute a multiscale decomposition based on the bilateral filter and then reconstruct an enhanced image that combines detail information at each scale across all the input images. Our approach does not require any information about light source positions, or camera calibration, and can produce good results with 3...
Information visualization leverages the human visual system to support the process of sensemaking, in which information is collected, organized, and analyzed to generate knowledge and inform action. Though most research to date assumes a single-user focus on perceptual and cognitive processes, in practice, sensemaking is often a social process involving parallelization of effort, discussion, and consensus building. This suggests that to fully support sensemaking, interactive visualization should also support social interaction. However, the most appropriate collaboration mechanisms for...
We present an interactive system for generating photorealistic, textured, piecewise-planar 3D models of architectural structures and urban scenes from unordered sets of photographs. To reconstruct 3D geometry in our system, the user draws outlines overlaid on 2D photographs. The 3D structure is then automatically computed by combining the 2D interaction with the multi-view geometric information recovered by performing structure from motion analysis on the input photographs. We utilize vanishing point constraints at multiple stages during the reconstruction, which...
We present design principles for creating effective assembly instructions and a system that is based on these principles. The principles are drawn from cognitive psychology research which investigated people's conceptual models of assembly and effective methods to visually communicate assembly information. Our system is inspired by earlier work in robotics on assembly planning and in visualization on automated presentation design. Although other systems have considered presentation and planning independently, we believe it is necessary to address the two problems simultaneously in order to create...
Despite a diversity of software architectures supporting information visualization, it is often difficult to identify, evaluate, and re-apply the design solutions implemented within such frameworks. One popular and effective approach for addressing such difficulties is to capture successful solutions in design patterns, abstract descriptions of interacting software components that can be customized to solve design problems within a particular context. Based upon a review of existing frameworks and our own experiences building visualization software, we present...