- Video Analysis and Summarization
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Multimedia Communication and Technology
- Innovative Human-Technology Interaction
- Complex Network Analysis Techniques
- Data Visualization and Analytics
- Digital Games and Media
- Visual Attention and Saliency Detection
- Music and Audio Processing
- Interactive and Immersive Displays
- Multimodal Machine Learning Applications
- Geographic Information Systems Studies
- Scientific Computing and Data Management
- Human Mobility and Location-Based Analysis
- Virtual Reality Applications and Impacts
- Mobile Crowdsensing and Crowdsourcing
- Opinion Dynamics and Social Influence
- Digital Marketing and Social Media
- Augmented Reality Applications
- Context-Aware Activity Recognition Systems
- Artificial Intelligence in Games
- Ethics and Social Impacts of AI
- Big Data and Business Intelligence
- Misinformation and Its Impacts
Toyota Industries (United States)
2022-2025
Toyota Research Institute
2022-2025
Yahoo (United States)
2009-2023
Centrum Wiskunde & Informatica
2016-2022
Rochester Institute of Technology
2021
Yahoo (Spain)
2010-2021
FX Palo Alto Laboratory
2017-2020
Association for Computing Machinery
2020
College of Western Idaho
2017
Yahoo (United Kingdom)
2007-2015
Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked "What vehicle is the person riding?", computers will...
We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, media source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004...
This paper develops a novel framework for semantic image retrieval based on the notion of a scene graph. Our scene graphs represent objects ("man", "boat"), attributes of objects ("boat is white") and relationships between objects ("man standing on boat"). We use these scene graphs as queries to retrieve semantically related images. To this end, we design a conditional random field model that reasons about possible groundings of scene graphs to test images. The likelihoods of these groundings are used as ranking scores for retrieval. We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images...
Television broadcasters are beginning to combine social micro-blogging systems such as Twitter with television to create social video experiences around events. We looked at one such event, the first U.S. presidential debate in 2008, in conjunction with aggregated ratings of message sentiment from Twitter. We begin to develop an analytical methodology and visual representations that could help a journalist or public affairs person better understand the temporal dynamics of sentiment in reaction to the debate video. We demonstrate visuals and metrics that can be...
Photos are becoming prominent means of communication online. Despite photos' pervasive presence in social media and the online world, we know little about how people interact and engage with their content. Understanding how photo content might signify engagement can impact both science and design, influencing production and distribution. One common type of photo content that is shared on social media is the photos of people. From studies of offline behavior, we know that human faces are powerful channels of non-verbal communication. In this paper, we study...
We investigate the practice of sharing short messages (microblogging) around live media events. Our focus is on Twitter and its usage during the 2008 Presidential Debates. We find that analysis of Twitter usage patterns around this media event can yield significant insights into the semantic structure and content of the media object. Specifically, we find that the level of Twitter activity serves as a predictor of changes in topics in the media event. Further, we find that conversational cues can identify the key players in the media object and that the content of the Twitter posts can somewhat reflect the topics of discussion in the media object, but are mostly evaluative, in that they express...
A microblogged stream is delivered over time, providing an ongoing commentary of topics, trends, and issues. In this article, we present two methods of finding temporal topics within these Twitter streams. Using a normalized term frequency, we demonstrate how an effective table of contents can be extracted by finding localized "peaky topics". Second, we find "persistent conversations" which have lower general salience but sustain and persist in the tweet corpus, in effect a whispering conversation that lingers in the background....
How do people keep track of their money? In this paper we present a preliminary scoping study of how 14 individuals in the San Francisco Bay Area earn, save, spend and understand money and their personal and family finances. We describe the practices we developed for exploring the sensitive topic of money, and then discuss three sets of findings. The first is the emotional component of the relationship people have with money. Second, we discuss the tools and processes people used to track their financial situation. Finally, we discuss how people account for the unknown and unpredictable nature of the future through their financial decisions....
Virtual environments (VEs) can create collaborative and social spaces, which are increasingly important in the face of remote work and travel reduction. Recent advances, such as more open and widely available platforms, create new possibilities to observe and analyse interaction in VEs. Using a custom instrumented build of Mozilla Hubs to measure position and orientation, we conducted an academic workshop to facilitate a range of typical workshop activities. We analysed social interactions during a keynote, small group breakouts, and informal...
Animated GIFs have been around since 1987 and recently gained more popularity on social networking sites. Tumblr, a large social networking and micro blogging platform, is a popular venue to share animated GIFs. Tumblr users follow blogs, generating a feed or posts, and choose to "like" or "reblog" favored posts. In this paper, we use these actions as signals to analyze the engagement of over 3.9 million posts, and conclude that animated GIFs are significantly more engaging than other kinds of media. We follow this finding with deeper visual analysis of nearly 100k animated GIFs and pair our...
A variety of simple graphical filters are available to camera phone users to enhance their photos on the fly; these filters often stylize, saturate or age a photo. In this paper, we present a combination of large-scale data analysis and small scale in-depth interviews to understand filter-work. We look at producers' practices of photo filtering and gain insights in the roles filters play in engaging photo consumers by driving their social interactions. We first interviewed 15 Flickr mobile app users (photo producers) to understand their use and perception of filters. Next,...
In this article, we present a method for predicting the view count of a YouTube video using a small feature set collected from a synchronous sharing tool. We hypothesize that videos which have a high YouTube view count will exhibit a unique sharing pattern when shared in synchronous environments. Using a one-day sample of 2,188 dyadic sessions from the Yahoo! Zync synchronous sharing tool, we demonstrate how to predict the video's view count on YouTube, specifically if a video has over 10 million views. The prediction model is 95.8% accurate and done with a relatively small training set; only 15% of the videos had more...
Behaviour in virtual environments might be informed by our experiences in physical environments, but virtual environments are not constrained by the same physical, perceptual, or social cues. Instead of replicating the properties of physical spaces, one can create virtual experiences that diverge from reality by dynamically manipulating environmental, aural, and social properties. This paper explores digital proxemics, which describe how we use space in virtual environments and how the presence of others influences our behaviours, interactions, and movements. First, we frame the open challenges of digital proxemics in terms...
Social media sites are challenged by both the scale and variety of deviant behavior online. While algorithms can detect spam and obscenity, behaviors that break community guidelines on some sites are difficult to detect because they have multimodal subtleties (images and/or text). Identifying these posts is often relegated to a few moderators. In this paper, we develop a deep learning classifier that jointly models textual and visual characteristics of pro-eating disorder content that violates community guidelines. Using a million Tumblr photo...
Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data. To scale and widen the applicability of crowdsourcing, we present a technique that produces extremely rapid judgments for binary and categorical labels. Rather than punishing all errors, which causes workers to proceed slowly and deliberately, our technique speeds up workers' judgments to the point where errors are acceptable and even expected. We demonstrate that it is possible...
We are facing increasing pressure to reduce travel and work remotely. Tools that support effective remote communication and collaboration are much needed. Social Virtual Reality (VR) is an emerging medium, which invites multiple users to join a collaborative virtual environment (VE) and has the potential to support remote communication in a natural and immersive way. We successfully organized a CHI 2020 Social VR workshop virtually on Mozilla Hubs, which invited researchers and practitioners to have a fruitful discussion over user representations and ethics,...
This paper presents a high-level overview of Yahoo Research Berkeley's approach to multimedia research and the ideas motivating it. This approach is characterized primarily by a shift away from building subsystems that attempt to discover or understand the "meaning" of media content toward systems and algorithms that can usefully utilize information about how media is being used in specific contexts; a shift from semantics to pragmatics. We believe that, at least for the domain of consumer and web videos, the latter provides a more promising basis for indexing media in ways...
Massive Open Online Course (MOOC) platforms have scaled online education to unprecedented enrollments, but remain limited by their rigid, predetermined curricula. To overcome this limitation, this paper contributes a visual recommender system called MOOCex. The system recommends lecture videos across different courses by considering both video contents and sequential inter-topic relationships mined from course syllabi; and more importantly, it allows for interactive visual exploration of the semantic space...
From ride-hailing to car rentals, consumers are often presented with eco-friendly options. Beyond highlighting a "green" vehicle and CO2 emissions, CO2 equivalencies have been designed to provide understandable amounts; we ask which equivalencies will lead to eco-friendly decisions. We conducted five ride-hailing scenario surveys where participants picked between regular and eco-friendly options, testing equivalencies, social features, and valence-based interventions. Further, we tested a car-rental embodiment to gauge how an individual (needing a car for several days) might...
The Wizard of Oz (WoZ) method is a widely adopted research approach where a human Wizard "role-plays" a not readily available technology and interacts with participants to elicit user behaviors and probe the design space. With the growing ability for modern large language models (LLMs) to role-play, one can apply LLMs as Wizards in WoZ experiments with better scalability and lower cost than the traditional approach. However, methodological guidance on responsibly applying LLMs in WoZ experiments and a systematic evaluation of LLMs' role-playing ability are...