- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Topic Modeling
- Generative Adversarial Networks and Image Synthesis
- Adversarial Robustness in Machine Learning
- Natural Language Processing Techniques
- Cancer-related molecular mechanisms research
- Reinforcement Learning in Robotics
- Video Analysis and Summarization
- Explainable Artificial Intelligence (XAI)
- Anomaly Detection Techniques and Applications
- Video Surveillance and Tracking Methods
- Advanced Vision and Imaging
- COVID-19 diagnosis using AI
- Image Retrieval and Classification Techniques
- Speech and Audio Processing
- Robotics and Sensor-Based Localization
- Robot Manipulation and Learning
- Remote-Sensing Image Classification
- Music and Audio Processing
- Speech Recognition and Synthesis
- Speech and dialogue systems
- IBM (United States), 2019-2024
- Boston University, 2015-2023
- Massachusetts Institute of Technology, 2004-2020
- Adobe Systems (United States), 2020
- Max Planck Society, 2011-2020
- Stanford University, 2020
- University of Illinois Urbana-Champaign, 2019
- University of Massachusetts Lowell, 2013-2017
- The University of Texas at Austin, 2016
- Laboratoire d'Informatique de Paris-Nord, 2016
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning that is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed...
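The core idea of the recurrent convolutional architecture above, per-frame visual features fed through a temporal recurrence, can be sketched in a few lines. This is a minimal NumPy toy with random untrained weights and a plain RNN cell standing in for the paper's CNN+LSTM stack; all dimensions and names here are illustrative assumptions, not the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hid_dim, n_classes = 64, 32, 10

# stand-in for a per-frame CNN: one fixed linear projection of flattened pixels
W_cnn = rng.standard_normal((28 * 28, feat_dim)) * 0.05
# vanilla recurrent cell plus a classifier on the final hidden state
W_xh = rng.standard_normal((feat_dim, hid_dim)) * 0.1
W_hh = rng.standard_normal((hid_dim, hid_dim)) * 0.1
W_out = rng.standard_normal((hid_dim, n_classes)) * 0.1

def classify_video(frames):
    """frames: array of shape (T, 28, 28) -> class scores of shape (n_classes,)."""
    h = np.zeros(hid_dim)
    for frame in frames:
        x = np.tanh(frame.ravel() @ W_cnn)   # per-frame visual feature
        h = np.tanh(x @ W_xh + h @ W_hh)     # recurrent temporal update
    return h @ W_out

video = rng.standard_normal((16, 28, 28))    # 16 toy frames
scores = classify_video(video)
print(scores.shape)                          # (10,)
```

Because every step is differentiable, the whole pipeline (feature extractor, recurrence, classifier) can in principle be trained end-to-end, which is the property the abstract emphasizes.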
Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains. They can also improve recognition despite the presence of domain shift or dataset bias: recent adversarial approaches to unsupervised domain adaptation reduce the difference between the training and test distributions and thus improve generalization performance. However, while generative adversarial networks (GANs) show compelling visualizations, they are not optimal on discriminative tasks and can be limited to smaller...
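The adversarial adaptation game described above pairs a domain discriminator (source vs. target) against a feature encoder that tries to fool it. A minimal NumPy sketch of the two opposing objectives, assuming a linear discriminator over pre-extracted features (the function names and shapes here are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def _bce(feats, w, label):
    """Binary cross-entropy of a linear logistic domain classifier."""
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))
    return -np.mean(label * np.log(p + 1e-8) + (1 - label) * np.log(1 - p + 1e-8))

def discriminator_loss(src_feats, tgt_feats, w):
    """Discriminator tries to label source as 1 and target as 0."""
    return 0.5 * (_bce(src_feats, w, 1.0) + _bce(tgt_feats, w, 0.0))

def confusion_loss(tgt_feats, w):
    """Encoder's adversarial objective: make target features look like source
    (inverted label, as in GAN generator training)."""
    return _bce(tgt_feats, w, 1.0)

src = rng.standard_normal((32, 8))
tgt = rng.standard_normal((32, 8)) + 0.5   # shifted target domain
w = rng.standard_normal(8)
d_loss, c_loss = discriminator_loss(src, tgt, w), confusion_loss(tgt, w)
```

Training alternates: minimize `discriminator_loss` over `w`, then minimize `confusion_loss` over the encoder producing `tgt_feats`, until the discriminator can no longer separate the domains.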
Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias on a standard benchmark. Fine-tuning deep models in a new domain can require a significant amount of labeled data, which for many applications is simply not available. We propose a new CNN architecture which introduces an adaptation layer and an additional domain confusion loss, to learn a representation that is both semantically meaningful and domain invariant. We additionally show that a domain confusion metric can be used for model selection to determine the dimension of an adaptation layer and the best...
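A common way to instantiate the domain confusion term above is a maximum mean discrepancy (MMD) penalty between source and target activations of the adaptation layer: when the penalty is small, the two domains are indistinguishable in that representation. A linear-kernel sketch, under the assumption that features have already been extracted (toy shapes only):

```python
import numpy as np

def mmd_linear(src, tgt):
    """Squared linear-kernel MMD between two batches of adaptation-layer
    activations: the squared distance between their mean embeddings."""
    delta = src.mean(axis=0) - tgt.mean(axis=0)
    return float(delta @ delta)

rng = np.random.default_rng(0)
src = rng.standard_normal((256, 32))
tgt = rng.standard_normal((256, 32)) + 1.0   # target domain shifted by 1 per dim
same = mmd_linear(src, src)                  # identical batches -> exactly 0
shifted = mmd_linear(src, tgt)               # large for shifted domains
```

During training this penalty is added to the classification loss, so the network is pushed toward features that are discriminative yet carry no domain signal.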
Domain adaptation is critical for success in new, unseen environments. Adversarial adaptation models applied in feature spaces discover domain invariant representations, but are difficult to visualize and sometimes fail to capture pixel-level and low-level domain shifts. Recent work has shown that generative adversarial networks combined with cycle-consistency constraints are surprisingly effective at mapping images between domains, even without the use of aligned image pairs. We propose a novel discriminatively-trained...
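The cycle-consistency constraint mentioned above requires that mapping an image to the other domain and back reproduces the original, which is what makes unpaired training possible. A toy NumPy sketch with invertible linear maps standing in for the two generators (the names `G_st`/`G_ts` are illustrative):

```python
import numpy as np

def cycle_consistency_loss(x, G_st, G_ts):
    """L1 reconstruction error after mapping source -> target -> source."""
    return float(np.abs(G_ts(G_st(x)) - x).mean())

# toy "generators": a linear map between pixel spaces and its exact inverse
A = np.array([[1.0, 0.2], [0.0, 1.0]])
A_inv = np.linalg.inv(A)
G_st = lambda x: x @ A
G_ts = lambda x: x @ A_inv

x = np.random.default_rng(0).standard_normal((5, 2))
loss = cycle_consistency_loss(x, G_st, G_ts)   # ~0 for exact inverses
```

In the real setting the generators are deep networks and the loss is minimized jointly with adversarial terms, so reconstructions are only approximately exact; here the inverse is analytic, so the loss is numerically zero.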
Unlike human learning, machine learning often fails to handle changes between training (source) and test (target) input distributions. Such domain shifts, common in practical scenarios, severely damage the performance of conventional machine learning methods. Supervised domain adaptation methods have been proposed for the case when the target data have labels, including some that perform very well despite being "frustratingly easy" to implement. However, in practice, the target domain is often unlabeled, requiring unsupervised adaptation. We propose a...
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent are effective for tasks involving sequences, visual and otherwise. We describe a class of recurrent convolutional architectures which is end-to-end trainable and suitable for large-scale visual understanding tasks, and demonstrate the value of these models for activity recognition, image captioning, and video description. In contrast to previous models which assume a fixed visual representation or perform simple temporal averaging for sequential...
Real-world videos often have complex dynamics; methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (sequence of frames) and output (sequence of words) of variable length. To approach this problem we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on...
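The variable-length-in, variable-length-out structure above is the classic encode-then-decode loop: read all frames into a hidden state, then greedily emit words until an end token. A toy NumPy sketch with a plain RNN cell in place of the LSTM and random untrained weights (the tiny vocabulary and all dimensions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<bos>", "<eos>", "a", "man", "plays", "guitar"]
hid = 16

W_enc = rng.standard_normal((32, hid)) * 0.1               # frame-feature -> hidden
W_dec = rng.standard_normal((hid + len(vocab), hid)) * 0.1 # [hidden; word] -> hidden
W_out = rng.standard_normal((hid, len(vocab))) * 0.1       # hidden -> word scores

def one_hot(i):
    v = np.zeros(len(vocab)); v[i] = 1.0; return v

def describe(frames, max_len=8):
    """frames: (T, 32) per-frame features -> list of emitted words."""
    h = np.zeros(hid)
    for f in frames:                                   # encoding stage: read frames
        h = np.tanh(f @ W_enc + h)
    words, tok = [], one_hot(vocab.index("<bos>"))
    for _ in range(max_len):                           # decoding stage: emit words
        h = np.tanh(np.concatenate([h, tok]) @ W_dec)
        i = int(np.argmax(h @ W_out))                  # greedy choice
        if vocab[i] == "<eos>":
            break
        words.append(vocab[i])
        tok = one_hot(i)
    return words

caption = describe(rng.standard_normal((10, 32)))
```

With untrained weights the output is arbitrary tokens; the point is only the control flow: nothing constrains the number of input frames to match the number of output words.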
Conventional unsupervised domain adaptation (UDA) assumes that training data are sampled from a single domain. This neglects the more practical scenario where training data are collected from multiple sources, requiring multi-source domain adaptation. We make three major contributions towards addressing this problem. First, we collect and annotate by far the largest UDA dataset, called DomainNet, which contains six domains and about 0.6 million images distributed among 345 categories, addressing the gap in data availability for multi-source adaptation research. Second,...
Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.
We address the problem of activity detection in continuous, untrimmed video streams. This is a difficult task that requires extracting meaningful spatio-temporal features to capture activities, and accurately localizing the start and end times of each activity. We introduce a new model, Region Convolutional 3D Network (R-C3D), which encodes the video streams using a three-dimensional fully convolutional network, then generates candidate temporal regions containing activities, and finally classifies selected regions into specific activities....
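Proposal-based detectors like the one above score candidate temporal regions against ground-truth activities using temporal intersection-over-union (IoU), the 1-D analogue of the box IoU used in object detection. A small self-contained sketch of that matching criterion (thresholds and segment values are illustrative):

```python
def temporal_iou(seg_a, seg_b):
    """IoU between two [start, end] segments (seconds or frame indices)."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

overlap = temporal_iou([2.0, 6.0], [4.0, 8.0])   # inter=2, union=6 -> 1/3
disjoint = temporal_iou([0.0, 1.0], [2.0, 3.0])  # no overlap -> 0.0
```

Candidate regions whose IoU with a ground-truth segment exceeds a threshold (commonly 0.5) are treated as positives for the classification stage; the rest serve as background.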
In real-world applications, "what you saw" during training is often not "what you get" during deployment: the distribution and even the type and dimensionality of features can change from one dataset to the next. In this paper, we address the problem of visual domain adaptation for transferring object models from one dataset or visual domain to another. We introduce ARC-t, a flexible model for supervised learning of non-linear transformations between domains. Our method is based on a novel theoretical result demonstrating that such transformations can be learned in kernel space. Unlike existing...
We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce the annotation costs associated with detection. Recently, approaches that align distributions of source and target images using an adversarial loss have been proven effective for adapting object classifiers. However, for detection, fully matching the entire distributions of source and target images to each other at the global image level may fail, as domains could have distinct scene layouts and different combinations of objects. On the other hand, strong...
We present the 2017 Visual Domain Adaptation (VisDA) dataset and challenge, a large-scale testbed for unsupervised domain adaptation across visual domains. Unsupervised domain adaptation aims to solve the real-world problem of domain shift, where machine learning models trained on one domain must be transferred and adapted to a novel visual domain without additional supervision. The VisDA2017 challenge is focused on the simulation-to-reality shift and has two associated tasks: image classification and image segmentation. The goal in both tracks is to first train a model on simulated,...
In this paper, we address the task of natural language object retrieval, to localize a target object within a given image based on a natural language query of the object. Natural language object retrieval differs from text-based image retrieval as it involves spatial information about objects within the scene and global scene context. To address this issue, we propose a novel Spatial Context Recurrent ConvNet (SCRC) model as a scoring function on candidate boxes for object retrieval, integrating spatial configurations and scene-level contextual information into the network. Our model processes query text, local image descriptors, and global context features through...
Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision. However, we show that these techniques perform poorly when even a few labeled examples are available in the target domain. To address this semi-supervised domain adaptation (SSDA) setting, we propose a novel Minimax Entropy (MME) approach that adversarially optimizes an adaptive few-shot model. Our base model consists of a feature encoding network, followed by a classification layer that computes the features'...
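The quantity being minimaxed in the approach above is the entropy of the classifier's predictions on unlabeled target data: the classifier maximizes it (pulling prototypes toward target features) while the encoder minimizes it (clustering target features around prototypes). A NumPy sketch of that entropy term over a cosine-similarity classifier, with toy shapes and a temperature value chosen for illustration:

```python
import numpy as np

def prediction_entropy(features, prototypes, T=0.05):
    """Mean entropy of the softmax over cosine similarities between unlabeled
    target features and per-class weight prototypes."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = f @ w.T / T
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

rng = np.random.default_rng(0)
H = prediction_entropy(rng.standard_normal((8, 16)),     # 8 target features
                       rng.standard_normal((4, 16)))     # 4 class prototypes
```

In training, the sign of this loss is flipped between the two players (e.g. via a gradient-reversal layer), which is what makes the optimization adversarial.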
Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities "in-the-wild". We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If...
Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer "is there an equal number of balls and boxes?" we can look for balls, look for boxes, count them both, and compare the results. The recently proposed Neural Module Network (NMN) architecture [3, 2] implements this approach to question answering by parsing questions into linguistic substructures and assembling question-specific deep networks from smaller modules...
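The balls-and-boxes example above can be made concrete with plain functions standing in for the neural modules; the real modules are small learned networks operating on attention maps, but the composition pattern is the same. This is a toy sketch, with the symbolic grid scene and module names as illustrative assumptions:

```python
import numpy as np

# toy "modules" over a symbolic grid scene; real NMN modules are small
# neural networks that produce and consume attention maps
def find(scene, concept):
    """Attend to the grid cells containing the concept."""
    return (scene == concept).astype(float)

def count(attn):
    """Reduce an attention map to a number."""
    return int(attn.sum())

def compare(n1, n2):
    return "yes" if n1 == n2 else "no"

# "is there an equal number of balls and boxes?"
# -> compare(count(find(balls)), count(find(boxes))), assembled from the parse
scene = np.array([["ball", "box", ".", "ball"],
                  [".", "box", "ball", "."],
                  ["box", ".", ".", "."]])
answer = compare(count(find(scene, "ball")), count(find(scene, "box")))
```

The key point is that the network layout (`compare(count(find(.)), count(find(.)))`) is derived per question from its linguistic structure, rather than being fixed in advance.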
Recently, a number of grasp detection methods have been proposed that can be used to localize robotic grasp configurations directly from sensor data without estimating object pose. The underlying idea is to treat grasp perception analogously to object detection in computer vision. These methods take as input a noisy and partially occluded RGBD image or point cloud and produce as output pose estimates of viable grasps, without assuming a known CAD model of the object. Although these methods generalize grasp knowledge to new objects well, they have not yet demonstrated reliable...
Deep neural networks are being used increasingly to automate data analysis and decision making, yet their decision-making process is largely unclear and difficult to explain to the end users. In this paper, we address the problem of Explainable AI for deep networks that take images as input and output a class probability. We propose an approach called RISE that generates an importance map indicating how salient each pixel is for the model's prediction. In contrast to white-box approaches that estimate importance using gradients or other internal network...
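The black-box estimator above needs only forward passes: probe the model with randomly masked copies of the image and average the masks weighted by the model's output score. A minimal NumPy sketch of that idea, using a toy scoring function in place of a real network (mask count, resolution, and the region-of-interest "model" are illustrative assumptions; the paper also smooths and upsamples low-resolution masks, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def rise_saliency(image, score_fn, n_masks=500, p=0.5):
    """Importance map: average of random binary masks, each weighted by the
    model's score on the correspondingly masked image (black-box access only)."""
    H, W = image.shape
    sal = np.zeros((H, W))
    for _ in range(n_masks):
        mask = (rng.random((H, W)) < p).astype(float)
        sal += score_fn(image * mask) * mask
    return sal / n_masks

# toy "model": the score is the mean intensity inside a fixed region of interest,
# so only pixels in that region should look important
score_fn = lambda img: img[2:4, 2:4].mean()
image = np.ones((8, 8))
sal = rise_saliency(image, score_fn)
```

Pixels whose presence in a mask correlates with high scores accumulate more weight, so `sal[2:4, 2:4]` ends up larger than the map elsewhere, exactly the importance signal RISE extracts without any access to gradients.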