- Advanced Image and Video Retrieval Techniques
- Video Analysis and Summarization
- Multimodal Machine Learning Applications
- Image Retrieval and Classification Techniques
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Advanced Neural Network Applications
- Music and Audio Processing
- Sports Analytics and Performance
- Topic Modeling
- Face Recognition and Analysis
- Machine Learning and Data Classification
- Biomedical Text Mining and Ontologies
- Advanced Chemical Sensor Technologies
- Natural Language Processing Techniques
- Anomaly Detection Techniques and Applications
- Adversarial Robustness in Machine Learning
- Handwritten Text Recognition Techniques
- Nutritional Studies and Diet
- Data Quality and Management
- Face and Expression Recognition
- Software Engineering Research
- Names, Identity, and Discrimination Research
- Mental Health via Writing
- Authorship Attribution and Profiling
IBM (United States): 2013-2024
IBM Research - Thomas J. Watson Research Center: 2016-2017
Columbia University: 2009-2013
University of Trento: 2007
University of California, San Diego: 2006
Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. The prerequisite for advancing the state of LLM-based code translation is to understand their promises and limitations over existing techniques. To that end, we present a large-scale empirical study to investigate the ability of general LLMs and code LLMs for code translation across pairs of different languages, including C, C++, Go, Java, and Python. Our...
The problem of using pictures of objects captured under ideal imaging conditions (here referred to as in vitro) to recognize objects in natural environments (in situ) is an emerging area of interest in computer vision and pattern recognition. Examples of tasks in this vein include assistive vision systems for the blind and object recognition for mobile robots; the proliferation of image databases on the web is bound to lead to more examples in the near future. Despite its importance, there is still a need for a freely available database to facilitate the study of this kind...
We propose semantic model vectors, an intermediate level semantic representation, as a basis for modeling and detecting complex events in unconstrained real-world videos, such as those from YouTube. The semantic model vectors are extracted using a set of discriminative semantic classifiers, each being an ensemble of SVM models trained from thousands of labeled web images, for a total of 280 generic concepts. Our study reveals that the proposed semantic model vectors representation outperforms, and is complementary to, other low-level visual descriptors for video event modeling....
We propose a visual food recognition framework that integrates the inherent semantic relationships among fine-grained classes. Our method learns semantics-aware features by formulating a multi-task loss function on top of a convolutional neural network (CNN) architecture. It then refines the CNN predictions using a random walk based smoothing procedure, which further exploits the rich semantic information. We evaluate our algorithm on a large "food-in-the-wild" benchmark, as well as a challenging dataset of restaurant food dishes with...
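The abstract above does not say how the random walk smoothing is set up, so the following is only a minimal sketch of one common form, a random walk with restart over a class-similarity graph. The function name, the `similarity` matrix, and the `alpha` and `steps` parameters are all assumptions made for illustration, not the authors' procedure.

```python
def random_walk_smooth(probs, similarity, alpha=0.5, steps=20):
    """Smooth class probabilities over a class-similarity graph.

    probs:      initial per-class probabilities (e.g. softmax output).
    similarity: square matrix of non-negative class similarities; each
                class is assumed to be similar to itself (non-zero diagonal).
    alpha:      share of mass that follows the walk; the remaining
                (1 - alpha) restarts from the initial prediction.
    """
    n = len(probs)
    # Row-normalize similarities into transition probabilities.
    trans = [[v / sum(row) for v in row] for row in similarity]
    p = list(probs)
    for _ in range(steps):
        # One walk step: mass at class i moves to class j with prob trans[i][j].
        walked = [sum(p[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Blend the walked distribution with the original prediction.
        p = [alpha * w + (1 - alpha) * p0 for w, p0 in zip(walked, probs)]
    return p


# Hypothetical classes: sushi and sashimi are related, pizza is not.
similarity = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
smoothed = random_walk_smooth([0.4, 0.1, 0.5], similarity)
```

In this toy setup, probability mass is shared only between the two related classes, while the unrelated class keeps its original score.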
The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and demonstrate its use to create a first of a kind, real-world system for the editorial aid of golf and tennis highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), players' expressions (aggressive, tense, smiling, and neutral), spectators (crowd...
Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only...
We propose a method to extract user attributes from the pictures posted in social media feeds, specifically gender information. While traditional approaches rely on text analysis or exploit visual information only from the user profile picture or its colors, we look at the distribution of semantics in the pictures coming from the whole feed of a person to estimate gender. In order to compute such semantic distribution, we trained models from existing visual taxonomies to recognize objects, scenes and activities, and applied them to the images in each user's feed. Experiments...
We present a system to assist users in dietary logging habits, which performs food recognition from pictures snapped on their phone in two different scenarios. In the first scenario, called "Food in context", we exploit the GPS information of a user to determine which restaurant they are having a meal at, therefore restricting the categories to recognize to the set of items in the menu. Such context allows us to also report precise calories information to the user about their meal, since restaurant chains tend to standardize portions and provide the dietary information of each meal. In the second scenario, called "Foods in the wild" we try...
Face recognition is a long standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that accurately detect, recognize, verify, and understand human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings due to confounding factors related to pose, resolution, illumination, occlusion, and viewpoint. However, with recent advances in neural networks, face recognition has achieved unprecedented accuracy, largely built on data-driven...
With the rapid growth of multimedia data, it becomes increasingly important to develop semantic concept modeling approaches that are consistently effective, highly efficient, and easily scalable. To this end, we first propose the robust subspace bagging (RB-SBag) algorithm by augmenting random subspace bagging with forward model selection. Compared with traditional approaches, RB-SBag offers a considerably faster learning process while minimizing the risk of overfitting. Its ensemble structure also enables convenient...
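As a rough illustration of the two ingredients named above, random subspace bagging and forward model selection, the sketch below trains a pool of base models on bootstrap samples and random feature subsets, then greedily adds models to the ensemble while validation accuracy improves. It is a generic toy version and not the RB-SBag algorithm itself: the nearest-centroid base learner, the stopping rule, and every function and parameter name are assumptions.

```python
import random


def train_centroid_model(X, y, feats, rng):
    """Fit a nearest-centroid model on a per-class bootstrap sample,
    using only the feature indices in `feats` (the random subspace)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        sample = [rng.choice(rows) for _ in rows]  # bootstrap within the class
        centroids[label] = [sum(r[f] for r in sample) / len(sample) for f in feats]
    return feats, centroids


def score(model, x):
    """Positive when x is closer to the class-1 centroid than to class 0."""
    feats, c = model
    d0 = sum((x[f] - m) ** 2 for f, m in zip(feats, c[0]))
    d1 = sum((x[f] - m) ** 2 for f, m in zip(feats, c[1]))
    return d0 - d1


def ensemble_accuracy(models, X, y):
    """Accuracy of the averaged-score ensemble on (X, y)."""
    correct = 0
    for x, t in zip(X, y):
        avg = sum(score(m, x) for m in models) / len(models)
        correct += int((avg > 0) == (t == 1))
    return correct / len(y)


def subspace_bagging_with_forward_selection(
    X, y, X_val, y_val, n_models=10, n_feats=2, seed=0
):
    rng = random.Random(seed)
    dims = list(range(len(X[0])))
    # Stage 1: pool of base models, each on a random feature subset.
    pool = [
        train_centroid_model(X, y, rng.sample(dims, n_feats), rng)
        for _ in range(n_models)
    ]
    # Stage 2: greedy forward selection on held-out data.
    chosen, best_acc = [], 0.0
    while len(chosen) < len(pool):
        acc, idx = max(
            (ensemble_accuracy([pool[j] for j in chosen + [i]], X_val, y_val), i)
            for i in range(len(pool))
            if i not in chosen
        )
        if acc <= best_acc:
            break  # no remaining candidate improves the ensemble
        chosen.append(idx)
        best_acc = acc
    return [pool[i] for i in chosen], best_acc
```

Stopping as soon as no candidate improves validation accuracy keeps the final ensemble small, which is one plausible way such a scheme could speed up both learning and prediction.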
In this work, we study the performance of a two-stage ensemble visual machine learning framework for classification of medical images. In the first stage, models are built for subsets of features and data, and in the second stage, models are combined. We demonstrate the performance of this framework in four contexts: 1) The public ImageCLEF (Cross Language Evaluation Forum) 2013 medical modality recognition benchmark, 2) echocardiography view and mode recognition, 3) dermatology disease recognition across two datasets, and 4) a broad medical image dataset, merged from multiple data sources into a collection...
Attribute-based representation has been widely used in visual recognition and retrieval due to its interpretability and cross-category generalization properties. However, classic attribute learning requires manually labeling attributes on the images, which is very expensive and not scalable. In this paper, we propose to model attributes from category-attribute proportions. The proposed framework can model attributes without attribute labels on the images. Specifically, given a multi-class image dataset with N categories, we model an attribute, based...
The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and use it to create a real-world system for the editorial aid of golf highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), spectators (crowd cheering), and commentator (tone of voice and word analysis) to determine the most interesting moments of a game....
We propose a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides. We use changes of text in the slides as a means to segment the video into semantic shots. Unlike precedent approaches, our method does not depend on the availability of the electronic source of the slides, but rather extracts and recognizes the text directly from the video. Once text regions are detected within keyframes, a novel binarization algorithm, Local Adaptive Otsu (LOA), is employed to deal with the low quality of video scene...
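The abstract is cut off before it describes Local Adaptive Otsu, so the sketch below only illustrates the general idea its name suggests: computing Otsu's threshold separately for each local region instead of once for the whole image. The tiling scheme, the `min_contrast` guard for flat tiles, and all names are assumptions, not the published algorithm.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximizes between-class
    variance; pixels <= t form one class, pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below t
    sum0 = 0  # sum of grey levels at or below t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def local_otsu_binarize(image, tile=4, min_contrast=16):
    """Binarize a greyscale image (list of rows of 0-255 ints) tile by tile.

    Each tile gets its own Otsu threshold; tiles whose grey-level range
    is below `min_contrast` are treated as background (all zeros).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            ys = range(y0, min(y0 + tile, h))
            xs = range(x0, min(x0 + tile, w))
            block = [image[y][x] for y in ys for x in xs]
            if max(block) - min(block) < min_contrast:
                continue  # flat tile: nothing to separate
            t = otsu_threshold(block)
            for y in ys:
                for x in xs:
                    out[y][x] = 1 if image[y][x] > t else 0
    return out


# Toy frame with uneven lighting: dark left half, bright right half,
# each containing one brighter "text" pixel per row.
row = [20, 120, 20, 20, 150, 250, 150, 150]
binary = local_otsu_binarize([row, row], tile=4)
```

Because each tile is thresholded on its own histogram, the bright pixel in the dark half and the bright pixel in the light half are both marked as foreground even though their absolute grey levels differ widely.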
Action recognition is an important problem in computer vision and has received substantial attention in recent years. However, it remains very challenging due to the complex interaction of static and dynamic information, as well as the high computational cost of processing video data. This paper aims to apply the success of static image semantic recognition to the video domain, by leveraging both static and motion based descriptors in different stages of the semantic ladder. We examine the effects of three types of features: low-level dynamic descriptors, intermediate-level static deep architecture...
Ranking large scale image and video collections usually expects higher accuracy on top ranked data, while tolerating lower accuracy on bottom ranked ones. In view of this, we propose a rank learning algorithm, called Imbalanced RankBoost, which merges RankBoost and iterative thresholding into a unified loss optimization framework. The proposed approach provides a more efficient ranking process by iteratively identifying a cutoff threshold in each boosting iteration, and automatically truncating ranking feature computation for the...