NFDI4DS | UHH-SEMS - Publication Details

Michele Merler

ORCID: 0000-0002-4358-8671

About

Contact & Profiles

OpenAlex author ID: A5068061267

Research Areas

Advanced Image and Video Retrieval Techniques
Video Analysis and Summarization
Multimodal Machine Learning Applications
Image Retrieval and Classification Techniques
Domain Adaptation and Few-Shot Learning
Human Pose and Action Recognition
Advanced Neural Network Applications
Music and Audio Processing
Sports Analytics and Performance
Topic Modeling
Face recognition and analysis
Machine Learning and Data Classification
Biomedical Text Mining and Ontologies
Advanced Chemical Sensor Technologies
Natural Language Processing Techniques
Anomaly Detection Techniques and Applications
Adversarial Robustness in Machine Learning
Handwritten Text Recognition Techniques
Nutritional Studies and Diet
Data Quality and Management
Face and Expression Recognition
Software Engineering Research
Names, Identity, and Discrimination Research
Mental Health via Writing
Authorship Attribution and Profiling

IBM (United States)
2013-2024

IBM Research - Thomas J. Watson Research Center
2016-2017

Columbia University
2009-2013

University of Trento
2007

University of California, San Diego
2006

Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code

OPENALEX - Publications

Rangeet Pan Ali Reza Ibrahimzada Rahul Krishna Divya Sankar Lambert Pouguem Wassi and 5 more

Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. The prerequisite for advancing the state of LLM-based code translation is to understand their promises and limitations over existing techniques. To that end, we present a large-scale empirical study to investigate the ability of general LLMs and code LLMs for code translation across pairs of different languages, including C, C++, Go, Java, and Python. Our...

10.1145/3597503.3639226 article EN cc-by-nc 2024-04-12

Recognizing Groceries in situ Using in vitro Training Data

Michele Merler Carolina Galleguillos Serge Belongie

The problem of using pictures of objects captured under ideal imaging conditions (here referred to as in vitro) to recognize objects in natural environments (in situ) is an emerging area of interest in computer vision and pattern recognition. Examples of tasks in this vein include assistive systems for the blind and object recognition for mobile robots; the proliferation of image databases on the web is bound to lead to more examples in the near future. Despite its importance, there is still a need for a freely available database to facilitate the study of this kind...

10.1109/cvpr.2007.383486 article EN 2007 IEEE Conference on Computer Vision and Pattern Recognition 2007-06-01

Semantic Model Vectors for Complex Video Event Recognition

Michele Merler Bert Huang Lexing Xie Gang Hua Apostol Natsev

We propose semantic model vectors, an intermediate level representation, as a basis for modeling and detecting complex events in unconstrained real-world videos, such as those from YouTube. The semantic model vectors are extracted using a set of discriminative classifiers, each being an ensemble of SVM models trained from thousands of labeled web images, for a total of 280 generic concepts. Our study reveals that the proposed representation outperforms, and is complementary to, other low-level visual descriptors for video event modeling....

10.1109/tmm.2011.2168948 article EN IEEE Transactions on Multimedia 2011-09-28

Learning to Make Better Mistakes

Hui Wu Michele Merler Rosario Uceda‐Sosa John R. Smith

We propose a visual food recognition framework that integrates the inherent semantic relationships among fine-grained classes. Our method learns semantics-aware features by formulating a multi-task loss function on top of a convolutional neural network (CNN) architecture. It then refines the CNN predictions using a random walk based smoothing procedure, which further exploits the rich semantic information. We evaluate our algorithm on a large "food-in-the-wild" benchmark, as well as a challenging dataset of restaurant dishes with...

10.1145/2964284.2967205 article EN Proceedings of the 24th ACM International Conference on Multimedia 2016-09-29

Automatic Curation of Sports Highlights Using Multimodal Excitement Features

Michele Merler Khoi-Nguyen C. Mac Dhiraj Joshi Quoc-Bao Nguyen Stephen Hammer and 5 more

The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and demonstrate it to create a first of a kind, real-world system for the editorial aid of golf and tennis highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), players' expressions (aggressive, tense, smiling, and neutral), spectators (crowd...

10.1109/tmm.2018.2876046 article EN IEEE Transactions on Multimedia 2018-10-16

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Mayank Mishra Matt Stallone Gaoyuan Zhang Yikang Shen Aditya Prasad and 41 more

Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only...

10.48550/arxiv.2405.04324 preprint EN arXiv (Cornell University) 2024-05-07

You are what you tweet…pic! gender prediction based on semantic analysis of social media images

Michele Merler Liangliang Cao John R. Smith

We propose a method to extract user attributes from the pictures posted in social media feeds, specifically gender information. While traditional approaches rely on text analysis or exploit visual information only from the profile picture or its colors, we look at the distribution of semantics in the pictures coming from the whole feed of a person to estimate gender. In order to compute such semantic distribution, we trained models from existing visual taxonomies to recognize objects, scenes and activities, and applied them to the images in each user's feed. Experiments...

10.1109/icme.2015.7177499 article EN 2015 IEEE International Conference on Multimedia and Expo (ICME) 2015-06-01

Snap, Eat, RepEat

Michele Merler Hui Wu Rosario Uceda‐Sosa Quoc-Bao Nguyen John R. Smith

We present a system to assist users in dietary logging habits, which performs food recognition from pictures snapped on their phone in two different scenarios. In the first scenario, called "Food in context", we exploit the GPS information of a user to determine which restaurant they are having a meal at, therefore restricting the categories to recognize to the set of items in the menu. Such context allows us to also report precise calorie information about the meal, since restaurant chains tend to standardize portions and provide this information for each meal. In the second scenario, called "Foods in the wild", we try...

10.1145/2986035.2986036 article EN 2016-10-12

Diversity in Faces

Michele Merler Nalini Ratha Rogério Feris John R. Smith

Face recognition is a long standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that accurately detect, recognize, verify, and understand human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings due to confounding factors related to pose, resolution, illumination, occlusion, and viewpoint. However, with recent advances in neural networks, face recognition has achieved unprecedented accuracy, largely built on data-driven...

10.48550/arxiv.1901.10436 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Large-scale multimedia semantic concept modeling using robust subspace bagging and MapReduce

Rong Yan Marc-Olivier Fleury Michele Merler Apostol Natsev John R. Smith

With the rapid growth of multimedia data, it becomes increasingly important to develop semantic concept modeling approaches that are consistently effective, highly efficient, and easily scalable. To this end, we first propose the robust subspace bagging (RB-SBag) algorithm by augmenting random subspace bagging with forward model selection. Compared with traditional approaches, RB-SBag offers a considerably faster learning process while minimizing the risk of overfitting. Its ensemble structure also enables convenient...

10.1145/1631058.1631067 article EN 2009-10-23

A generalized framework for medical image classification and recognition

Mohammadali Abedini Noel Codella Jonathan H. Connell Rahil Garnavi Michele Merler and 3 more

In this work, we study the performance of a two-stage ensemble visual machine learning framework for classification of medical images. In the first stage, models are built for subsets of features and data, and in the second stage, models are combined. We demonstrate the performance of the framework in four contexts: 1) the public ImageCLEF (Cross Language Evaluation Forum) 2013 medical modality recognition benchmark, 2) echocardiography view and mode recognition, 3) dermatology disease recognition across two datasets, and 4) a broad medical image dataset, merged from multiple data sources into a collection...

10.1147/jrd.2015.2390017 article EN IBM Journal of Research and Development 2015-03-01

Modeling Attributes from Category-Attribute Proportions

Felix X. Yu Liangliang Cao Michele Merler Noel Codella Tao Chen and 2 more

Attribute-based representation has been widely used in visual recognition and retrieval due to its interpretability and cross-category generalization properties. However, classic attribute learning requires manually labeling attributes on the images, which is very expensive, and not scalable. In this paper, we propose to model attributes from category-attribute proportions. The proposed framework can model attributes without attribute labels on the images. Specifically, given a multi-class image dataset with N categories, we model an attribute, based...

10.1145/2647868.2654993 article EN 2014-11-03

Automatic Curation of Golf Highlights Using Multimodal Excitement Features

Michele Merler Dhiraj Joshi Quoc-Bao Nguyen Stephen Hammer John Kent and 2 more

The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and use it to create a real-world system for the editorial aid of golf highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), spectators (crowd cheering), and commentator (tone of voice and word analysis) to determine the most interesting moments of a game....

10.1109/cvprw.2017.14 article EN 2017-07-01

Semantic keyword extraction via adaptive text binarization of unstructured unsourced video

Michele Merler John R. Kender

We propose a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides. We use changes of text in the slides as a means to segment the video into semantic shots. Unlike precedent approaches, our method does not depend on the availability of the electronic source of the slides, but rather extracts and recognizes the text directly from the video. Once text regions are detected within keyframes, a novel binarization algorithm, Local Adaptive Otsu (LOA), is employed to deal with the low quality of scene...

10.1109/icip.2009.5413432 article EN 2009-11-01

Heterogeneous Semantic Level Features Fusion for Action Recognition

Junjie Cai Michele Merler Sharath Pankanti Qi Tian

Action recognition is an important problem in computer vision and has received substantial attention in recent years. However, it remains very challenging due to the complex interaction of static and dynamic information, as well as the high computational cost of processing video data. This paper aims to apply the success of static image semantic recognition to the video domain, by leveraging both static and motion based descriptors in different stages of the semantic ladder. We examine the effects of three types of features: low-level dynamic descriptors, intermediate-level static deep architecture...

10.1145/2671188.2749320 article EN 2015-06-22

Automated Medical Image Modality Recognition by Fusion of Visual and Text Information

Noel Codella Jonathan H. Connell Sharath Pankanti Michele Merler John R. Smith

10.1007/978-3-319-10470-6_61 article EN Lecture notes in computer science 2014-01-01

Leveraging multiple cues for recognizing family photos

Xiaolong Wang Guodong Guo Michele Merler Noel Codella Rohith MV and 2 more

10.1016/j.imavis.2016.07.006 article EN Image and Vision Computing 2016-07-26

Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code

Rangeet Pan Ali Reza Ibrahimzada Rahul Krishna Divya Sankar Lambert Pouguem Wassi and 5 more

10.48550/arxiv.2308.03109 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Imbalanced RankBoost for efficiently ranking large-scale image/video collections

Michele Merler Rong Yan John R. Smith

Ranking large scale image and video collections usually expects higher accuracy on top ranked data, while tolerating lower accuracy on bottom ranked ones. In view of this, we propose a rank learning algorithm, called Imbalanced RankBoost, which merges RankBoost and iterative thresholding into a unified loss optimization framework. The proposed approach provides a more efficient ranking process by iteratively identifying a cutoff threshold in each boosting iteration, and automatically truncating feature computation for the...

10.1109/cvpr.2009.5206575 article EN 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009-06-01
