Michele Merler

ORCID: 0000-0002-4358-8671
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Video Analysis and Summarization
  • Multimodal Machine Learning Applications
  • Image Retrieval and Classification Techniques
  • Domain Adaptation and Few-Shot Learning
  • Human Pose and Action Recognition
  • Advanced Neural Network Applications
  • Music and Audio Processing
  • Sports Analytics and Performance
  • Topic Modeling
  • Face Recognition and Analysis
  • Machine Learning and Data Classification
  • Biomedical Text Mining and Ontologies
  • Advanced Chemical Sensor Technologies
  • Natural Language Processing Techniques
  • Anomaly Detection Techniques and Applications
  • Adversarial Robustness in Machine Learning
  • Handwritten Text Recognition Techniques
  • Nutritional Studies and Diet
  • Data Quality and Management
  • Face and Expression Recognition
  • Software Engineering Research
  • Names, Identity, and Discrimination Research
  • Mental Health via Writing
  • Authorship Attribution and Profiling

IBM (United States)
2013-2024

IBM Research - Thomas J. Watson Research Center
2016-2017

Columbia University
2009-2013

University of Trento
2007

University of California, San Diego
2006

Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. The prerequisite for advancing the state of LLM-based code translation is to understand their promises and limitations over existing techniques. To that end, we present a large-scale empirical study to investigate the ability of general LLMs and code LLMs for code translation across pairs of different languages, including C, C++, Go, Java, and Python. Our...

10.1145/3597503.3639226 article EN cc-by-nc 2024-04-12

The problem of using pictures of objects captured under ideal imaging conditions (here referred to as in vitro) to recognize objects in natural environments (in situ) is an emerging area of interest in computer vision and pattern recognition. Examples of tasks in this vein include assistive systems for the blind and object recognition for mobile robots; the proliferation of image databases on the web is bound to lead to more examples in the near future. Despite its importance, there is still a need for a freely available database to facilitate the study of this kind...

10.1109/cvpr.2007.383486 article EN 2007 IEEE Conference on Computer Vision and Pattern Recognition 2007-06-01

We propose semantic model vectors, an intermediate level semantic representation, as a basis for modeling and detecting complex events in unconstrained real-world videos, such as those from YouTube. The semantic model vectors are extracted using a set of discriminative semantic classifiers, each being an ensemble of SVM models trained from thousands of labeled web images, for a total of 280 generic concepts. Our study reveals that the proposed semantic model vectors representation outperforms, and is complementary to, other low-level visual descriptors for video event modeling....

10.1109/tmm.2011.2168948 article EN IEEE Transactions on Multimedia 2011-09-28

We propose a visual food recognition framework that integrates the inherent semantic relationships among fine-grained classes. Our method learns semantics-aware features by formulating a multi-task loss function on top of a convolutional neural network (CNN) architecture. It then refines the CNN predictions using a random walk based smoothing procedure, which further exploits the rich semantic information. We evaluate our algorithm on a large "food-in-the-wild" benchmark, as well as a challenging dataset of restaurant food dishes with...

10.1145/2964284.2967205 article EN Proceedings of the 24th ACM International Conference on Multimedia 2016-09-29

The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and demonstrate it to create a first of a kind, real-world system for the editorial aid of golf and tennis highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), players' expressions (aggressive, tense, smiling, and neutral), spectators (crowd...

10.1109/tmm.2018.2876046 article EN IEEE Transactions on Multimedia 2018-10-16

Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only...

10.48550/arxiv.2405.04324 preprint EN arXiv (Cornell University) 2024-05-07

We propose a method to extract user attributes from the pictures posted in social media feeds, specifically gender information. While traditional approaches rely on text analysis or exploit visual information only from the user profile picture or its colors, we look at the distribution of semantics in the pictures coming from the whole feed of a person to estimate gender. In order to compute such a semantic distribution, we trained models from existing visual taxonomies to recognize objects, scenes and activities, and applied them to the images in each user's feed. Experiments...

10.1109/icme.2015.7177499 article EN 2015 IEEE International Conference on Multimedia and Expo (ICME) 2015-06-01

We present a system to assist users in dietary logging habits, which performs food recognition from pictures snapped on their phone in two different scenarios. In the first scenario, called "Food in context", we exploit the GPS information of a user to determine which restaurant they are having a meal at, therefore restricting the categories to recognize to the set of items in the menu. Such context also allows us to report precise calorie information to the user about their meal, since restaurant chains tend to standardize portions and provide the information for each meal. In the second scenario, called "Foods in the wild", we try...

10.1145/2986035.2986036 article EN 2016-10-12

Face recognition is a long-standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that accurately detect, recognize, verify, and understand human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings, due to confounding factors related to pose, resolution, illumination, occlusion, and viewpoint. However, with recent advances in neural networks, face recognition has achieved unprecedented accuracy, largely built on data-driven...

10.48550/arxiv.1901.10436 preprint EN other-oa arXiv (Cornell University) 2019-01-01

With the rapid growth of multimedia data, it becomes increasingly important to develop semantic concept modeling approaches that are consistently effective, highly efficient, and easily scalable. To this end, we first propose the robust subspace bagging (RB-SBag) algorithm by augmenting random subspace bagging with forward model selection. Compared to traditional modeling approaches, RB-SBag offers a considerably faster learning process while minimizing the risk of overfitting. Its ensemble structure also enables convenient...

10.1145/1631058.1631067 article EN 2009-10-23

In this work, we study the performance of a two-stage ensemble visual machine learning framework for classification of medical images. In the first stage, models are built for subsets of features and data, and in the second stage, the models are combined. We demonstrate the performance of this framework in four contexts: 1) the public ImageCLEF (Cross Language Evaluation Forum) 2013 medical modality recognition benchmark, 2) echocardiography view and mode recognition, 3) dermatology disease recognition across two datasets, and 4) a broad medical image dataset, merged from multiple data sources into a collection...

10.1147/jrd.2015.2390017 article EN IBM Journal of Research and Development 2015-03-01

Attribute-based representation has been widely used in visual recognition and retrieval due to its interpretability and cross-category generalization properties. However, classic attribute learning requires manually labeling attributes on the images, which is very expensive and not scalable. In this paper, we propose to model attributes from category-attribute proportions. The proposed framework can model attributes without attribute labels on the images. Specifically, given a multi-class image dataset with N categories, we model an attribute, based...

10.1145/2647868.2654993 article EN 2014-11-03

The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and use it to create a real-world system for the editorial aid of golf highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), spectators (crowd cheering), and commentator (tone of voice and word analysis) to determine the most interesting moments of a game....

10.1109/cvprw.2017.14 article EN 2017-07-01

We propose a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides. We use changes of text in the slides as a means to segment the video into semantic shots. Unlike precedent approaches, our method does not depend on the availability of the electronic source of the slides, but rather extracts and recognizes the text directly from the video. Once text regions are detected within keyframes, a novel binarization algorithm, Local Adaptive Otsu (LOA), is employed to deal with the low quality of video scene...

10.1109/icip.2009.5413432 article EN 2009-11-01

Action recognition is an important problem in computer vision and has received substantial attention in recent years. However, it remains very challenging due to the complex interaction of static and dynamic information, as well as the high computational cost of processing video data. This paper aims to apply the success of image semantic recognition to the video domain, by leveraging both static and motion based descriptors in different stages of the semantic ladder. We examine the effects of three types of features: low-level descriptors, intermediate-level deep architecture...

10.1145/2671188.2749320 article EN 2015-06-22

Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. The prerequisite for advancing the state of LLM-based code translation is to understand their promises and limitations over existing techniques. To that end, we present a large-scale empirical study to investigate the ability of general LLMs and code LLMs for code translation across pairs of different languages, including C, C++, Go, Java, and Python. Our...

10.48550/arxiv.2308.03109 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Ranking large scale image and video collections usually expects higher accuracy on top ranked data, while tolerating lower accuracy on bottom ranked ones. In view of this, we propose a rank learning algorithm, called Imbalanced RankBoost, which merges RankBoost and iterative thresholding into a unified loss optimization framework. The proposed approach provides a more efficient ranking process by iteratively identifying a cutoff threshold in each boosting iteration, and automatically truncating ranking feature computation for the...

10.1109/cvpr.2009.5206575 article EN 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009-06-01