- Stochastic Gradient Optimization Techniques
- Sparse and Compressive Sensing Techniques
- Privacy-Preserving Technologies in Data
- Topic Modeling
- Natural Language Processing Techniques
- Advanced Neural Network Applications
- Adversarial Robustness in Machine Learning
- Domain Adaptation and Few-Shot Learning
- Advanced Optimization Algorithms Research
- Machine Learning and Algorithms
- Multimodal Machine Learning Applications
- Age of Information Optimization
- Advanced Bandit Algorithms Research
- Advanced Text Analysis Techniques
- Neural Networks and Applications
- Machine Learning and ELM
- Generative Adversarial Networks and Image Synthesis
- Sentiment Analysis and Opinion Mining
- Machine Learning in Healthcare
- Explainable Artificial Intelligence (XAI)
- Text Readability and Simplification
- Machine Learning and Data Classification
- Distributed Control Multi-Agent Systems
- Model Reduction and Neural Networks
- Statistical Methods and Inference
École Polytechnique Fédérale de Lausanne
2017-2024
University Hospital of Bern
2024
University of Michigan
2024
Yale University
2024
University of Tübingen
2023
ETH Zurich
2009-2019
Novartis (Switzerland)
2019
Novartis Institutes for BioMedical Research
2019
University of California, Berkeley
2019
École Polytechnique
2012-2018
Matteo Pagliardini, Prakhar Gupta, Martin Jaggi. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized. In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side. However, directly averaging model parameters is only possible if all models have the same structure and size, which could be a restrictive constraint in many scenarios. In this work we investigate more powerful and more flexible aggregation schemes for FL. Specifically, we propose ensemble distillation for model fusion,...
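As a point of reference for the aggregation step the abstract calls restrictive, here is a minimal sketch of FedAvg-style parameter averaging, which only works when every client model shares the same shapes. It is an illustrative baseline, not the ensemble distillation method proposed in the paper; all names and sizes are made up.

```python
# Hypothetical FedAvg-style parameter averaging (illustrative baseline only):
# a weighted element-wise average of client parameters, which requires that
# all client models have identical architecture and parameter shapes.
import numpy as np

def federated_average(client_params, client_sizes):
    """Weighted average of client parameter vectors; all must share one shape."""
    total = sum(client_sizes)
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))

clients = [np.random.randn(10) for _ in range(3)]   # three identically-shaped client models
sizes = [100, 50, 150]                               # local dataset sizes used as weights
global_model = federated_average(clients, sizes)
print(global_model.shape)                            # (10,): same shape as every client model
```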
Recent trends of incorporating attention mechanisms in vision have led researchers to reconsider the supremacy of convolutional layers as a primary building block. Beyond helping CNNs to handle long-range dependencies, Ramachandran et al. (2019) showed that attention can completely replace convolution and achieve state-of-the-art performance on vision tasks. This raises the question: do learned attention layers operate similarly to convolutional layers? This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice. Specifically, we prove...
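A toy NumPy illustration of the claim that attention layers can perform convolution: if each attention "head" attends with probability one to a fixed shift and the heads are combined with the kernel weights, the layer reproduces a 1-D convolution. This is a simplified sketch of the idea, not the paper's construction (which uses softmax attention with relative positional encodings); all function names are ours.

```python
# Toy construction: one attention "head" per kernel tap, each attending to a
# fixed relative shift; the head outputs weighted by the kernel reproduce a
# 1-D cross-correlation/convolution.
import numpy as np

def attention_as_convolution(x, kernel):
    n, k = len(x), len(kernel)
    shifts = np.arange(k) - k // 2                 # relative positions covered by the heads
    out = np.zeros(n)
    for w, s in zip(kernel, shifts):
        A = np.zeros((n, n))                       # attention matrix of this head
        for i in range(n):
            A[i, min(max(i + s, 0), n - 1)] = 1.0  # attend only to position i + s (clamped)
        out += w * (A @ x)                         # output projection weights the head by w
    return out

x = np.random.randn(32)
kernel = np.array([0.25, 0.5, 0.25])
reference = np.convolve(x, kernel[::-1], mode="same")
print(np.allclose(attention_as_convolution(x, kernel)[1:-1], reference[1:-1]))  # True away from borders
```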
Keyphrase extraction is the task of automatically selecting a small set of phrases that best describe a given free text document. Supervised keyphrase extraction requires large amounts of labeled training data and generalizes very poorly outside the domain of the training data. At the same time, unsupervised systems have poor accuracy, and often do not generalize well, as they require the input document to belong to a larger corpus also given as input. Addressing these drawbacks, in this paper, we tackle keyphrase extraction from single documents with EmbedRank: a novel unsupervised method,...
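A minimal sketch of an EmbedRank-style ranking step, assuming candidate phrases are already extracted: embed the document and each candidate in a shared space and rank candidates by cosine similarity to the document embedding. The paper uses sentence embeddings such as Sent2Vec; the `toy_embed` hashing embedder below is purely a stand-in.

```python
# Hypothetical EmbedRank-style ranking step; `toy_embed` stands in for a real
# sentence embedder and is not part of the paper's method.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_keyphrases(document, candidates, embed, top_k=5):
    """Rank candidate phrases by cosine similarity to the document embedding."""
    doc_vec = embed(document)
    scored = [(cosine(embed(c), doc_vec), c) for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [c for _, c in scored[:top_k]]

def toy_embed(text, dim=256):
    """Illustrative character-trigram hashing embedder, not a real sentence encoder."""
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[hash(text[i:i + 3]) % dim] += 1.0
    return v

doc = "unsupervised keyphrase extraction from single documents using sentence embeddings"
candidates = ["keyphrase extraction", "sentence embeddings", "larger corpus", "training data"]
print(rank_keyphrases(doc, candidates, toy_embed, top_k=2))
```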
With the growth of data and the necessity for distributed optimization methods, solvers that work well on a single machine must be re-designed to leverage distributed computation. Recent work in this area has been limited by focusing heavily on developing highly specific methods for the distributed environment. These special-purpose methods are often unable to match the competitive performance of their well-tuned and customized single machine counterparts. Further, they are unable to easily integrate improvements that continue to be made to single machine methods. To this end, we present a framework that both allows...
Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice. In this paper, we tackle this challenge by proposing an unsupervised method to learn universal embeddings of time series. Unlike previous works, it is scalable with respect to their length and we demonstrate the quality, transferability and practicability of the learned representations with thorough experiments and comparisons. To this end, we combine an encoder based on causal dilated convolutions...
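A small NumPy sketch of the building block named in the abstract, a stack of causal dilated convolutions: each output step depends only on past inputs and the receptive field doubles with every layer. Channel counts, weights, nonlinearity, and the final pooling are simplified assumptions, not the paper's architecture.

```python
# Illustrative single-channel causal dilated convolution stack; a real encoder
# would use many channels, residual connections and pooling over time.
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """output[t] depends only on x[t], x[t - d], x[t - 2d], ... (causal)."""
    n, k = len(x), len(kernel)
    padded = np.concatenate([np.zeros((k - 1) * dilation), x])   # left padding only
    return np.array([
        sum(kernel[j] * padded[t + (k - 1 - j) * dilation] for j in range(k))
        for t in range(n)
    ])

def encode(series, depth=4, kernel_size=3, seed=0):
    """Stack layers with dilations 1, 2, 4, ... and take the last step as a summary."""
    rng = np.random.default_rng(seed)
    h = series
    for layer in range(depth):
        kernel = rng.standard_normal(kernel_size) / kernel_size
        h = np.tanh(causal_dilated_conv(h, kernel, 2 ** layer))  # nonlinearity between layers
    return h[-1]   # scalar summary; a real encoder outputs a vector per series

series = np.sin(np.linspace(0, 10, 200))
print(encode(series))
```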
Neural Architecture Search (NAS) aims to facilitate the design of deep networks for new tasks. Existing techniques rely on two stages: searching over the architecture space and validating the best architecture. NAS algorithms are currently compared solely based on their results on the downstream task. While intuitive, this fails to explicitly evaluate the effectiveness of their search strategies. In this paper, we propose to evaluate the NAS search phase. To this end, we compare the quality of the solutions obtained by NAS search policies with that of random architecture selection. We find that: (i)...
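A hedged sketch of the evaluation protocol the abstract argues for: compare the architecture returned by a search policy against architectures sampled uniformly at random from the same space under a comparable evaluation budget. The search space, `train_and_evaluate`, and the toy scoring function below are hypothetical placeholders, not any benchmark from the paper.

```python
# Hypothetical evaluation harness; architectures and the "accuracy" function are toys.
import random

def random_search_baseline(search_space, train_and_evaluate, budget=10, seed=0):
    """Best validation score among `budget` uniformly sampled architectures."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(budget):
        arch = rng.choice(search_space)
        score = train_and_evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

def policy_vs_random(policy_arch, search_space, train_and_evaluate, budget=10):
    """Positive return value means the search policy beat random selection."""
    _, random_score = random_search_baseline(search_space, train_and_evaluate, budget)
    return train_and_evaluate(policy_arch) - random_score

space = list(range(100))                                   # toy architecture encoding
toy_eval = lambda arch: -(arch - 42) ** 2 + random.gauss(0.0, 5.0)   # noisy toy "accuracy"
print(policy_vs_random(40, space, toy_eval))
```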
Mini-batch stochastic gradient methods (mini-batch SGD) are state of the art for distributed training of deep neural networks. Drastic increases in the mini-batch sizes have led to key efficiency and scalability gains in recent years. However, progress faces a major roadblock, as models trained with large batches often do not generalize well, i.e. they do not show good accuracy on new data. As a remedy, we propose a post-local SGD and show that it significantly improves the generalization performance compared to large-batch...
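A minimal sketch of the post-local SGD schedule on a toy quadratic, assuming illustrative hyper-parameters: workers first run synchronous mini-batch SGD (local SGD with a single local step), then switch to several local steps per communication round with periodic model averaging.

```python
# Toy quadratic objective; local_steps=1 corresponds to plain synchronous SGD,
# the second phase switches to local SGD with periodic model averaging.
import numpy as np

def local_sgd(workers, stoch_grad, lr, local_steps, rounds, rng):
    for _ in range(rounds):
        for i in range(len(workers)):
            for _ in range(local_steps):                     # independent local updates
                workers[i] = workers[i] - lr * stoch_grad(workers[i], rng)
        avg = np.mean(workers, axis=0)                       # communication: model averaging
        workers = [avg.copy() for _ in workers]
    return workers

rng = np.random.default_rng(0)
stoch_grad = lambda w, rng: 2 * w + 0.1 * rng.standard_normal(w.shape)   # grad of ||w||^2 + noise
workers = [np.ones(5) for _ in range(4)]

# phase 1: large-batch-style synchronous SGD (one local step per round)
workers = local_sgd(workers, stoch_grad, lr=0.05, local_steps=1, rounds=50, rng=rng)
# phase 2 (post-local SGD): several local steps between averaging rounds
workers = local_sgd(workers, stoch_grad, lr=0.05, local_steps=8, rounds=20, rng=rng)
print(np.linalg.norm(np.mean(workers, axis=0)))   # close to 0, the optimum of ||w||^2
```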
Sign-based algorithms (e.g. signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly when compared with SGD. These issues arise because of the biased nature of the sign compression operator. We then show that using error-feedback, i.e. incorporating the error made by the compression operator into the next step,...
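A compact sketch of the error-feedback fix the abstract describes: the (scaled) sign compressor is applied to the gradient step plus the residual error from the previous round, and the new residual is stored for the next one. The quadratic objective, scaling of the sign, and step-size are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative single-worker error-feedback signSGD loop on a toy quadratic.
import numpy as np

def ef_sign_sgd(grad, x0, lr=0.1, steps=1000):
    x, e = x0.copy(), np.zeros_like(x0)                # e stores the accumulated compression error
    for _ in range(steps):
        p = lr * grad(x) + e                           # gradient step corrected by the stored error
        delta = np.abs(p).mean() * np.sign(p)          # scaled sign: roughly 1 bit per coordinate
        e = p - delta                                  # keep what the compressor dropped
        x = x - delta                                  # apply only the compressed update
    return x

target = np.array([1.0, 2.0, 3.0, 4.0])
grad = lambda x: 2 * (x - target)                      # gradient of ||x - target||^2
print(ef_sign_sgd(grad, np.zeros(4)))                  # approaches [1, 2, 3, 4]
```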
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing,...
Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's...
The Frank-Wolfe (FW) optimization algorithm has lately re-gained popularity thanks in particular to its ability to nicely handle the structured constraints appearing in machine learning applications. However, its convergence rate is known to be slow (sublinear) when the solution lies at the boundary. A simple less-known fix is to add the possibility to take 'away steps' during optimization, an operation that importantly does not require a feasibility oracle. In this paper, we highlight and clarify several variants of the Frank-Wolfe optimization algorithm that have...
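A hedged NumPy sketch of Frank-Wolfe with away steps on the probability simplex: besides the usual step toward the best vertex, the algorithm may step away from the worst vertex currently in its active set. The quadratic objective, exact line search, and dimensions are illustrative choices, not the paper's setting.

```python
# Minimize ||x - b||^2 over the probability simplex with Frank-Wolfe plus away steps.
import numpy as np

def fw_with_away_steps(b, iters=100):
    n = len(b)
    x = np.ones(n) / n
    active = {i: 1.0 / n for i in range(n)}              # weights of the active vertices
    for _ in range(iters):
        g = 2 * (x - b)                                  # gradient
        s = int(np.argmin(g))                            # Frank-Wolfe vertex
        a = max(active, key=lambda i: g[i])              # away vertex: worst active one
        d_fw = -x.copy(); d_fw[s] += 1.0                 # direction e_s - x
        d_aw = x.copy();  d_aw[a] -= 1.0                 # direction x - e_a
        if -g @ d_fw >= -g @ d_aw:                       # take the steeper of the two
            d, gamma_max, fw_step = d_fw, 1.0, True
        else:
            d, gamma_max, fw_step = d_aw, active[a] / (1.0 - active[a]), False
        gamma = min(max(-(g @ d) / (2 * (d @ d) + 1e-16), 0.0), gamma_max)  # exact line search
        x = x + gamma * d
        scale = (1.0 - gamma) if fw_step else (1.0 + gamma)   # maintain active-set weights
        active = {i: w * scale for i, w in active.items()}
        if fw_step:
            active[s] = active.get(s, 0.0) + gamma
        else:
            active[a] -= gamma
        active = {i: w for i, w in active.items() if w > 1e-12}   # drop emptied vertices
    return x

print(fw_with_away_steps(np.array([0.7, 0.4, -0.1, 0.0])))  # ~ [0.65, 0.35, 0, 0], the projection of b
```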
This study deals with semantic segmentation of high-resolution (aerial) images where a semantic class label is assigned to each pixel via supervised classification as a basis for automatic map generation. Recently, deep convolutional neural networks (CNNs) have shown impressive performance and have quickly become the de-facto standard for semantic segmentation, with the added benefit that task-specific feature design is no longer necessary. However, a major downside of deep learning methods is that they are extremely data-hungry, thus aggravating...
We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online algorithm that has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, this algorithm allows us to compute the optimal step-size and...
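A small sketch of the randomized block-coordinate Frank-Wolfe step on a product of simplices: each iteration updates a single randomly chosen block with a Frank-Wolfe step, using a closed-form optimal step-size for the block subproblem. The separable quadratic objective below is an illustrative stand-in for the structural SVM dual, not the paper's problem.

```python
# Block-coordinate Frank-Wolfe on a product of simplices with exact block line search.
import numpy as np

def bcfw(B, iters=1000, seed=0):
    """Minimize sum_i ||X[i] - B[i]||^2 with each row X[i] constrained to a simplex."""
    rng = np.random.default_rng(seed)
    n, d = B.shape
    X = np.ones((n, d)) / d                            # each block starts at the uniform point
    for _ in range(iters):
        i = rng.integers(n)                            # pick one block uniformly at random
        g = 2 * (X[i] - B[i])                          # gradient w.r.t. that block only
        s = np.zeros(d); s[np.argmin(g)] = 1.0         # block-wise linear minimizer (a vertex)
        direction = s - X[i]
        gamma = min(max(-(g @ direction) / (2 * (direction @ direction) + 1e-16), 0.0), 1.0)
        X[i] = X[i] + gamma * direction                # Frank-Wolfe step on the chosen block
    return X

B = np.array([[0.7, 0.4, -0.1],
              [0.2, 0.9,  0.2]])
print(bcfw(B))   # each row ~ the Euclidean projection of B[i] onto the simplex
```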
We consider decentralized stochastic optimization with the objective function (e.g. data samples for a machine learning task) being distributed over $n$ machines that can only communicate to their neighbors on a fixed communication graph. To reduce the communication bottleneck, the nodes compress (e.g. quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by $\omega \leq 1$ ($\omega=1$ meaning no compression). We (i) propose a novel gossip-based stochastic gradient descent algorithm,...
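A hedged simulation of one way such a gossip-based scheme with compressed model updates can look (a simplified sketch, not the paper's algorithm): each node takes a local stochastic gradient step, shares a compressed (here top-k sparsified) difference to keep public estimates of the models in sync, and gossips on those estimates. The ring topology, mixing matrix, and step-sizes are illustrative.

```python
# Simplified decentralized SGD with compressed gossip on a 4-node ring.
import numpy as np

def top_k(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def compressed_gossip_sgd(stoch_grads, W, d, steps=300, lr=0.05, gossip_lr=0.2, k=2, seed=0):
    rng = np.random.default_rng(seed)
    n = len(stoch_grads)
    X = np.zeros((n, d))                               # private model of each node
    X_hat = np.zeros((n, d))                           # publicly known (compressed) estimates
    for _ in range(steps):
        for i in range(n):
            X[i] -= lr * stoch_grads[i](X[i], rng)     # local stochastic gradient step
        Q = np.array([top_k(X[i] - X_hat[i], k) for i in range(n)])
        X_hat += Q                                     # exchange compressed differences
        X += gossip_lr * (W @ X_hat - X_hat)           # gossip on the shared estimates
    return X

W = np.array([[0.5, 0.25, 0.0, 0.25],                  # doubly stochastic ring mixing matrix
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
targets = [np.full(4, c) for c in (1.0, 2.0, 3.0, 4.0)]    # heterogeneous local objectives
stoch_grads = [lambda x, rng, t=t: 2 * (x - t) + 0.05 * rng.standard_normal(x.shape) for t in targets]
X = compressed_gossip_sgd(stoch_grads, W, d=4)
print(X.mean(axis=0))   # node average approaches the global optimum, 2.5 in every coordinate
```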