- Advanced Image and Video Retrieval Techniques
- Neural Networks and Applications
- Remote-Sensing Image Classification
- Video Analysis and Summarization
- Anomaly Detection Techniques and Applications
- Image Retrieval and Classification Techniques
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Music and Audio Processing
- Machine Learning and Data Classification
- Stock Market Forecasting Methods
- Speech and Audio Processing
- Speech Recognition and Synthesis
- Atmospheric and Environmental Gas Dynamics
- Imbalanced Data Classification Techniques
- Advanced Neural Network Applications
- Adversarial Robustness in Machine Learning
- Air Quality Monitoring and Forecasting
- Automated Road and Building Extraction
- Topic Modeling
- Remote Sensing and Land Use
- Natural Language Processing Techniques
- Computational Physics and Python Applications
- Data Stream Mining Techniques
- Generative Adversarial Networks and Image Synthesis
University of St. Gallen
2017-2024
Institute of Computer Science
2020-2021
Czech Academy of Sciences, Institute of Computer Science
2020
German Research Centre for Artificial Intelligence
2008-2018
University of Kaiserslautern
2008-2018
International Computer Science Institute
2014-2016
We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. identifier, owner name, camera, title, tags, geo, and source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004...
In this paper, we present a patch-based land use and land cover classification approach using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible, provided in the Earth observation program Copernicus. We present a novel dataset, based on these images, that covers 13 spectral bands and is comprised of ten classes with a total of 27,000 labeled and geo-referenced images. Benchmarks are provided for this dataset and its spectral bands using state-of-the-art deep convolutional neural networks. An overall classification accuracy of 98.57% was achieved with the proposed dataset. The resulting...
We address the challenge of sentiment analysis from visual content. In contrast to existing methods which infer sentiment or emotion directly from low-level visual features, we propose a novel approach based on understanding the visual concepts that are strongly related to sentiments. Our key contribution is two-fold: first, we present a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 Adjective Noun Pairs (ANP). Second, we propose SentiBank,...
This paper introduces a visual sentiment concept classification method based on deep convolutional neural networks (CNNs). The visual sentiment concepts are adjective noun pairs (ANPs) automatically discovered from the tags of web photos, and can be utilized as effective statistical cues for detecting emotions depicted in the images. Nearly one million Flickr images tagged with these ANPs are downloaded to train the classifiers of the concepts. We adopt a popular model which recently shows great performance improvement...
The increased availability of high-resolution satellite imagery allows us to sense very detailed structures on the surface of our planet. Access to such information opens up new directions in the analysis of remote sensing imagery. While deep neural networks have achieved significant advances in the semantic segmentation of images, most of the existing approaches tend to produce predictions with poor boundaries. In this paper, we address the problem of preserving boundaries by introducing a novel multi-task loss. The loss leverages...
In this paper, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The key contributions are as follows. We present a novel dataset based on Sentinel-2 satellite images covering 13 different spectral bands and consisting of 10 classes with in total 27,000 labeled images. We evaluate state-of-the-art deep Convolutional Neural Networks (CNNs) on this dataset with its different spectral bands. We also evaluate deep CNNs on existing remote sensing datasets and compare the obtained results. With the proposed dataset, we achieved an overall classification accuracy of 98.57%. The resulting system...
A picture is worth one thousand words, but what words should be used to describe the sentiment and emotions conveyed in the increasingly popular social multimedia? We demonstrate a novel system which combines sound structures from psychology and the folksonomy extracted from social multimedia to develop a large visual sentiment ontology consisting of 1,200 concepts and associated classifiers called SentiBank. Each concept, defined as an Adjective Noun Pair (ANP), is made of an adjective strongly indicating emotions and a noun corresponding to objects or scenes...
Transformer models have recently approached or even surpassed the performance of ConvNets on computer vision tasks like classification and segmentation. To a large degree, these successes have been enabled by the use of large-scale labelled image datasets for supervised pre-training. This poses a significant challenge for the adaption of Transformers to domains where datasets with millions of labelled samples are not available. In this work, we bridge the gap between ConvNets and Transformers for Earth observation by self-supervised pre-training on unlabelled remote sensing...
Transformer architectures have become state-of-the-art models in computer vision and natural language processing. To a significant degree, their success can be attributed to self-supervised pre-training on large scale unlabeled datasets. This work investigates the use of masked image reconstruction to advance transformer models for hyperspectral remote sensing imagery. To facilitate pre-training, we build a dataset of observations from the EnMAP satellite and systematically investigate modifications of the architecture...
The development of foundation models has revolutionized our ability to interpret the Earth's surface using satellite observational data. Traditional models have been siloed, tailored to specific sensors or data types like optical, radar, and hyperspectral, each with its own unique characteristics. This specialization hinders the potential for a holistic analysis that could benefit from the combined strengths of these diverse data sources. Our novel approach introduces the Dynamic One-For-All (DOFA) model, leveraging...
Learning to detect fraud in large-scale accounting data is one of the long-standing challenges in financial statement audits or fraud investigations. Nowadays, the majority of applied techniques refer to handcrafted rules derived from known fraud scenarios. While fairly successful, these rules exhibit the drawback that they often fail to generalize beyond known fraud scenarios and fraudsters gradually find ways to circumvent them. To overcome this disadvantage, and inspired by the recent success of deep learning, we propose the application of autoencoder neural...
In this paper, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible, provided in the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with in total 27,000 labeled and geo-referenced images. We provide benchmarks for this dataset with its spectral bands using state-of-the-art deep Convolutional Neural Networks (CNNs). With the proposed dataset, we achieved an overall classification accuracy of 98.57%. The resulting system...
Among the vast information available on the web, social media streams capture what people currently pay attention to and how they feel about certain topics. Awareness of such trending topics plays a crucial role in multimedia systems such as trend-aware recommendation and automatic vocabulary selection for video concept detection systems. Correctly utilizing trending topics requires a better understanding of their various characteristics in different social media streams. To this end, we present the first comprehensive study across three major...
The availability of satellite images for academic or commercial purposes is increasing rapidly due to efforts made by governmental agencies (NASA, ESA) to publish such data openly and by startups (PlanetLabs) to provide real-time data. Beyond many applications, such data is helpful to create situation awareness in disaster recovery and emergency situations such as wildfires, earthquakes, or flooding. To fully utilize such data sources, we present a scalable system for the contextual enrichment of satellite images by crawling and analyzing multimedia content from social...
With the Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset, a novel dataset was introduced to the computer vision and multimedia research community. To maximize the benefit for the research community and utilize its potential, this dataset has to be made accessible by tools allowing users to search for target concepts within the dataset and a mechanism to browse the images and videos of the dataset. Following best practice from data collections such as ImageNet and MS COCO, this paper presents means of accessibility for the YFCC100m dataset. This includes a global analysis of the dataset and an online browser...
The increased availability of high resolution satellite imagery allows us to sense very detailed structures on the surface of our planet. Access to such information opens up new directions in the analysis of remote sensing imagery. However, at the same time this raises a set of new challenges for existing pixel-based prediction methods, such as semantic segmentation approaches. While deep neural networks have achieved significant advances in the semantic segmentation of high resolution images in the past, most of the existing approaches tend to produce predictions with poor boundaries. In this paper,...
We propose a novel way to measure and understand convolutional neural networks by quantifying the amount of input signal they let in. To do this, an autoencoder (AE) was fine-tuned on gradients from a pre-trained classifier with fixed parameters. We compared the reconstructed samples from AEs that were fine-tuned on a set of image classifiers (AlexNet, VGG16, ResNet-50, and Inception v3) and found substantial differences. The AE learns which aspects of the input space to preserve and which ones to ignore, based on the information encoded in the backpropagated gradients....
Self-supervised learning has great potential for the remote sensing domain, where unlabelled observations are abundant, but labels are hard to obtain. This work leverages multi-modal data for augmentation-free contrastive self-supervised learning. Deep neural network models are trained to maximize the similarity of latent representations obtained with different sensing techniques from the same location, while distinguishing them from other locations. We showcase this idea with two fusion methods and compare against...
Air pollution is a central environmental problem in countries around the world. It contributes to climate change through the emission of greenhouse gases, and adversely impacts the health of billions of people. Despite its importance, detailed information about the spatial and temporal distribution of pollutants is complex to obtain. Ground-level monitoring stations are sparse, and approaches for modeling air pollution rely on extensive datasets which are unavailable for many locations. We introduce three techniques for the estimation of air pollution to overcome these...
The Placing Task is a yearly challenge offered by the MediaEval Multimedia Benchmarking Initiative that requires participants to develop algorithms that automatically predict the geo-location of social media videos and images. We introduce a recent development of a new standardized web-scale geo-tagged dataset for Placing Task 2014, which contains 5.5 million photos and 35,000 videos. This standardized benchmark with a large persistent dataset allows the research community to easily evaluate new algorithms and to analyze their performance with respect to the state-of-the-art approaches....
We present a work-in-progress snapshot of learning with a 15 billion parameter deep learning network on HPC architectures applied to the largest publicly available natural image and video dataset released to date. Recent advancements in unsupervised deep neural networks suggest that scaling up such networks in both model and training dataset size can yield significant improvements in the learning of concepts at the highest layers. We train our three-layer deep neural network on the Yahoo! Flickr Creative Commons 100M dataset. The dataset comprises approximately 99.2 million images and 800,000...
The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both academics and practitioners to conduct collaborative research effectively. The emergence of generative models, particularly diffusion models, capable of synthesizing data mimicking the underlying distributions of real-world data presents a compelling solution. This work introduces Financial Tabular Diffusion...