- Domain Adaptation and Few-Shot Learning
- Adversarial Robustness in Machine Learning
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Natural Language Processing Techniques
- Topic Modeling
- Stochastic Gradient Optimization Techniques
- Digital Media Forensic Detection
- Generative Adversarial Networks and Image Synthesis
- Anomaly Detection Techniques and Applications
- Medical Image Segmentation Techniques
- Face and Expression Recognition
- Digital Rights Management and Security
- Bacillus and Francisella bacterial research
- Machine Learning and Data Classification
- Subtitles and Audiovisual Media
- Physical Unclonable Functions (PUFs) and Hardware Security
- Video Surveillance and Tracking Methods
- Advanced Vision and Imaging
- Reinforcement Learning in Robotics
- Hand Gesture Recognition Systems
- Handwritten Text Recognition Techniques
- Semantic Web and Ontologies
- Advanced Data Compression Techniques
Apple (United Kingdom)
2023-2024
University of Toronto
2015-2021
Sharif University of Technology
2012
We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by hard negative mining, the use of hard negatives in structured prediction, and ranking loss functions, we introduce a simple change to common loss functions used for multi-modal embeddings. That, combined with fine-tuning and the use of augmented data, yields significant gains in retrieval performance. We showcase our approach, VSE++, on the MS-COCO and Flickr30K datasets, using ablation studies and comparisons with existing methods. On MS-COCO, our approach...
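The core change is a max-hinge ranking loss that keeps only the hardest in-batch negative instead of summing over all negatives. A minimal PyTorch sketch, assuming L2-normalized embeddings and one matching caption per image in the batch:

```python
import torch

def vse_hard_negative_loss(im, s, margin=0.2):
    """Max-hinge ranking loss with in-batch hard negatives (VSE++-style sketch).

    im: (B, D) L2-normalized image embeddings
    s:  (B, D) L2-normalized caption embeddings; s[i] matches im[i]
    """
    scores = im @ s.t()                      # (B, B) cosine similarities
    diag = scores.diag().view(-1, 1)         # positive-pair scores

    # Hinge cost of every in-batch negative, in both retrieval directions.
    cost_s = (margin + scores - diag).clamp(min=0)       # negative captions per image
    cost_im = (margin + scores - diag.t()).clamp(min=0)  # negative images per caption

    # Zero out the positive pairs on the diagonal.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=im.device)
    cost_s = cost_s.masked_fill(mask, 0)
    cost_im = cost_im.masked_fill(mask, 0)

    # The key change: take the hardest negative rather than the sum over all.
    return cost_s.max(dim=1)[0].mean() + cost_im.max(dim=0)[0].mean()
```

Taking the max rather than the sum focuses the gradient on the single most violating negative in the batch, which is what makes the otherwise common ranking loss behave like hard negative mining.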
CleverHans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Benchmarks constructed without a standardized implementation are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak attack procedure. This technical report is structured as follows. Section 1 provides an overview of adversarial examples and the CleverHans software. Section 2...
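For context, the kind of attack such a library standardizes can be illustrated with the fast gradient sign method (FGSM). The sketch below is generic PyTorch, not CleverHans's own API, and assumes inputs scaled to [0, 1]:

```python
import torch

def fgsm(model, x, y, eps):
    """Fast Gradient Sign Method: one-step L_inf adversarial perturbation."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that increases the loss, then clip to valid pixels.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()
```

The report's point is that subtle implementation choices in code like this (step size, clipping, gradient handling) change the measured robustness, which is why a shared reference implementation matters.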
We show that the representation of an image in a deep neural network (DNN) can be manipulated to mimic those of other natural images, with only minor, imperceptible perturbations to the original image. Previous methods for generating adversarial images focused on perturbations designed to produce erroneous class labels, while we concentrate on the internal layers of DNN representations. In this way our new class of adversarial images differs qualitatively from others. While the adversary is perceptually similar to one image, its internal representation appears remarkably similar to a different...
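A minimal sketch of this kind of feature-space attack: optimize a small perturbation so that an intermediate layer's activations for the source image approach those of a target image. The hyperparameters and Adam-based optimization here are illustrative assumptions:

```python
import torch

def feature_adversary(phi, x_src, x_tgt, eps=8 / 255, steps=200, lr=0.01):
    """Perturb x_src so an internal representation phi(.) mimics that of x_tgt.

    phi: a function returning an intermediate DNN layer's activations.
    Sketch only; the paper constrains the perturbation to stay imperceptible.
    """
    delta = torch.zeros_like(x_src, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = phi(x_tgt).detach()              # fixed target features
    for _ in range(steps):
        opt.zero_grad()
        loss = (phi(x_src + delta) - target).pow(2).sum()  # match target features
        loss.backward()
        opt.step()
        with torch.no_grad():                 # keep the perturbation in an L_inf ball
            delta.clamp_(-eps, eps)
    return (x_src + delta).detach()
```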
Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging...
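As background, communication-efficient SGD typically quantizes each gradient stochastically onto a small set of normalized levels; adaptive schemes like ALQ and AMQ update those levels from gradient statistics during training. A sketch of an unbiased stochastic quantizer, with the level set left as an input (the adaptation itself is not shown):

```python
import torch

def quantize_stochastic(g, levels):
    """Stochastically quantize gradient g onto sorted levels spanning [0, 1].

    Fixed uniform levels (e.g., torch.linspace(0, 1, 8)) give a standard
    heuristic scheme; adaptive methods instead fit `levels` to the observed
    gradient distribution as training progresses.
    """
    norm = g.norm()
    r = (g.abs() / norm).clamp(0, 1)                 # normalized magnitudes
    idx = torch.bucketize(r, levels).clamp(1, len(levels) - 1)
    lo, hi = levels[idx - 1], levels[idx]            # enclosing level pair
    # Round up with probability proportional to position in the bin (unbiased).
    p = (r - lo) / (hi - lo + 1e-12)
    q = torch.where(torch.rand_like(r) < p, hi, lo)
    return norm * g.sign() * q
```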
State-of-the-art computer vision models have been shown to be vulnerable to small adversarial perturbations of the input. In other words, most images in the data distribution are both correctly classified by the model and very close to a visually similar misclassified image. Despite substantial research interest, the cause of this phenomenon is still poorly understood and remains unsolved. We hypothesize that this counterintuitive behavior is a naturally occurring result of the high-dimensional geometry of the data manifold. As a first step towards...
The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that absorbs their expertise. Our method integrates techniques of multi-task...
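One plausible shape for such a merging recipe is multi-task distillation from the frozen teachers into a shared student backbone with task-specific heads, rehearsing both tasks to limit forgetting. The sketch below is schematic; the function names, cosine objective, and equal task weighting are assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def merge_distill_step(student, clip_teacher, sam_teacher, clip_head, sam_head,
                       x_clip, x_sam, opt, alpha=0.5):
    """One multi-task distillation step merging two frozen teachers into
    one student backbone with lightweight task-specific heads."""
    opt.zero_grad()
    with torch.no_grad():
        t_clip = clip_teacher(x_clip)        # frozen CLIP features
        t_sam = sam_teacher(x_sam)           # frozen SAM features
    s_clip = clip_head(student(x_clip))      # student mimics CLIP on one batch
    s_sam = sam_head(student(x_sam))         # ... and SAM on another (rehearsal)
    loss = alpha * (1 - F.cosine_similarity(s_clip, t_clip, dim=-1).mean()) \
         + (1 - alpha) * (1 - F.cosine_similarity(s_sam, t_sam, dim=-1).mean())
    loss.backward()
    opt.step()
    return loss.item()
```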
We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. Dataset Reinforcement is based on data augmentation and knowledge distillation. Our generic strategy is designed through extensive analysis across CNN- and transformer-based models, performing a large-scale study of distillation with state-of-the-art models and various augmentations. We create a reinforced version of the ImageNet training dataset, called ImageNet+...
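The reinforcement step might look like the following: run the teacher once over stored augmentations and save sparse soft labels, so later student training just reads them back at no extra cost. The function names and top-10 sparsification are illustrative assumptions:

```python
import torch

def reinforce_dataset(teacher, dataset, augment, samples_per_image=10):
    """Precompute (augmentation params, teacher soft-label) records once.

    `augment` is assumed to return (params, view) so the augmentation can be
    reproduced at student-training time; students then train against the
    stored sparse soft labels without ever running the teacher.
    """
    teacher.eval()
    reinforced = []
    with torch.no_grad():
        for x, _ in dataset:
            for _ in range(samples_per_image):
                params, view = augment(x)                  # reproducible augmentation
                probs = teacher(view.unsqueeze(0)).softmax(-1)[0]
                top_p, top_i = probs.topk(10)              # sparse soft labels
                reinforced.append((params, top_i, top_p))
    return reinforced
```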
The impact of gradient noise on training deep models is widely acknowledged but not well understood. In this context, we study the distribution of gradients during training. We introduce a method, Gradient Clustering, to minimize the variance of the average mini-batch gradient with stratified sampling. We prove that the variance is minimized if the elements are sampled from a weighted clustering in the gradient space. We measure the gradient variance on common deep learning benchmarks and observe that, contrary to common assumptions, gradient variance increases during training, and smaller learning rates coincide with higher variance...
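A toy version of the stratified sampling idea: cluster per-example gradient features, then fill the mini-batch from each cluster. The tiny k-means and the norm-based cluster weighting below are illustrative, not the paper's exact algorithm:

```python
import torch

def stratified_minibatch(grads, batch_size, n_clusters=8, iters=10):
    """Sample a mini-batch stratified by clusters in gradient space.

    grads: (N, D) per-example gradient features (e.g., last-layer gradients).
    """
    # Tiny k-means in gradient space.
    centers = grads[torch.randperm(len(grads))[:n_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(grads, centers).argmin(dim=1)
        for k in range(n_clusters):
            members = grads[assign == k]
            if len(members) > 0:
                centers[k] = members.mean(dim=0)
    # Allocate the batch across clusters proportionally to total gradient norm.
    weights = torch.stack([grads[assign == k].norm(dim=1).sum()
                           for k in range(n_clusters)])
    alloc = (batch_size * weights / weights.sum()).round().long()
    picks = []
    for k in range(n_clusters):
        members = torch.nonzero(assign == k).flatten()
        if len(members) > 0 and alloc[k] > 0:   # sample within the stratum
            picks.append(members[torch.randint(len(members), (int(alloc[k]),))])
    return torch.cat(picks)
```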
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct...
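The curation strategies participants iterate on can be pictured as passes over the corpus. A toy sketch of exact deduplication followed by a quality filter; real pipelines use fuzzy deduplication, model-based filters, and source mixing on top of this:

```python
import hashlib

def curate(docs, keep_fn):
    """Toy curation pass: exact dedup by content hash, then a quality filter.

    docs: iterable of text documents.
    keep_fn: predicate such as a length or classifier-score threshold.
    """
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if h in seen:
            continue                 # drop exact duplicates
        seen.add(h)
        if keep_fn(doc):             # keep only documents passing the filter
            kept.append(doc)
    return kept
```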
CLIP models perform remarkably well on zero-shot classification and retrieval tasks. But recent studies have shown that the learnt representations in CLIP are not well suited for dense prediction tasks like object detection, semantic segmentation or depth estimation. More recently, multi-stage training methods have been introduced to mitigate the weak performance of CLIP on downstream tasks. In this work, we find that simply improving the quality of captions in image-text datasets improves the quality of CLIP's visual representations, resulting...
Adversarial training is a common approach to improving the robustness of deep neural networks against adversarial examples. In this work, we propose a novel regularization approach as an alternative. To derive the regularizer, we formulate the adversarial robustness problem under the robust optimization framework and approximate the loss function using a second-order Taylor series expansion. Our proposed second-order adversarial regularizer (SOAR) is an upper bound based on this approximation of the inner max in the robust optimization objective. We empirically show that the method significantly improves...
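In simplified form, the second-order Taylor expansion bounds the inner maximization by gradient- and curvature-norm terms, with the Hessian-vector product computed via double backpropagation. The sketch below captures that spirit but is not the paper's exact bound; it assumes an NCHW image batch:

```python
import torch

def soar_like_penalty(model, x, y, eps=8 / 255):
    """Simplified second-order penalty in the spirit of SOAR.

    Second-order Taylor: max_{||d|| <= eps} L(x+d) is roughly bounded by
    L(x) + eps*||g|| + (eps^2/2)*||Hz|| for a probe direction z, where the
    Hessian-vector product Hz comes from double backprop.
    """
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    (g,) = torch.autograd.grad(loss, x, create_graph=True)
    z = torch.randn_like(x)                           # random probe direction
    z = z / z.flatten(1).norm(dim=1).view(-1, 1, 1, 1).clamp_min(1e-12)
    hz = torch.autograd.grad((g * z).sum(), x, create_graph=True)[0]
    first = eps * g.flatten(1).norm(dim=1)            # gradient-norm term
    second = 0.5 * eps ** 2 * hz.flatten(1).norm(dim=1)  # curvature term
    return loss + (first + second).mean()
```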
Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, the pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel weakly supervised pre-training of vision models on web-scale image-text data. The proposed method reframes pre-training on image-text data as a classification task. Consequently, it eliminates the need for pairwise similarity computations in contrastive loss, achieving a remarkable $2.7\times$...
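The reframing can be sketched as multi-label classification over a vocabulary derived from caption words (e.g., nouns): each image's targets are the vocabulary entries appearing in its caption, trained with binary cross-entropy instead of a pairwise contrastive loss. Names below are illustrative, and the noun extraction is assumed done upstream:

```python
import torch
import torch.nn.functional as F

def classification_pretrain_loss(image_encoder, classifier, images,
                                 caption_noun_ids):
    """Weakly supervised pre-training as multi-label classification.

    caption_noun_ids: per-image lists of vocabulary ids found in the caption.
    No image-text pairwise similarity matrix is ever formed.
    """
    logits = classifier(image_encoder(images))   # (B, vocab_size)
    targets = torch.zeros_like(logits)
    for i, ids in enumerate(caption_noun_ids):   # build multi-hot targets
        targets[i, ids] = 1.0
    return F.binary_cross_entropy_with_logits(logits, targets)
```

Because the loss is per-image rather than over all image-text pairs in the batch, the quadratic similarity computation disappears, which is where the reported speedup comes from.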
Large Language Models (LLMs) are frequently updated due to data or architecture changes that improve their performance. When updating models, developers often focus on increasing overall performance metrics, with less emphasis on staying compatible with previous model versions. However, users build a mental model of the functionality and capabilities of a particular machine learning model they are interacting with. They have to adapt their mental model with every update -- a draining task that can lead to user dissatisfaction. In practice, fine-tuned...
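A common way to quantify this kind of incompatibility is the negative flip rate: the fraction of inputs the old model handled correctly that the updated model gets wrong. A minimal sketch for classification, illustrative of the problem the abstract describes rather than its exact metric:

```python
import torch

def negative_flip_rate(old_model, new_model, x, y):
    """Fraction of examples correct under the old model but broken by the update."""
    with torch.no_grad():
        old_ok = old_model(x).argmax(-1) == y
        new_ok = new_model(x).argmax(-1) == y
    return (old_ok & ~new_ok).float().mean().item()
```

Even when aggregate accuracy rises, a nonzero flip rate means some previously working behavior regressed, which is exactly what forces users to re-learn the model.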
The pre-training phase of language models often begins with randomly initialized parameters. With the current trends in scaling models, training their large number of parameters can be extremely slow and costly. In contrast, small models are less expensive to train, but they cannot achieve the accuracy of large models. In this paper, we explore an intriguing idea to connect these two regimes: Can we develop a method to initialize large language models using smaller pre-trained models? Will such initialization bring any benefits in terms...
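One function-preserving way to realize such an initialization is to tile the small model's weights into a wider layer so the expanded network initially reproduces the small model's outputs. The sketch below shows the idea for a single linear layer; the paper's exact expansion may differ:

```python
import torch

def clone_linear(w, b, expand=2):
    """Function-preserving width expansion of a linear layer (sketch).

    Output units are tiled; input columns are tiled and scaled by 1/expand,
    so if the incoming activations are themselves tiled copies, the expanded
    layer reproduces the small layer's outputs.
    """
    w_big = w.repeat(expand, expand) / expand   # (out*e, in*e)
    b_big = b.repeat(expand)
    return w_big, b_big

# Quick check: tiled inputs give tiled (identical) outputs.
w, b = torch.randn(4, 3), torch.randn(4)
x = torch.randn(3)
w2, b2 = clone_linear(w, b)
assert torch.allclose(w2 @ x.repeat(2) + b2, (w @ x + b).repeat(2), atol=1e-6)
```

Because every expanded block starts as an exact copy of the trained small model, the large model begins pre-training from the small model's loss rather than from a random initialization.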