- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Blind Source Separation Techniques
- Human Pose and Action Recognition
- Text Readability and Simplification
- Adversarial Robustness in Machine Learning
- Neural Networks and Applications
- Domain Adaptation and Few-Shot Learning
- Sparse and Compressive Sensing Techniques
- Image and Signal Denoising Methods
- Explainable Artificial Intelligence (XAI)
- Advanced Image and Video Retrieval Techniques
- EEG and Brain-Computer Interfaces
- Advanced Adaptive Filtering Techniques
- Neural Networks and Reservoir Computing
- Inertial Sensor and Navigation
- Microwave Imaging and Scattering Analysis
- Speech and Dialogue Systems
- Anomaly Detection Techniques and Applications
- GNSS Positioning and Interference
- Software Engineering Research
- Speech and Audio Processing
- Currency Recognition and Detection
- Video Analysis and Summarization
Microsoft Research (United Kingdom)
2018-2023
Microsoft (United States)
2017-2023
University of North Texas
2023
Massachusetts Institute of Technology
2022
Carnegie Mellon University
2022
Allen Institute
2022
Allen Institute for Artificial Intelligence
2022
University of Washington
2022
University of British Columbia
2013-2016
Sharif University of Technology
2009-2012
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT...
This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models. VLP is pre-trained on a large amount of image-text pairs using unsupervised learning...
This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks with Long Short-Term Memory (LSTM) cells. Due to its ability to capture long-term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the model is trained in a weakly supervised manner on user...
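The last-word readout described above can be sketched in a few lines of NumPy: run an LSTM cell over the word vectors and take the final hidden state as the sentence embedding. This is a generic, untrained LSTM for illustration, not the paper's model; all dimensions and weights are invented.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_sentence_embedding(word_vecs, params):
    """Run an LSTM over a sentence (a list of word vectors) and return the
    hidden state after the last word as the sentence embedding."""
    W, U, b = params  # stacked gate weights (i, f, o, g): input, recurrent, bias
    d = U.shape[1]
    h = np.zeros(d)
    c = np.zeros(d)
    for x in word_vecs:
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)   # cell accumulates sentence information
        h = o * np.tanh(c)
    return h  # semantic representation of the whole sentence

# toy usage: 5-word sentence, 8-dim word vectors, 16-dim sentence embedding
rng = np.random.default_rng(0)
d_in, d_h = 8, 16
params = (rng.normal(0, 0.1, (4 * d_h, d_in)),
          rng.normal(0, 0.1, (4 * d_h, d_h)),
          np.zeros(4 * d_h))
sentence = [rng.normal(size=d_in) for _ in range(5)]
emb = lstm_sentence_embedding(sentence, params)
print(emb.shape)  # (16,)
```

In the paper's weakly supervised setting, the weights would be trained from click-through-style signals rather than drawn at random.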
Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, Ece Kamar. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on outputs generated by large foundation models (LFMs). A number of issues impact the quality of these models, ranging from limited imitation signals from shallow LFM outputs; small-scale homogeneous training data; and most notably a lack of rigorous evaluation, resulting in overestimating the small model's capability, as they tend to learn to imitate the style, but not the reasoning process, of LFMs. To address these challenges, we develop Orca (We are working...
Various studies that address the compressed sensing problem with Multiple Measurement Vectors (MMVs) have been recently carried out. These studies assume the vectors of the different channels to be jointly sparse. In this paper, we relax this condition. Instead, we assume that these sparse vectors depend on each other, but that this dependency is unknown. We capture this dependency by computing the conditional probability of each entry in each vector being non-zero, given the "residuals" of all previous vectors. To estimate these probabilities, we propose the use of the Long Short-Term Memory (LSTM) [1], a data...
Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs' reasoning abilities. Research on small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks,...
Epilepsy is the second most common brain disorder after migraine. Automatic detection of epileptic seizures can considerably improve patients' quality of life. Current Electroencephalogram (EEG)-based seizure detection systems encounter many challenges in real-life situations. The EEGs are non-stationary signals and seizure patterns vary across patients and recording sessions. Moreover, EEG data are prone to numerous noise types that negatively affect the detection accuracy of epileptic seizures. To address these challenges, we introduce the use of a...
In this paper we address the following problem in web document and information retrieval (IR): How can we use long-term context information to gain better IR performance? Unlike common IR methods that use a bag-of-words representation for queries and documents, we treat them as sequences of words and use long short-term memory (LSTM) to capture contextual dependencies. To the best of our knowledge, this is the first time LSTM is applied to IR tasks. Unlike the training of traditional LSTMs, our training strategy is different due to the special nature of the IR problem. Experimental evaluation on an IR task derived...
We have seen great progress in video action recognition in recent years. There are several models based on convolutional neural networks (CNNs) and some transformer-based approaches which provide top performance on existing benchmarks. In this work, we perform a large-scale robustness analysis of these existing models for video action recognition. We focus on robustness against real-world distribution shift perturbations instead of adversarial perturbations. We propose four different benchmark datasets, HMDB51-P, UCF101-P, Kinetics400-P, and SSv2-P, to...
Recently an influx of studies claims emergent cognitive abilities in large language models (LLMs). Yet, most rely on anecdotes, overlook contamination of training sets, or lack systematic evaluation involving multiple tasks, control conditions, multiple iterations, and statistical robustness tests. Here we make two major contributions. First, we propose CogEval, a cognitive-science-inspired protocol for the systematic evaluation of cognitive capacities in Large Language Models. The CogEval protocol can be followed for the evaluation of various abilities. Second, here we follow...
We propose Heterogeneous Swarms, an algorithm to design multi-LLM systems by jointly optimizing model roles and weights. We represent multi-LLM systems as directed acyclic graphs (DAGs) of LLMs with topological message passing for collaborative generation. Given a pool of LLM experts and a utility function, Heterogeneous Swarms employs two iterative steps: role-step and weight-step. For role-step, we interpret model roles as learning a DAG that specifies the flow of inputs and outputs between LLMs. Starting from a swarm of random continuous adjacency matrices, we decode...
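The DAG-with-topological-message-passing idea can be sketched with plain Python functions standing in for LLM experts. The expert names, the `{node: predecessors}` graph, and the prompt-concatenation scheme below are all invented for illustration; the paper learns the DAG from continuous adjacency matrices rather than fixing it by hand.

```python
from graphlib import TopologicalSorter

# hypothetical stand-ins for LLM experts: each maps a prompt string to a string
experts = {
    "summarizer": lambda text: "summary(" + text + ")",
    "critic":     lambda text: "critique(" + text + ")",
    "writer":     lambda text: "draft(" + text + ")",
}

# DAG as {node: set of predecessor nodes}
dag = {"summarizer": set(),
       "critic": {"summarizer"},
       "writer": {"summarizer", "critic"}}

def run_swarm(dag, experts, user_input):
    """Topological message passing: each node consumes the user input plus
    the outputs of its predecessors, and emits its own output."""
    outputs = {}
    for node in TopologicalSorter(dag).static_order():
        upstream = " | ".join(outputs[p] for p in sorted(dag[node]))
        prompt = user_input if not upstream else user_input + " | " + upstream
        outputs[node] = experts[node](prompt)
    return outputs

result = run_swarm(dag, experts, "q")
print(result["writer"])
```

Because the graph is acyclic, `TopologicalSorter` guarantees every node runs only after all of its inputs are available; the role-step in the paper amounts to searching over which edges exist in `dag`.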
We introduce an architecture, the Tensor Product Recurrent Network (TPRN). In our application of TPRN, internal representations—learned by end-to-end optimization in a deep neural network performing a textual question-answering (QA) task—can be interpreted using basic concepts from linguistic theory. No performance penalty need be paid for this increased interpretability: the proposed model performs comparably to a state-of-the-art system on the SQuAD QA task. The interpreted representation is a Tensor Product Representation: each...
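The Tensor Product Representation underlying the TPRN can be illustrated directly: fillers are bound to roles via outer products, summed into one tensor, and recovered exactly by contracting with a role vector when the roles are orthonormal. This is a generic TPR demo with invented dimensions, not the TPRN model itself.

```python
import numpy as np

rng = np.random.default_rng(1)
d_f, d_r, n = 6, 4, 3  # filler dim, role dim, number of bindings

fillers = rng.normal(size=(n, d_f))
Q, _ = np.linalg.qr(rng.normal(size=(d_r, n)))  # orthonormal columns
roles = Q.T                                     # n orthonormal role vectors (rows)

# bind: T = sum_i f_i (outer) r_i
T = sum(np.outer(fillers[i], roles[i]) for i in range(n))

# unbind filler i by contracting T with its role vector
recovered = T @ roles[0]
print(np.allclose(recovered, fillers[0]))  # True
```

Unbinding is exact here because the role vectors are mutually orthogonal; with merely linearly independent roles one would contract with the dual (pseudoinverse) roles instead.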
Grounding language to visual relations is critical to various language-and-vision applications. In this work, we tackle two fundamental tasks: image-text matching and image captioning, and demonstrate that neural scene graph generators can learn effective relation features to facilitate grounding and subsequently improve the two end tasks. By combining relation features with state-of-the-art models, our experiments show significant improvement on the standard Flickr30K and MSCOCO benchmarks. Our experimental results and analysis show that relation features improve downstream...
Spatial understanding is a fundamental aspect of computer vision and integral for human-level reasoning about images, making it an important component of grounded language understanding. While recent text-to-image synthesis (T2I) models have shown unprecedented improvements in photorealism, it is unclear whether they have reliable spatial understanding capabilities. We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship...
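A metric in VISOR's spirit can be sketched by reducing each detected object to a bounding box and testing the stated relation on centroids. The centroid-comparison rule below is one plausible decision rule chosen for illustration, not necessarily VISOR's exact definition.

```python
def centroid(box):
    """box = (x0, y0, x1, y1) in image coordinates (x grows right, y grows down)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def relation_correct(box_a, box_b, relation):
    """Does object A stand in the stated spatial relation to object B?
    Decided by comparing object centroids."""
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    return {
        "left of":  ax < bx,
        "right of": ax > bx,
        "above":    ay < by,
        "below":    ay > by,
    }[relation]

# "a dog to the left of a cat": the dog's box sits left of the cat's box
print(relation_correct((10, 40, 60, 90), (100, 40, 150, 90), "left of"))  # True
```

Averaging this boolean over many generated images per prompt yields a score of how often the model renders the requested relationship correctly.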
Robust detection of epileptic seizures in the presence of the inevitable artifacts in Electroencephalogram (EEG) signals is addressed. The EEG dataset considered contains 300 seizures recorded from 15 volunteers. Current seizure detection systems achieve good performance when the data are entirely free of noise. However, their performance drastically decays with authentic data polluted by real artifacts. We introduce a robust detection method that can address both clean and noisy data. The proposed method uses Long Short-Term Memory (LSTM) neural networks to extract...
A longstanding question in cognitive science concerns the learning mechanisms underlying compositionality in human cognition. Humans can infer structured relationships (e.g., grammatical rules) implicit in their sensory observations (e.g., auditory speech), and use this knowledge to guide the composition of simpler meanings into complex wholes. Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work...
Deployed language models decay over time due to shifting inputs, changing user needs, or emergent world-knowledge gaps. When such problems are identified, we want to make targeted edits while avoiding expensive retraining. However, current model editors, which modify the behaviors of pre-trained models, degrade performance quickly across multiple, sequential edits. We propose GRACE, a lifelong model editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact...
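The spot-fix idea can be sketched as a codebook sitting in front of a frozen layer: keys are cached activations, values are corrected outputs, and a deferral radius decides when the cache fires. This is a simplified illustration of the mechanism under assumed semantics (fixed radius, no key splitting), not the released GRACE implementation.

```python
import numpy as np

class SpotFixLayer:
    """A codebook of (key, corrected value) entries in front of a frozen base
    layer. Inputs landing within `radius` of a key get the stored corrected
    output; everything else passes through to the base layer untouched.
    (GRACE additionally adjusts radii when new edits conflict with old ones.)"""
    def __init__(self, base_fn, radius=1.0):
        self.base_fn = base_fn
        self.radius = radius
        self.codebook = []  # list of (key, corrected_value)

    def edit(self, key, corrected_value):
        """Record one targeted fix for a streaming error."""
        self.codebook.append((np.asarray(key), np.asarray(corrected_value)))

    def __call__(self, x):
        x = np.asarray(x)
        for key, value in self.codebook:
            if np.linalg.norm(x - key) <= self.radius:
                return value         # edit fires
        return self.base_fn(x)       # defer to the frozen model

layer = SpotFixLayer(base_fn=lambda x: x * 2, radius=0.5)
layer.edit(key=[1.0, 1.0], corrected_value=[0.0, 0.0])
print(layer([1.0, 1.1]))   # within radius -> patched value
print(layer([3.0, 3.0]))   # far away -> base layer output
```

Because edits only fire inside their radius, unrelated inputs keep the original model behavior, which is how sequential edits avoid degrading overall performance.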
Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes, which raises equity concerns. Prediction models may discover, use, or amplify spurious correlations based on gender or other protected personal characteristics, thus discriminating against marginalized groups. Mitigating gender bias has become an important research focus in natural language processing (NLP) and is an area where annotated corpora are available. Data augmentation reduces gender bias by adding...
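The augmentation step can be illustrated with a toy counterfactual swap. The lexicon and the `gender_swap` helper below are invented for illustration; real systems use far larger word lists and must handle names, coreference, and ambiguous forms such as "her" (possessive vs. object).

```python
import re

# a tiny illustrative swap lexicon
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "his": "her", "hers": "his", "man": "woman", "woman": "man"}

def gender_swap(sentence):
    """Counterfactual data augmentation: produce a gender-swapped copy of a
    sentence by replacing each gendered word with its counterpart,
    preserving capitalization."""
    def repl(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)
    return pattern.sub(repl, sentence)

print(gender_swap("He finished his shift."))  # She finished her shift.
```

Training on the union of original and swapped sentences weakens the spurious correlation between gender terms and the prediction target.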
We study the MMV (Multiple Measurement Vectors) compressive sensing setting with a specific sparse structured support. The locations of the non-zero rows in the sparse matrix are not known. All that is known is that their probabilities of being non-zero vary from one group to another. We propose two novel greedy algorithms for exact recovery in this problem. The first algorithm models the structure using a shallow non-linear neural network. The input to the network is the residual after prediction and the output is the vector to be recovered. The second algorithm improves on the first by stacking this operation to form...
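The greedy recovery loop can be sketched in Orthogonal-Matching-Pursuit style, with a pluggable `score_fn` where the paper's residual-to-prediction network would sit. Here the classical correlation score stands in for that network; the problem sizes are invented for a toy run.

```python
import numpy as np

def greedy_recover(A, y, k, score_fn):
    """Greedy sparse recovery: at each step, score candidate entries from the
    current residual, add the best one to the support, and re-fit the chosen
    entries by least squares. The described method would replace the
    correlation score with a learned network mapping residuals to predictions."""
    m, n = A.shape
    support, residual = [], y.copy()
    for _ in range(k):
        scores = np.abs(score_fn(residual))
        scores[support] = -np.inf          # never re-pick a chosen entry
        support.append(int(np.argmax(scores)))
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(n)
    x[support] = x_s
    return x

# toy problem: recover a 2-sparse vector from 12 random measurements
rng = np.random.default_rng(2)
A = rng.normal(size=(12, 20))
x_true = np.zeros(20); x_true[[3, 17]] = [1.5, -2.0]
y = A @ x_true
x_hat = greedy_recover(A, y, k=2, score_fn=lambda r: A.T @ r)
print(np.nonzero(x_hat)[0])
```

The least-squares re-fit after each selection is what keeps the residual orthogonal to the columns already chosen, so each new score reflects only the unexplained part of `y`.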