- Natural Language Processing Techniques
- Topic Modeling
- Data Quality and Management
- Advanced Graph Neural Networks
- Advanced Combinatorial Mathematics
- Random Matrices and Applications
- Speech Recognition and Synthesis
- Multimodal Machine Learning Applications
- Advanced Algebra and Geometry
- Language Development and Disorders
- Matrix Theory and Algorithms
- Graph Theory and CDMA Systems
- Mathematics and Applications
- Music and Audio Processing
- Language and Cultural Evolution
- Advanced Topics in Algebra
- Neurobiology of Language and Bilingualism
- Machine Learning and Algorithms
- Semantic Web and Ontologies
Amazon (United Kingdom)
2021-2023
Indian Statistical Institute
2020-2022
Amazon (United States)
2021
While models have reached superhuman performance on popular question answering (QA) datasets such as SQuAD, they have yet to outperform humans on the task of question answering itself. In this paper, we investigate whether models are learning reading comprehension from QA datasets by evaluating BERT-based models across five datasets. We evaluate their generalizability to out-of-domain examples, their responses to missing or incorrect data, and their ability to handle question variations. We find that no single dataset is robust to all of our experiments, and we identify shortcomings in both...
We introduce Mintaka, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models. Mintaka is composed of 20,000 question-answer pairs collected in English, annotated with Wikidata entities, and translated into Arabic, French, German, Hindi, Italian, Japanese, Portuguese, and Spanish, for a total of 180,000 samples. Mintaka includes 8 types of complex questions, including superlative, intersection, and multi-hop questions, which were naturally elicited from crowd workers. We run baselines...
Large language models (LLMs) have shown impressive abilities to reason over input text; however, they are prone to hallucinations. On the other hand, end-to-end knowledge graph question answering (KGQA) models output responses grounded in facts, but they still struggle with complex reasoning, such as comparison or ordinal questions. In this paper, we propose a new method in which we combine a retriever based on an end-to-end KGQA model with an LLM that reasons over the retrieved facts to return an answer. We observe that augmenting prompts with KG facts improves...
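The abstract above is truncated before any implementation detail, so as a minimal, hypothetical sketch (the function name, prompt wording, and triple format are illustrative assumptions, not the paper's API), augmenting an LLM prompt with retrieved KG facts could look like:

```python
def augment_prompt(question, facts):
    """Prepend retrieved KG triples to a question before calling an LLM.

    `facts` is a list of (subject, relation, object) triples returned by
    some retriever; grounding the prompt in these facts is what the
    abstract reports as helping against hallucination.
    """
    fact_lines = "\n".join(f"{s} -- {r} -> {o}" for s, r, o in facts)
    return (
        "Answer the question using only the facts below.\n"
        f"Facts:\n{fact_lines}\n"
        f"Question: {question}\nAnswer:"
    )

prompt = augment_prompt(
    "Who directed Heat?",
    [("Heat", "director", "Michael Mann")],
)
```

The resulting string would then be sent to any instruction-tuned LLM; the retriever itself (here assumed to exist) is the KGQA model described in the abstract.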
Recently, end-to-end (E2E) trained models for question answering over knowledge graphs (KGQA) have delivered promising results using only a weakly supervised dataset. However, these models are trained and evaluated in a setting where hand-annotated question entities are supplied to the model, leaving the important and non-trivial task of entity resolution (ER) outside the scope of E2E learning. In this work, we extend the boundaries of E2E learning for KGQA to include the training of an ER component. Our model only needs the question text and the answer to train, and delivers a stand-alone QA...
End-to-end question answering using a differentiable knowledge graph is a promising technique that requires only weak supervision, produces interpretable results, and is fully differentiable. Previous implementations of this technique (Cohen et al., 2020) have focused on single-entity questions using a relation-following operation. In this paper, we propose a model that explicitly handles multiple-entity questions by implementing a new intersection operation, which identifies the shared elements between two sets of entities. We find...
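In differentiable-KG models of the Cohen et al. (2020) style, an entity set is represented as a vector of scores over the entity vocabulary. The abstract does not give the exact formulation of the new intersection operation, but as a minimal sketch under that representation, one common differentiable choice is an elementwise product of the two score vectors (all names and values below are illustrative):

```python
import numpy as np

# Toy entity vocabulary: scores over 5 entities for two sub-questions,
# e.g. "movies starring A" and "movies starring B".
set_a = np.array([0.9, 0.1, 0.8, 0.0, 0.3])
set_b = np.array([0.7, 0.9, 0.6, 0.2, 0.0])

def soft_intersection(a, b):
    """Soft set intersection: elementwise product of entity scores.

    An entity scored highly by both inputs stays highly scored; an
    entity absent from either set is suppressed. The operation is
    differentiable in both arguments.
    """
    return a * b

scores = soft_intersection(set_a, set_b)
answer = int(np.argmax(scores))  # entity 0 scores highly in both sets
```

The product form keeps the whole pipeline differentiable, so the intersection can sit between relation-following steps and still be trained end-to-end with only weak supervision.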
The context window of large language models (LLMs) has been extended significantly in recent years. However, while the length of context that an LLM can process has grown, the capability of the model to accurately reason over that context degrades noticeably. This occurs because modern LLMs often become overwhelmed by the vast amount of information in the context; when answering questions, the model must identify and reason over relevant evidence sparsely distributed throughout the text. To alleviate the challenge of long-context reasoning, we develop a retrieve-then-reason...
Collecting training data for semantic parsing is a time-consuming and expensive task. As a result, there is growing interest in industry to reduce the number of annotations required to train a semantic parser, both to cut down on costs and to limit the customer data handled by annotators. In this paper, we propose uncertainty and traffic-aware active learning, a novel active learning method that uses model confidence and utterance frequencies from traffic to select utterances for annotation. We show that our method significantly outperforms baselines on an internal...
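The truncated abstract names the two signals (model confidence and traffic frequency) but not how they are combined. As an illustrative sketch only (the product-of-uncertainty-and-log-frequency form and all example utterances are assumptions, not the paper's scoring function), an acquisition score over those two signals might look like:

```python
import math

def acquisition_score(confidence, frequency):
    """Rank utterances for annotation.

    Prefers utterances the parser is unsure about (low confidence)
    that also occur often in traffic (high frequency), so annotation
    effort goes where it affects the most customer requests.
    """
    uncertainty = 1.0 - confidence
    return uncertainty * math.log1p(frequency)

# Candidate utterances: (text, parser confidence, count in traffic)
candidates = [
    ("play my workout playlist", 0.95, 50_000),
    ("skip to the bridge of this song", 0.40, 3_000),
    ("what's the weather", 0.99, 80_000),
]
ranked = sorted(candidates, key=lambda c: -acquisition_score(c[1], c[2]))
selected = ranked[0][0]  # the low-confidence, still-frequent utterance
```

A pure-uncertainty baseline would pick the same utterance here, but weighting by traffic is what prevents the selector from spending annotation budget on rare, low-impact tail utterances.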
Patterned random matrices, such as the reverse circulant, the symmetric Toeplitz, and the Hankel matrices, and their almost sure limiting spectral distributions (LSD), have attracted much attention. Under the assumption that the entries are taken from an i.i.d. sequence with finite variance, the LSDs are tied together by a common thread -- the $2k$th moment of the limit equals a weighted sum over different types of pair-partitions of the set $\{1, 2, \ldots, 2k\}$ and is universal. Some results are also known for the sparse case. In this paper we generalise these...
The scaled standard Wigner matrix (symmetric, with mean zero, variance one i.i.d. entries) and its limiting eigenvalue distribution, namely the semi-circular law, have attracted much attention. The $2k$th moment of the limit equals the number of non-crossing pair-partitions of the set $\{1, 2, \ldots, 2k\}$. There are several extensions of this result in the literature. In this paper, we consider a unifying extension which also yields additional results. Suppose the matrix is symmetric, where the entries are independently distributed. We show...
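The moment statement in the abstract can be written out explicitly: the number of non-crossing pair-partitions of $\{1, 2, \ldots, 2k\}$ is the $k$th Catalan number, so for the standard semi-circular law

```latex
% 2k-th moment of the standard semi-circular law on [-2, 2]:
\int_{-2}^{2} x^{2k}\,\frac{1}{2\pi}\sqrt{4 - x^{2}}\,dx
  \;=\; C_k \;=\; \frac{1}{k+1}\binom{2k}{k},
```

while all odd moments vanish by symmetry. For example, $C_1 = 1$ and $C_2 = 2$ give variance $1$ and fourth moment $2$, against $3$ for a Gaussian.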
Speech disfluencies are prevalent in spontaneous speech. The rising popularity of voice assistants presents a growing need to handle naturally occurring disfluencies. Semantic parsing is a key component for understanding user utterances in voice assistants, yet most semantic parsing research to date focuses on written text. In this paper, we investigate semantic parsing of disfluent speech with the ATIS dataset. We find that a state-of-the-art semantic parser does not seamlessly handle disfluencies. We experiment with adding real and synthetic disfluencies at training time only...