Priyanka Sen

ORCID: 0000-0001-6941-180X
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Data Quality and Management
  • Advanced Graph Neural Networks
  • Advanced Combinatorial Mathematics
  • Random Matrices and Applications
  • Speech Recognition and Synthesis
  • Multimodal Machine Learning Applications
  • Advanced Algebra and Geometry
  • Language Development and Disorders
  • Matrix Theory and Algorithms
  • Graph Theory and CDMA Systems
  • Mathematics and Applications
  • Music and Audio Processing
  • Language and Cultural Evolution
  • Advanced Topics in Algebra
  • Neurobiology of Language and Bilingualism
  • Machine Learning and Algorithms
  • Semantic Web and Ontologies

Amazon (United Kingdom)
2021-2023

Indian Statistical Institute
2020-2022

Amazon (United States)
2021

While models have reached superhuman performance on popular question answering (QA) datasets such as SQuAD, they have yet to outperform humans on the task of reading comprehension itself. In this paper, we investigate whether models are learning reading comprehension from QA datasets by evaluating BERT-based models across five datasets. We evaluate their generalizability to out-of-domain examples, their responses to missing or incorrect data, and their ability to handle question variations. We find that no single dataset is robust to all of our experiments, and we identify shortcomings in both...

10.18653/v1/2020.emnlp-main.190 preprint EN cc-by 2020-01-01

We introduce Mintaka, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models. Mintaka is composed of 20,000 question-answer pairs collected in English, annotated with Wikidata entities, and translated into Arabic, French, German, Hindi, Italian, Japanese, Portuguese, and Spanish, for a total of 180,000 samples. Mintaka includes 8 types of complex questions, including superlative, intersection, and multi-hop questions, which were naturally elicited from crowd workers. We run baselines...

10.48550/arxiv.2210.01613 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Large language models have shown impressive abilities to reason over input text; however, they are prone to hallucinations. On the other hand, end-to-end knowledge graph question answering (KGQA) models output responses grounded in facts, but they still struggle with complex reasoning, such as comparison or ordinal questions. In this paper, we propose a new method that combines a retriever based on an end-to-end KGQA model with a language model that reasons over the retrieved facts to return an answer. We observe that augmenting prompts with KG facts improves...

10.18653/v1/2023.nlrse-1.1 article EN cc-by 2023-01-01
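The KG-augmented prompting idea described above can be illustrated with a minimal sketch: retrieved KG triples are linearized into text and prepended to the question before it is handed to a language model. The triple format, prompt template, and example facts here are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch: linearize KG triples into a prompt for an LLM.

def linearize_triples(triples):
    """Turn (subject, relation, object) triples into prompt-ready lines."""
    return "\n".join(f"{s} -- {r} -> {o}" for s, r, o in triples)

def build_prompt(question, triples):
    facts = linearize_triples(triples)
    return (
        "Answer the question using only the facts below.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}\nAnswer:"
    )

# Toy facts supporting a comparison question.
triples = [
    ("Mount Everest", "height", "8849 m"),
    ("K2", "height", "8611 m"),
]
prompt = build_prompt("Which mountain is taller, Mount Everest or K2?", triples)
print(prompt)
```

A real pipeline would send `prompt` to an LLM; grounding the prompt in retrieved facts is what lets the model answer comparison and ordinal questions it would otherwise hallucinate on.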

Recently, end-to-end (E2E) trained models for question answering over knowledge graphs (KGQA) have delivered promising results using only a weakly supervised dataset. However, these models are trained and evaluated in a setting where hand-annotated entities are supplied to the model, leaving the important and non-trivial task of entity resolution (ER) outside the scope of E2E learning. In this work, we extend the boundaries of E2E learning for KGQA to include the training of an ER component. Our model only needs the question text and the answer to train, and delivers a stand-alone QA...

10.18653/v1/2021.emnlp-main.345 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

End-to-end question answering using a differentiable knowledge graph is a promising technique that requires only weak supervision, produces interpretable results, and is fully differentiable. Previous implementations of this approach (Cohen et al., 2020) have focused on single-entity questions using a relation-following operation. In this paper, we propose a model that explicitly handles multiple-entity questions by implementing a new intersection operation, which identifies the shared elements between two sets of entities. We find...

10.18653/v1/2021.emnlp-main.694 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01
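In differentiable-KG approaches of this kind, an entity set is commonly represented as a soft membership vector over the entity vocabulary, so that relation following and intersection become vector operations. The sketch below is an illustrative assumption, not the paper's exact implementation: it uses matrix multiplication for relation following and an element-wise product as one differentiable choice of intersection.

```python
import numpy as np

# Soft entity sets over a toy vocabulary of 4 entities.

def follow(entity_vec, relation_matrix):
    """Relation-following: propagate set membership along a relation."""
    return entity_vec @ relation_matrix

def intersect(vec_a, vec_b):
    """Differentiable intersection: only entities in both sets survive."""
    return vec_a * vec_b

a = np.array([1.0, 0.9, 0.0, 0.2])
b = np.array([0.0, 0.8, 1.0, 0.1])

relation = np.eye(4)       # toy relation matrix (identity: set unchanged)
print(follow(a, relation))
print(intersect(a, b))     # entity 1 survives strongly; 0 and 2 vanish
```

Because both operations are compositions of products, gradients flow through the whole question-answering pipeline, which is what makes weak (answer-only) supervision possible.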

10.30757/alea.v20-05 article EN Latin American Journal of Probability and Mathematical Statistics 2023-01-01

The context window of large language models (LLMs) has been extended significantly in recent years. However, while the length of input that an LLM can process has grown, the capability of the model to accurately reason over it degrades noticeably. This occurs because modern LLMs often become overwhelmed by the vast amount of information in the context; when answering questions, the model must identify and reason over relevant evidence sparsely distributed throughout the text. To alleviate the challenge of long-context reasoning, we develop a retrieve-then-reason...

10.48550/arxiv.2410.03227 preprint EN arXiv (Cornell University) 2024-10-04
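The retrieve-then-reason pattern above can be sketched in two stages: first score and select the context chunks most relevant to the question, then reason only over the selected evidence. The lexical-overlap retriever below is a stand-in assumption for illustration; the paper's actual components may differ.

```python
# Hypothetical two-stage pipeline: retrieve relevant chunks, then reason.

def retrieve(question, chunks, k=2):
    """Rank chunks by word overlap with the question; keep the top k."""
    q_tokens = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q_tokens & set(c.lower().split())))
    return scored[:k]

def answer(question, chunks):
    evidence = retrieve(question, chunks)
    # A real system would hand `evidence` to an LLM; here we just return it.
    return " ".join(evidence)

chunks = [
    "The Nile flows through Egypt.",
    "Quantum computing uses qubits.",
    "The Nile is about 6650 km long.",
]
print(answer("How long is the Nile", chunks))
```

The point of the design is that the reasoning step sees only a short, dense evidence set instead of the full long context, sidestepping the degradation described above.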

Collecting training data for semantic parsing is a time-consuming and expensive task. As a result, there is growing interest in industry in reducing the number of annotations required to train a parser, both to cut down on costs and to limit the customer data handled by annotators. In this paper, we propose uncertainty and traffic-aware active learning, a novel active learning method that uses model confidence and utterance frequencies from traffic to select utterances for annotation. We show that our method significantly outperforms baselines on an internal...

10.18653/v1/2020.intexsempar-1.2 article EN cc-by 2020-01-01
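A minimal sketch of uncertainty and traffic-aware selection: rank unlabeled utterances by a combination of model uncertainty (here, entropy over the parser's label distribution) and how often the utterance appears in traffic. The scoring function and example data are illustrative assumptions, not the paper's exact formula.

```python
import math

def entropy(probs):
    """Shannon entropy of a label distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(candidates, budget=2):
    """candidates: list of (utterance, label_probs, traffic_count)."""
    scored = [
        (entropy(probs) * math.log(1 + count), utt)
        for utt, probs, count in candidates
    ]
    scored.sort(reverse=True)  # highest combined score first
    return [utt for _, utt in scored[:budget]]

candidates = [
    ("play some music",        [0.98, 0.01, 0.01], 5000),  # confident, frequent
    ("dim the kitchen lights", [0.40, 0.35, 0.25], 1200),  # uncertain, frequent
    ("order a zeppelin",       [0.50, 0.30, 0.20], 2),     # uncertain, rare
]
print(select_for_annotation(candidates))
```

Weighting uncertainty by traffic frequency biases the annotation budget toward utterances that are both hard for the model and common for real users, rather than spending it on rare tail queries.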

Patterned random matrices such as the reverse circulant, the symmetric Toeplitz, and the Hankel matrices, and their almost sure limiting spectral distributions (LSD), have attracted much attention. Under the assumption that the entries are taken from an i.i.d. sequence with finite variance, the LSDs are tied together by a common thread: the $2k$th moment of the limit equals a weighted sum over different types of pair-partitions of the set $\{1, 2, \ldots, 2k\}$ and is universal. Some results are also known for the sparse case. In this paper we generalise these...

10.1142/s2010326321500301 article EN Random Matrices Theory and Application 2020-08-29

The scaled standard Wigner matrix (symmetric, with mean zero, variance one i.i.d. entries) and its limiting eigenvalue distribution, namely the semi-circular law, have attracted much attention. The $2k$th moment of the limit equals the number of non-crossing pair-partitions of the set $\{1, 2, \ldots, 2k\}$. There are several extensions of this result in the literature. In this paper, we consider a unifying extension which also yields additional results. Suppose [Formula: see text] is a symmetric matrix whose entries are independently distributed. We show...

10.1142/s2010326322500216 article EN Random Matrices Theory and Application 2022-03-08
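The moment fact underlying both abstracts above can be checked numerically: the $2k$th moment of the semi-circular law is the Catalan number $C_k = \frac{1}{k+1}\binom{2k}{k}$, which counts the non-crossing pair-partitions of $\{1, \ldots, 2k\}$. The sketch below is a sanity check against a sampled Wigner matrix, not anything from the papers themselves.

```python
import math
import numpy as np

def catalan(k):
    """C_k = (1/(k+1)) * binom(2k, k): 1, 2, 5, 14, ..."""
    return math.comb(2 * k, k) // (k + 1)

rng = np.random.default_rng(0)
n = 2000
a = rng.standard_normal((n, n))
w = (a + a.T) / math.sqrt(2)                  # symmetric matrix, variance-1 entries
eigs = np.linalg.eigvalsh(w / math.sqrt(n))   # scale by 1/sqrt(n)

for k in (1, 2, 3):
    emp = np.mean(eigs ** (2 * k))
    print(f"2k = {2*k}: empirical moment {emp:.3f}, Catalan C_{k} = {catalan(k)}")
```

For $n = 2000$ the empirical moments land close to $1, 2, 5$, matching the first three Catalan numbers up to finite-size corrections of order $1/n$.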

Speech disfluencies are prevalent in spontaneous speech. The rising popularity of voice assistants presents a growing need to handle naturally occurring disfluencies. Semantic parsing is a key component for understanding user utterances in voice assistants, yet most semantic parsing research to date focuses on written text. In this paper, we investigate semantic parsing of disfluent speech with the ATIS dataset. We find that a state-of-the-art semantic parser does not seamlessly handle disfluencies. We experiment with adding real and synthetic disfluencies at training time and...

10.18653/v1/2021.eacl-main.150 article EN cc-by 2021-01-01
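Synthetic disfluency augmentation of the kind mentioned above can be sketched as randomly injecting fillers and repetitions into clean training utterances. The insertion scheme, filler inventory, and probabilities below are illustrative assumptions, not the paper's exact procedure.

```python
import random

FILLERS = ["uh", "um", "you know"]

def add_disfluencies(utterance, rng, p_filler=0.3, p_repeat=0.2):
    """Inject filler words and token repetitions into a clean utterance."""
    out = []
    for token in utterance.split():
        if rng.random() < p_filler:
            out.append(rng.choice(FILLERS))  # filled-pause disfluency
        out.append(token)
        if rng.random() < p_repeat:
            out.append(token)                # simple repetition disfluency
    return " ".join(out)

rng = random.Random(7)
print(add_disfluencies("show me flights from boston to denver", rng))
```

Training a parser on such noised utterances (alongside the clean ones) exposes it to the kinds of fillers and repetitions that spontaneous speech contains but written-text datasets like ATIS lack.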

10.48550/arxiv.2109.05808 preprint EN other-oa arXiv (Cornell University) 2021-01-01

10.48550/arxiv.2109.05817 preprint EN other-oa arXiv (Cornell University) 2021-01-01