Ashish Shenoy

ORCID: 0000-0003-1401-262X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Speech and dialogue systems
  • Speech Recognition and Synthesis
  • Text Readability and Simplification
  • Caching and Content Delivery
  • Machine Learning in Healthcare
  • Multimodal Machine Learning Applications
  • QR Code Applications and Technologies
  • IPv6, Mobility, Handover, Networks, Security
  • Handwritten Text Recognition Techniques

META Health
2024

Amazon (United States)
2021

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first-person point-of-view images, the output of which is used to augment the input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges and discuss the system architecture, design choices,...

10.48550/arxiv.2402.08017 preprint EN arXiv (Cornell University) 2024-02-12
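
Below is a minimal sketch of the STR-augmented prompting idea described in this abstract. The function names and objects (`str_model.recognize`, `mm_llm.generate`) are hypothetical placeholders, not the actual Lumos components.

```python
def answer_visual_question(image, question, str_model, mm_llm):
    """Augment an MM-LLM prompt with scene text extracted from the image."""
    # 1) Extract scene text from the first-person point-of-view image.
    scene_text = str_model.recognize(image)  # e.g. "GATE B12  BOARDING 14:35"

    # 2) Fold the recognized text into the prompt so the MM-LLM can reason
    #    over words that may be too small or blurry to read from pixels alone.
    prompt = (
        f"Scene text: {scene_text}\n"
        f"Question: {question}\n"
        f"Answer:"
    )

    # 3) The MM-LLM still receives the image, now alongside the STR transcript.
    return mm_llm.generate(image=image, text=prompt)
```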

Automatic Speech Recognition (ASR) robustness toward slot entities is critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross-utterance contextual cues play an important role in disambiguating domain-specific content words from speech. In this paper, we investigate various techniques to improve contextualization, content word robustness, and domain adaptation of a Transformer-XL neural language model (NLM) used to rescore ASR N-best hypotheses....

10.18653/v1/2021.ecnlp-1.3 preprint EN cc-by 2021-01-01
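
The following sketch shows the general shape of N-best rescoring with a context-aware NLM, as described in this abstract. It assumes a placeholder scoring function `nlm_logprob(context, text)` standing in for a Transformer-XL style model; the interpolation weight and data are illustrative only.

```python
def rescore_nbest(nbest, prev_utterances, nlm_logprob, weight=0.4):
    """Rerank ASR N-best hypotheses with a cross-utterance LM score.

    `nbest` is a list of (hypothesis_text, first_pass_score) pairs.
    """
    context = " ".join(prev_utterances)
    rescored = []
    for text, asr_score in nbest:
        lm_score = nlm_logprob(context, text)
        # Log-linear interpolation of the first-pass ASR score and the NLM score.
        rescored.append((asr_score + weight * lm_score, text))
    rescored.sort(reverse=True)
    return [text for _, text in rescored]

# Illustrative usage: the previous turn provides cues for content words.
prev = ["add apple airpods to my cart"]
nbest = [("buy two ipods", -12.1), ("buy two airpods", -12.3)]
```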

Neural Language Models (NLMs), when trained and evaluated with context spanning multiple utterances, have been shown to consistently outperform both conventional n-gram language models and NLMs that use limited context. In this paper, we investigate various techniques to incorporate turn-based history into recurrent (LSTM) and Transformer-XL NLMs. For recurrent NLMs, we explore a carry-over mechanism and feature augmentation, where other forms of contextual information such as bot responses and system dialogue acts are classified...

10.21437/interspeech.2021-1849 article EN Interspeech 2021 2021-08-27
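
A toy illustration of the two ideas mentioned in this abstract follows: (a) carrying hidden state across dialogue turns and (b) augmenting token inputs with extra context features such as a system dialogue-act embedding. The class, dimensions, and feature choice are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TurnAwareLSTMLM(nn.Module):
    """Toy LSTM LM with turn carry-over and dialogue-act feature augmentation."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, num_acts=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.act_embed = nn.Embedding(num_acts, emb_dim)  # dialogue-act feature
        self.lstm = nn.LSTM(2 * emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, dialogue_act, state=None):
        # tokens: (batch, seq_len); dialogue_act: (batch,)
        tok = self.embed(tokens)
        act = self.act_embed(dialogue_act).unsqueeze(1).expand_as(tok)
        out, state = self.lstm(torch.cat([tok, act], dim=-1), state)
        # Returning `state` lets the caller carry it over to the next turn.
        return self.proj(out), state
```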

In recent years, Federated Learning (FL) has shown significant advancements in its ability to perform various natural language processing (NLP) tasks. This work focuses on applying personalized FL for on-device language modeling. Due to limitations of memory and latency, these models cannot support the complexity of sub-word tokenization or beam search decoding, resulting in the decision to deploy a closed-vocabulary model. However, closed-vocabulary models are unable to handle out-of-vocabulary (OOV) words belonging to specific users. To address...

10.1609/aaaiss.v3i1.31224 article EN Proceedings of the AAAI Symposium Series 2024-05-20
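
To make the OOV problem described here concrete, the sketch below shows a generic closed-vocabulary encoder with a handful of user-specific slots. This is only an illustration of the problem setting under assumed names, not the personalization method proposed in the paper.

```python
UNK = "<unk>"

class ClosedVocab:
    """Toy closed vocabulary: OOV words either occupy a few per-user slots
    or collapse to <unk>."""

    def __init__(self, shared_words, num_user_slots=4):
        self.index = {w: i for i, w in enumerate([UNK] + shared_words)}
        self.user_slots = {}                     # user-specific word -> slot id
        self.slot_base = len(self.index)
        self.num_user_slots = num_user_slots

    def encode(self, word):
        if word in self.index:
            return self.index[word]
        if word in self.user_slots:
            return self.user_slots[word]
        if len(self.user_slots) < self.num_user_slots:
            slot = self.slot_base + len(self.user_slots)
            self.user_slots[word] = slot         # learn this user's OOV word
            return slot
        return self.index[UNK]                   # otherwise fall back to <unk>

vocab = ClosedVocab(["call", "mom", "tomorrow"])
print([vocab.encode(w) for w in ["call", "auntie", "zsófia"]])  # OOV words get user slots
```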

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first-person point-of-view images, the output of which is used to augment the input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges and discuss the system architecture, design choices,...

10.1145/3637528.3671633 article EN cc-by Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2024-08-24

QR codes have become ubiquitous in daily life, enabling rapid information exchange. With the increasing adoption of smart wearable devices, there is a need for efficient and frictionless QR code reading from egocentric points of view. However, adapting existing phone-based QR code readers to egocentric images poses significant challenges. Egocentric images bring unique challenges such as a wide field of view, image distortion, and a lack of visual feedback compared to phones, where users can adjust the position and framing....

10.48550/arxiv.2410.05497 preprint EN arXiv (Cornell University) 2024-10-07
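
For contrast with the egocentric setting this abstract describes, here is a conventional phone-style decode pass using OpenCV's built-in detector. It is a baseline sketch only and does not reproduce the paper's on-device pipeline; wide-FOV, distorted egocentric frames are exactly where this approach tends to fail.

```python
import cv2

def decode_qr_from_frame(frame_bgr):
    """Baseline QR detect-and-decode on a single image frame."""
    detector = cv2.QRCodeDetector()
    payload, points, _ = detector.detectAndDecode(frame_bgr)
    if not payload:
        # No visual feedback loop: a wearable user cannot easily reframe the shot.
        return None
    return payload

# Usage (assuming a saved egocentric frame):
# frame = cv2.imread("egocentric_frame.jpg")
# print(decode_qr_from_frame(frame))
```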

In recent years, Federated Learning (FL) has shown significant advancements in its ability to perform various natural language processing (NLP) tasks. This work focuses on applying personalized FL for on-device language modeling. Due to limitations of memory and latency, these models cannot support the complexity of sub-word tokenization or beam search decoding, resulting in the decision to deploy a closed-vocabulary model. However, closed-vocabulary models are unable to handle out-of-vocabulary (OOV) words belonging to specific users. To address...

10.48550/arxiv.2305.03584 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Current research in zero-shot translation is plagued by several issues such as high compute requirements, increased training time, and off-target translations. Proposed remedies often come at the cost of additional data or compute requirements. Pivot-based neural machine translation is preferred over a single-encoder model for most settings despite the added evaluation time. In this work, we overcome these shortcomings by taking advantage of transliteration and linguistic similarity. We build a single encoder-decoder system...

10.48550/arxiv.2308.05574 preprint EN other-oa arXiv (Cornell University) 2023-01-01
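
The sketch below illustrates the general idea in this abstract: transliterating related languages into one common script so a single shared encoder-decoder can translate directly, without a pivot. The `transliterate` and `nmt_model` objects and the target-language tag format are assumptions for illustration, not the paper's actual components.

```python
def translate_zero_shot(text, src_lang, tgt_lang, transliterate, nmt_model):
    """Single encoder-decoder translation over a shared surface form."""
    # 1) Normalize the source into a common script so lexically similar
    #    words from related languages line up.
    common_script_text = transliterate(text, src_lang)

    # 2) Prefix the desired target language, multilingual-NMT style, and
    #    let the single shared model translate directly (no pivot pass).
    tagged = f"<2{tgt_lang}> {common_script_text}"
    output = nmt_model.translate(tagged)

    # 3) Map the output back into the target language's native script.
    return transliterate(output, tgt_lang, reverse=True)
```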